Docker tutorial: Custom Cloud Environments for Jupyter Notebooks

Anton Kovalsky

This is a step-by-step guide for 1) building and publishing a custom Docker image by modifying one of Terra's base images to include additional packages and 2) running a Jupyter Notebook on Terra using that custom image.

Creating custom Docker images for Notebooks is not easy! Because of the size and complexity of the current Terra base image, this is an advanced option that only power users should attempt. For Terra base images and step-by-step instructions on GitHub, see https://github.com/DataBiosphere/terra-docker#terra-base-images. Improving this functionality is currently on the Terra Roadmap; see the roadmap item "Notebooks users can easily build, customize, and reuse their Docker containers".

Pre-requisites

Before you get started, make sure that your computer is set up to follow this tutorial:

1. Register for GitHub

2. Register for DockerHub

3. Install Docker Desktop

4. Install a text editor (e.g., Sublime)

Step 1. Clone the Git repository with the base images

First, you need to download all our base images by cloning the Terra GitHub repository. You can grab them all at once.

1.1. Select the green Code button on the GitHub repo page and copy the URL.

Screenshot of GitHub repo page with green Code button in upper right and URL https://github.com/DataBiosphere/terra in HTTPS tab with copy icon at right

1.2. Open a local terminal and execute the command git clone LINK, replacing LINK with the URL you copied in the previous step.

Upon executing this command, you’ll see something like this.

Screenshot of terminal with output of git clone command including Cloning into 'terra-docker'... remote: Enumerating objects: 42, done. remote: Counting objects: 100% (42/42)

Now, you should have our entire collection of Docker base images on your local machine in a new directory called terra-docker. Inside this directory, you should see a folder terra-jupyter-r. This is the image we will modify in this tutorial.
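As a minimal sketch of step 1.2, the commands below clone the repository and list the terra-jupyter-r folder. The clone URL is an assumption derived from the terra-docker link at the top of this article; use the URL you copied in step 1.1 if it differs. The run helper is a hypothetical convenience, not part of git or Docker: it only prints each command unless you set DRY_RUN=0.

```shell
# Hypothetical helper: print the command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Assumed clone URL (derived from the terra-docker link in this article).
REPO_URL="https://github.com/DataBiosphere/terra-docker.git"

# git names the cloned directory after the last part of the URL, without ".git".
CLONE_DIR="$(basename "$REPO_URL" .git)"

run git clone "$REPO_URL"

# The image modified in this tutorial lives in this folder of the clone.
run ls "$CLONE_DIR/terra-jupyter-r"
```

To actually clone, run the same lines with DRY_RUN=0 set in your terminal.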

Step 2. Modify a Docker file to meet your needs

The next step is to modify one of the base Docker files to include an additional package (edgeR).

2.1. Find the folder terra-jupyter-r (by typing in “terra-jupyter-r” in your Finder search bar) and open the Docker file (conveniently called Dockerfile) in your favorite text editor.

ScreenShot of finder window showing Dockerfile in the terra-jupyter-r directory with the option to open with sublime selected.

If you scroll through this file, at the bottom, you should see a list of R packages, mostly installed with BiocManager. This is where you will add a new package to create your custom Docker image.

2.2. Under the line containing R -e 'BiocManager::install(c( \ add a new line, "edgeR". This adds the edgeR package, a popular Bioconductor package for the analysis of digital gene expression data, to your Docker image.

Use spaces, not tabs, in Dockerfiles: The Dockerfile you're modifying most likely uses 4 spaces rather than tabs to indent lines. Mixing tabs and spaces to indent lines in a Dockerfile can cause errors. To avoid this, indent any new lines you add to the Dockerfile using 4 spaces.
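One way to check for stray tabs before building is to search the file for tab characters. The sketch below is illustrative only: find_tabs is a hypothetical helper name, and the demo runs on a throwaway file so that nothing in terra-jupyter-r is touched.

```shell
# Hypothetical helper: print every line of a file that contains a tab, with line numbers.
# grep exits with status 0 if tabs were found and 1 if there were none.
find_tabs() {
    grep -n "$(printf '\t')" "$1"
}

# Demo on a temporary file: line 1 is indented with 4 spaces, line 2 with a tab.
demo="$(mktemp)"
printf '    "edgeR"\n' > "$demo"
printf '\t"edgeR"\n' >> "$demo"

tab_lines="$(find_tabs "$demo")"
echo "$tab_lines"

rm -f "$demo"
```

Against the real file, running find_tabs Dockerfile from inside terra-jupyter-r should print nothing if the file uses spaces throughout.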

screen capture of finding the line containing the code R -e 'BiocManager::install(c( using the search function in the Edit menu, adding a new line with the code edgeR, and saving the Dockerfile by selecting save from the File menu.

2.3. Once you add this to the code, just click Save! No need to Save as - you shouldn't rename the file in any way.

Step 3. Remove half-finished Docker builds on your machine

Before you build! You are almost ready to build and push your custom Docker image. Before you execute the build command, you may need to remove any half-finished Docker builds from your machine and set up your DockerHub or Google Container Registry (GCR).

We'll walk you through these steps, but if you have some Docker experience, you might not need to worry about these and can skip to Step 5. Build and push your custom Docker image (assuming you already have a Docker repository with a name and tag matching the image you are about to build).

If you have never used Docker images on the machine you’re using for this exercise, you probably don’t need to do this part. But if you have experimented with Docker before, you may need to follow the pruning steps below. If you skip these steps and have trouble down the road, come back to see if this helps when troubleshooting.

3.1. Open Docker Desktop and log into your Docker account.

3.2. In your local terminal, execute the following command to see if there are any other images on your machine. Conveniently, this command can be executed while in any directory.

docker image ls

If you come up with an empty list, skip to Step 4. Set a destination for your Docker image.

3.3. If your list ISN'T empty (and you don’t need the images listed), execute the following command.

docker system prune -a

3.4. Execute the docker image ls command again to check that the pruning worked. Now the list should be empty.

Step 4. Set a destination for your Docker image

You must set up a destination in the Cloud for your Docker image, so there is a place to push it to.

Where to store your image: Terra accepts Docker images stored in the following registries:
  - Google Cloud Container Registry (GCR)
  - GitHub Container Registry (GHCR)
  - DockerHub

The advantage of using GCR is the ability to use private buckets. DockerHub users are limited to public repositories, while GCR buckets can give Terra convenient access to private resources.

At this time, Quay is not a supported registry for custom cloud environments. You can, however, use Quay images for workflow submissions.

Follow the instructions below for setting up the destination for your image using either DockerHub or Google Container Registry (GCR).

4.1. Log into DockerHub and click Create Repository.

Screencapture of steps 4.2 through 4.4 in DockerHub.

4.2. Give your repository a Name and make sure the Visibility is set to Public so Terra can access your Docker image.

4.3. Create your repository by clicking on the blue Create Repository button.

Step 5. Build and push your custom Docker image

Follow the instructions below to build and push your custom Docker image.

5.1. If you haven't already, open Docker Desktop and log into your Docker account.

5.2. Change directory into your terra-jupyter-r directory using the following command.

cd terra-jupyter-r

The build command must be executed from within the directory with the modified Docker file.

How to adapt these instructions for your own Docker structure: If you're following this tutorial exactly, the terra-jupyter-r directory should have everything you need. However, if you're adapting these instructions for your own Docker projects, use the ls (for Macs) or dir (for PCs) command to list the contents of the directory and make sure the necessary Dockerfile is present.

Note that Dockerfiles should not have any file extension: they should be named Dockerfile, not Dockerfile.txt or Dockerfile.py. If you just made your own Dockerfile from scratch and your editor added an extension, you can remove it by renaming the file with this command: mv Dockerfile.txt Dockerfile.
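The check and rename described above can be sketched as a small shell function. The name ensure_dockerfile is hypothetical, and the demo uses a temporary directory rather than your real build directory.

```shell
# Hypothetical helper: make sure a directory contains a file named exactly "Dockerfile".
# If only Dockerfile.txt is present, rename it as described above.
ensure_dockerfile() {
    dir="$1"
    if [ -f "$dir/Dockerfile" ]; then
        echo "ok: $dir/Dockerfile"
    elif [ -f "$dir/Dockerfile.txt" ]; then
        mv "$dir/Dockerfile.txt" "$dir/Dockerfile"
        echo "renamed: Dockerfile.txt -> Dockerfile"
    else
        echo "missing: no Dockerfile in $dir" >&2
        return 1
    fi
}

# Demo in a temporary directory that starts with only Dockerfile.txt.
demo_dir="$(mktemp -d)"
touch "$demo_dir/Dockerfile.txt"
ensure_dockerfile "$demo_dir"

# List the directory after the rename, then clean up.
listing="$(ls "$demo_dir")"
echo "$listing"
rm -rf "$demo_dir"
```

In practice you would call ensure_dockerfile . from the directory you intend to build in.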

5.3. Execute the build command below.

docker build -t DOCKER_ACCOUNT_NAME/REPOSITORY_NAME:OPTIONAL_TAG_NAME .

The building process should take about 10 minutes.

5.4. Execute the push command to upload your custom image to your repo.

docker push DOCKER_ACCOUNT_NAME/REPOSITORY_NAME:OPTIONAL_TAG_NAME

This step may also take up to 10 minutes.
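Steps 5.3 and 5.4 can be combined into a short script so that the same image name is used for both commands. This is a sketch under assumptions: the account, repository, and tag values are made-up placeholders, and the run helper is a hypothetical convenience that only prints each command unless DRY_RUN=0 is set.

```shell
# Hypothetical helper: print the command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Placeholder values: replace with your own DockerHub account, repository, and tag.
DOCKER_ACCOUNT_NAME="my-account"
REPOSITORY_NAME="terra-jupyter-r-edger"
TAG_NAME="0.0.1"

# Full image reference, shared by the build and push commands.
IMAGE="$DOCKER_ACCOUNT_NAME/$REPOSITORY_NAME:$TAG_NAME"

# Build from the Dockerfile in the current directory (the trailing "." is the build context),
# and push only if the build succeeded.
run docker build -t "$IMAGE" . && run docker push "$IMAGE"
```

Defining the image reference once avoids the most common push failure, a mismatch between the name used at build time and the name used at push time.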

Troubleshooting tips

1. Make sure the account, repository, and (optional) tag names all match the names you used in DockerHub.

2. The docker build command builds the image based on the Dockerfile in the current directory, so don’t forget the period (“.”) at the end of the build command!

3. You MUST run your command from the directory containing the Dockerfile. By default, Docker only recognizes a file named simply Dockerfile (no extension), so you can have as many Dockerfiles as you want on your computer, but they need to be in separate folders, with only one Dockerfile per folder. When you execute the docker build command as written above, it looks for a Dockerfile in your terminal's current directory. There must be a single file simply named Dockerfile in that directory, or the command will fail.

  • Sometimes you need to know a Docker container's digest - a unique content-addressable identifier - to be certain that all nodes are running the correct version of the container.

    There are two ways to get the digest depending on where your image is stored. In both cases, you'll look for something with the format sha256:SOMETHING_LONG, where the SOMETHING_LONG bit is the digest.

    Follow the instructions below, depending on whether your image is stored on your local machine or not.

    • If the image is on your local machine, type docker inspect MY_REPO/MY_IMAGE:TAG at the prompt. Note: The output contains two values that look like sha256:SOMETHING_LONG. The one you want is the "RepoDigests" one, not the "Id":
      ~ $ docker inspect MY_REPO/MY_IMAGE:TAG
      [
          {
              "Id": "sha256:a98acb9802cbf46eb71e28c652f58026c027d9580ff390c6fa9ae4dec07ae13d",
              "RepoTags": [
                  "MY_REPO/MY_IMAGE:TAG"
              ],
              "RepoDigests": [
                  "MY_REPO/MY_IMAGE@sha256:96bf2261d3ac54c30f38935d46f541b16af7af6ee3232806a2910cf19f9611ce"
              ],
      
      ...and a lot of other details we don't care about right now.
    • If the image is not on your local machine, type docker pull MY_REPO/MY_IMAGE:TAG at the prompt. The digest will be displayed in the output as:
      Digest: sha256:96bf2261d3ac54c30f38935d46f541b16af7af6ee3232806a2910cf19f9611ce
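As a small illustration of pulling the digest out of a "RepoDigests" entry, the sketch below splits the string at the @ sign using shell parameter expansion. The function name digest_of is hypothetical, and the example value is the one from the docker inspect output above.

```shell
# Hypothetical helper: given NAME@sha256:HEX, print everything after the "@".
digest_of() {
    echo "${1#*@}"
}

# Example "RepoDigests" entry copied from the docker inspect output above.
repo_digest="MY_REPO/MY_IMAGE@sha256:96bf2261d3ac54c30f38935d46f541b16af7af6ee3232806a2910cf19f9611ce"

digest="$(digest_of "$repo_digest")"
echo "$digest"

# With Docker running, the first RepoDigests entry can be read directly (not executed here):
#   docker inspect --format '{{index .RepoDigests 0}}' MY_REPO/MY_IMAGE:TAG
```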

Step 6. Launch a Notebook with your custom Docker image

You should now be ready to launch a Notebook Cloud Environment based on your custom Docker image!

1. Navigate to the workspace Analyses page with the notebook you want to run using the custom Docker image.

2. Select the Environment Configuration button (cloud icon) in the side panel on the right side of the screen.

Screenshot of workspace Analyses page with cloud icon highlighted in right sidebar.

3. Select Environment Settings under the Jupyter section of the panel that opens up.

Screenshot of Cloud Environment Settings configuration pane with Jupyter gear icon at the top left circled.

4. If you already have a Jupyter Cloud Environment in the workspace, select the Custom Environment option at the bottom of the Application Configuration dropdown.

If you are creating a new Cloud Environment, select the option to Customize the Cloud Environment, and select the Custom Environment option at the bottom of the Application Configuration Dropdown.

Screenshot of Application Configuration dropdown with the custom environments under other environments circled

5. Fill in the required field with the name and location of the image in your repository.

Screenshot of the custom environment application configuration with repository_name/docker_image_name.tag1 in the container image field.

6. Select Create/Replace at the bottom right of the form to create or update the Environment, depending on whether one already existed for your workspace. It will take about 10 minutes for the new virtual machine (VM) to spin up.

7. Open any notebook (or create a new one) in the same workspace.

8. Test to see if the new packages have been installed on your virtual machine.

Screenshot of notebook code cell with command library(edgeR) successfully completed and output: Loading required package: limma

9. Don't forget to save the image identifier and URL right in your notebook to keep track of which image the notebook is intended to use.

Next steps: Add a custom Docker to your WDL

In your WDL, you should include MY_REPO/MY_IMAGE@sha256:SOMETHING_LONG. Note: The tag isn't there at all; it's been replaced by the digest, which is a more specific identifier.
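To illustrate the substitution, the sketch below turns a tagged image name into the digest form by dropping the tag and appending the digest. The function name pin_to_digest is hypothetical, and the sketch assumes the image name contains no registry port (such as localhost:5000/...), because the last colon is treated as the tag separator.

```shell
# Hypothetical helper: replace ":TAG" in an image reference with "@DIGEST".
# Assumption: the last ":" in the reference separates the tag (no registry port in the name).
pin_to_digest() {
    image_with_tag="$1"
    image_digest="$2"
    echo "${image_with_tag%:*}@${image_digest}"
}

pinned="$(pin_to_digest "MY_REPO/MY_IMAGE:TAG" \
    "sha256:96bf2261d3ac54c30f38935d46f541b16af7af6ee3232806a2910cf19f9611ce")"
echo "$pinned"
```

The printed string is the value to paste into your WDL in place of the tagged image name.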


Comments

  • jamesp

    Can we use a gcr.io repository instead of DockerHub?

  • Anton Kovalsky

    Hi James, thanks for your questions! You can use gcr.io, the custom images field accepts images from both Dockerhub and GCR.

  • Denis Loginov

    And does it have to based on the Terra notebook image, or could it be another image (e.g. RStudio) that listens on port 8080?

  • Anton Kovalsky

    Hi Denis Loginov,

  • Merve Dede

    Hello, I am trying to modify the terra-jupyter-base environment in order to run python 3.8 instead of 3.7. I can't see where in the Dockerfile the python version is specified. Do you have any advice? Thanks

  • Denis Loginov

    @Merve Dede I'd guess it's installed in the base image gcr.io/deeplearning-platform-release/tf-gpu.2-7, which is probably using old versions of everything. This image is provided by Google and has been deprecated. You might have better luck with a newer one, like gcr.io/deeplearning-platform-release/tf-gpu.2-10 listed here: https://cloud.google.com/deep-learning-containers/docs/choosing-container (but there might be some incompatibilities to resolve with other packages installed in that Dockerfile..)

  • Eugene Duff

    Hi - I'm getting 10min time-outs when I try to start up my (fairly extensive) custom jupyter-R docker - is there any way around this or way to debug things? I'm currently trying to incrementally add elements to the original Dockerfile, but have the feeling it is timing out simply due to the additional packages slowing things..

    Thanks
