Docker tutorial: Custom cloud environments for Jupyter notebooks

Anton Kovalsky

This is a step-by-step guide for 1) building and publishing a custom Docker image and 2) running a Jupyter Notebook on Terra using a Docker image modified to include additional packages.

1. Clone the Git repository with the base images

First, download the Dockerfiles for all of our base images by cloning the Terra GitHub repository (you get them all at once!).

1. Start by finding the green “Code” button (labeled “Clone or download” in older versions of GitHub) on the repository page, and copy the URL.

[Screenshot: the green clone button on the GitHub repository page]

2. Open your terminal and run the command “git clone [link]”, using the link you copied in the previous step. Note that the command is lowercase: git, not Git.

Upon executing this command, you’ll see something like this:

[Screenshot: terminal output from git clone]

You should now have our entire collection of Docker base images on your local machine in a new directory called “terra-docker”. Inside this directory you should see a folder terra-jupyter-r. This is the image we will modify in this tutorial.
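The clone step can also be scripted so that it checks for the folder you need. This is a minimal sketch, not an official Terra script: the helper name clone_base_images is hypothetical, and the URL https://github.com/DataBiosphere/terra-docker.git is an assumption, so use the URL you copied from the green button if it differs.

```shell
# Hypothetical helper: clone the Terra base-image repository and confirm
# that the Dockerfile this tutorial modifies is present.
# ASSUMPTION: this URL matches the one behind the green button on GitHub.
TERRA_DOCKER_URL="${TERRA_DOCKER_URL:-https://github.com/DataBiosphere/terra-docker.git}"

clone_base_images() {
    # The command is lowercase "git"; "Git" fails on case-sensitive systems such as Linux.
    git clone "$TERRA_DOCKER_URL" || return 1
    # For this URL, git clone creates a directory named "terra-docker".
    if [ -f terra-docker/terra-jupyter-r/Dockerfile ]; then
        echo "found terra-docker/terra-jupyter-r/Dockerfile"
    else
        echo "error: terra-docker/terra-jupyter-r/Dockerfile not found" >&2
        return 1
    fi
}

# Usage:
#   clone_base_images
```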

2. Modify a Dockerfile to meet your needs

The next step is to modify one of the base Dockerfiles so that building it produces the custom image you want (the build itself is just one command, but it can take your computer some time).

Find the folder terra-jupyter-r (for example, by typing “terra-jupyter-r” in your Finder search bar) and open its Dockerfile (conveniently named “Dockerfile”) in your favorite text editor:

[Screenshot: the terra-jupyter-r Dockerfile open in a text editor]

If you scroll through this file, at the bottom you should see a list of R packages, mostly installed with BiocManager. This is where you will add your package.

  1. In the list of packages that follows the line containing && R -e 'BiocManager::install(c( add the package “edgeR”, a popular Bioconductor package for analysis of digital gene expression data. Match the quoting and comma style of the existing entries.

  2. Once you’ve added this to the code, just click “save”! No need to “save as” and you should not rename the file in any way.

[Animation: adding edgeR to the package list and saving the Dockerfile]
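Before starting a long build, it can help to confirm that your edit actually landed in the file. A minimal sketch; the helper name dockerfile_mentions is hypothetical, and it assumes package names appear in the Dockerfile wrapped in single or double quotes, as they do in a c("...") list.

```shell
# Hypothetical helper: print the lines of a Dockerfile that contain a quoted
# package name, with line numbers. grep's exit status (0 = at least one
# match) becomes the function's return value, so it works in an if-statement.
dockerfile_mentions() {
    # $1 = package name, $2 = Dockerfile path (defaults to ./Dockerfile)
    grep -n "[\"']$1[\"']" "${2:-Dockerfile}"
}

# Usage, from the directory where you ran git clone:
#   dockerfile_mentions edgeR terra-docker/terra-jupyter-r/Dockerfile
```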

3. Build and push your Docker image



Before you build!

 

You are almost ready to build and push your custom Docker image! Before you execute the build command, you may need to

1. Make sure there are no half-finished Docker builds on your machine that could interfere with building and pushing, and

2. Set up a Docker Hub repository or a Google Container Registry location so there is a place to push your custom image.

If you have some Docker experience, you may not need to worry about these and can skip to 3.3. Build and push your custom image (assuming you already have a Docker repository whose name matches the image you are about to build).

3.1. Remove all half-finished Docker builds on your machine

If you’ve never used Docker images on the machine you’re using for this exercise, you probably don’t need to do this part. But if you've been playing around with docker, you may need to follow the pruning steps below. And if you skip it and have trouble down the road, come back to see if this helps when troubleshooting.

  1. In your terminal, run the command “docker image ls” to see whether there are any other images on your machine. This command works from any directory. If the list is empty, you can skip the rest of this section and go to 3.2.

  2. If your list is not empty (and you don’t need the images listed), run the following command. Be aware that it removes all stopped containers, unused networks, and build cache, plus every image not used by a container:
    docker system prune -a
  3. Run “docker image ls” again to check that the pruning worked.
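The check-then-prune steps above can be combined into one small function. A minimal sketch; the helper name prune_if_needed is hypothetical. It only calls docker system prune -a when docker image ls -q (which prints one image ID per line) returns something, and it leaves Docker’s own confirmation prompt in place.

```shell
# Hypothetical helper: prune only when local images exist.
prune_if_needed() {
    # "docker image ls -q" prints one image ID per line, or nothing at all.
    if [ -z "$(docker image ls -q)" ]; then
        echo "no local images; nothing to prune"
        return 0
    fi
    # Removes all stopped containers, unused networks, build cache, and every
    # image not used by a container. Docker asks for confirmation first.
    docker system prune -a
}

# Usage:
#   prune_if_needed
```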

3.2. Set a destination for your Docker image

You must now set up a destination for your Docker image.



Where to store your image

 

Terra accepts Docker images stored in the following registries:

  • Google Cloud Container Registry (GCR)
  • GitHub Container Registry (GHCR)
  • DockerHub

At this time, Quay is not a supported registry for custom cloud environments. You can, however, use Quay images for workflow submissions.

Note that the repository you set up here must match the image name you intend to use in your build command (the tag is chosen when you build).

Step-by-step instructions (Docker Hub users)

Before you start, make sure you have signed up for a Docker Hub account and installed Docker locally, following Docker’s installation instructions.
  1. Go to your Docker Hub account and create a new repository.

  2. Make sure Terra can access your Docker image by making the repository public.
    [Animation: creating a public repository on Docker Hub]
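Once the repository exists, the image you build must be named username/repository:tag to be pushed there, and you need to be logged in with docker login. A minimal sketch; the helper name dockerhub_ref is hypothetical. It encodes one common pitfall: Docker rejects repository names that contain uppercase letters, while tags may contain them.

```shell
# Hypothetical helper: assemble a Docker Hub image reference and reject
# uppercase letters in the repository part, which Docker does not accept.
dockerhub_ref() {
    # $1 = Docker Hub username, $2 = repository name, $3 = tag
    case "$1/$2" in
        *[[:upper:]]*)
            echo "error: repository names must be lowercase: $1/$2" >&2
            return 1 ;;
    esac
    echo "$1/$2:$3"
}

# Usage (docker login prompts for your Docker Hub credentials):
#   docker login
#   dockerhub_ref myusername terra-jupyter-r-edger tag1
```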

Step-by-step instructions (GCR users)

  1. Set up Google Container Registry in your Google Cloud project. GCR stores your images in a Cloud Storage bucket, which it creates the first time you push an image. The advantage of using GCR is that your images can stay private: Terra can only pull Docker Hub images from public repositories, while GCR gives you a convenient way to give Terra access to private resources.

  2. Allow Terra to access your private GCR bucket by adding your personal Terra group as a member of the bucket.
    Terra’s support articles explain how and why to make a personal Terra group to access external resources. You can also add your @firecloud.org proxy email address (found in your profile on Terra) as a member of that bucket.

[Screenshot: adding a member to the GCR bucket]

Give a group access to a private Docker image
Alternatively, if you want a group of collaborators to have access to your private Docker image, add the @firecloud.org email address for that group (found in the Groups section of your Terra profile) as a member of the bucket.
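For GCR users, both the image naming and the access grant can be done from the command line. A minimal sketch under several assumptions: the Google Cloud SDK (gcloud and gsutil) is installed and authenticated, you push to the gcr.io host (whose images live in the Cloud Storage bucket artifacts.PROJECT_ID.appspot.com), and the email you grant is a Terra group address ending in @firecloud.org. The helper names gcr_ref and grant_terra_group_read are hypothetical. Because the bucket is created on your first push, run the grant after pushing at least one image.

```shell
# Hypothetical helpers for Google Container Registry (gcr.io host).
# ASSUMPTION: gcloud/gsutil are installed and you are logged in.

# Full image reference for GCR: gcr.io/PROJECT_ID/IMAGE:TAG
gcr_ref() {
    # $1 = Google Cloud project ID, $2 = image name, $3 = tag
    echo "gcr.io/$1/$2:$3"
}

# Give a Terra group read access to the storage bucket that backs gcr.io
# for this project. The bucket exists only after your first push.
grant_terra_group_read() {
    # $1 = Google Cloud project ID, $2 = group email, e.g. my-group@firecloud.org
    gsutil iam ch "group:$2:objectViewer" "gs://artifacts.$1.appspot.com"
}

# Usage:
#   gcloud auth configure-docker   # one-time: lets docker push to gcr.io
#   gcr_ref my-project terra-jupyter-r-edger tag1
#   grant_terra_group_read my-project my-group@firecloud.org
```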

3.3. Build and push your custom image

This is the crucial step! The build command must be run from within the directory that contains the modified Dockerfile.



Before you start! Check for these common mistakes

 
  • Make sure the repository name and image name match what you’ve set up in Docker Hub (or GCR), and that they are all lowercase.

  • docker build uses the current directory as its build context, so don’t forget the period (“.”) at the end of the build command!

The most important thing to understand about this step is that you MUST run the build command from the directory containing the Dockerfile. By default, docker build looks for a file named exactly "Dockerfile" (no extension) in its build context, which is the directory given by the "." at the end of the command, i.e. the directory your terminal is currently in. This means you can keep as many Dockerfiles on your computer as you want, as long as each one sits in its own folder. If the folder you build from doesn't contain a file named exactly "Dockerfile", the command will fail. (docker build also has a -f option for pointing at a differently named file, but this tutorial sticks with the default.)

  1. First, cd into the “terra-jupyter-r” directory (this path assumes you are in the directory where you ran git clone):
    cd terra-docker/terra-jupyter-r
    If you're following this tutorial exactly, the folders you cloned from GitHub should already contain everything you need. If you're adapting these instructions for your own Docker projects, use the "ls" command to list the directory's contents and make sure the Dockerfile is present. If you made your own Dockerfile from scratch and your editor added an extension (such as ".txt"), you can remove it by renaming the file from the command line:
    mv Dockerfile.txt Dockerfile
  2. Run the build command (building should take about 10 minutes). Note that image names must be lowercase:
    docker build -t repository_name/docker_image_name:tag1 .
  3. Run the push command to upload your custom image to your repository (this may also take up to 10 minutes):
    docker push repository_name/docker_image_name:tag1
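The three steps above can be wrapped into one function that refuses to build when the Dockerfile is missing. A minimal sketch; the helper name build_and_push and the example image name are hypothetical. It runs the build in a subshell so your terminal's working directory does not change.

```shell
# Hypothetical helper: build from a directory that holds a file named exactly
# "Dockerfile", then push the result under the same name and tag.
build_and_push() {
    # $1 = directory containing the Dockerfile, $2 = full image reference
    if [ ! -f "$1/Dockerfile" ]; then
        echo "error: no file named Dockerfile in $1" >&2
        return 1
    fi
    # The subshell keeps the caller's working directory unchanged.
    (
        cd "$1" || exit 1
        # "." makes the current directory the build context.
        docker build -t "$2" . && docker push "$2"
    )
}

# Usage (log in first with "docker login" for Docker Hub, or run
# "gcloud auth configure-docker" once for GCR):
#   build_and_push terra-docker/terra-jupyter-r myusername/terra-jupyter-r-edger:tag1
```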

How to find your Docker image's digest

There may be times when you need to know a Docker image's digest (a unique, content-addressable identifier) in order to be certain that all nodes are running the correct version of the image: a tag can be moved to point at a newer image, but a digest always refers to the same content. The instructions below look up the digest for my_repo/my_image:tag.

There are two ways to get the digest. In both cases you'll look for something of the form sha256:something_long; that whole string is the digest.

Instructions (image NOT stored on your computer)

In the terminal, run docker pull my_repo/my_image:tag. The digest will be displayed in the output as:
    Digest: sha256:96bf2261d3ac54c30f38935d46f541b16af7af6ee3232806a2910cf19f9611ce
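If you want only the digest string (for example, to paste into a WDL), you can filter the pull output. A minimal sketch; the helper name digest_from_pull is hypothetical, and it relies on the "Digest: sha256:..." line shown above. If the pull fails, it prints nothing.

```shell
# Hypothetical helper: pull an image and print only its digest, taken from
# the "Digest: sha256:..." line that docker pull writes to its output.
digest_from_pull() {
    # $1 = image reference, e.g. my_repo/my_image:tag
    docker pull "$1" | sed -n 's/^Digest: //p'
}

# Usage:
#   digest_from_pull my_repo/my_image:tag
```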

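Instructions (image IS stored on your computer)

In the terminal, run docker inspect followed by the image name. Note that the output is more complicated: there are two things that look like sha256:something_long. The one you want is the "RepoDigests" one, not the "Id". RepoDigests is only filled in after the image has been pushed to or pulled from a registry, so a freshly built image that you haven't pushed yet won't have one.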
~ $ docker inspect my_repo/my_image:tag
[
    {
        "Id": "sha256:a98acb9802cbf46eb71e28c652f58026c027d9580ff390c6fa9ae4dec07ae13d",
        "RepoTags": [
            "my_repo/my_image:tag"
        ],
        "RepoDigests": [
            "my_repo/my_image@sha256:96bf2261d3ac54c30f38935d46f541b16af7af6ee3232806a2910cf19f9611ce"
        ],

...and a lot of other details we don't care about right now.
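Rather than reading the full JSON, you can ask docker inspect for just the RepoDigests entry using its --format option with a Go template. A minimal sketch; the helper names repo_digest and digest_only are hypothetical, and the template assumes the image has at least one RepoDigests entry (otherwise the command should fail).

```shell
# Hypothetical helper: print the first "RepoDigests" entry of a local image,
# in the form my_repo/my_image@sha256:...
repo_digest() {
    # $1 = image reference, e.g. my_repo/my_image:tag
    docker inspect --format '{{index .RepoDigests 0}}' "$1"
}

# Hypothetical helper: print only the sha256:... part.
digest_only() {
    ref=$(repo_digest "$1") || return 1
    # Drop everything up to and including the "@".
    echo "${ref#*@}"
}

# Usage:
#   repo_digest my_repo/my_image:tag
#   digest_only my_repo/my_image:tag
```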

Launching a Notebook with your custom Docker image

You should now be ready to launch a notebook cloud environment based on your custom Docker image! First, open the workspace that contains the notebook you want to run with your custom image, and click the “Cloud Environment” button in the upper right corner of the screen.

  1. Select the “Custom Environment” option at the very bottom of the Application Configuration dropdown.

  2. Fill in the required field with your image's location in your repository, including the tag (for example, repository_name/docker_image_name:tag1).
    [Screenshot: the Custom Environment form with the image location filled in]
  3. Click “Create”/“Replace” (bottom right of the form). You will need to wait another 10 minutes or so while the new virtual machine spins up.
  4. Open any notebook (or create a new one) in the same workspace.
  5. Test whether the new packages were installed on your virtual machine:
    [Screenshot: a notebook cell loading the newly installed package]
  6. Don't forget to save the image identifier and URL right in your notebook in order to keep track of which image the notebook is intended to use!
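You can also run a similar check on your own machine before spending time launching a cloud environment. A minimal sketch, assuming Rscript is on the PATH inside your image; the helper name check_r_package is hypothetical. The --entrypoint option overrides whatever the image normally starts, and --rm deletes the throwaway container when Rscript exits, so the function succeeds only if the package loads.

```shell
# Hypothetical helper: start a throwaway container from your image and try
# to load an R package. The exit status is Rscript's, so it is non-zero
# if library() fails.
# ASSUMPTION: Rscript is on the PATH inside the image.
check_r_package() {
    # $1 = image reference, $2 = R package name
    docker run --rm --entrypoint Rscript "$1" -e "library($2)"
}

# Usage:
#   check_r_package myusername/terra-jupyter-r-edger:tag1 edgeR && echo "edgeR loads"
```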

Adding a custom Docker image to your WDL

To use your custom image in a workflow, reference it in your WDL (in the docker attribute of a task's runtime section) as my_repo/my_image@sha256:something_long. Note that the tag isn't there at all; it has been replaced by the digest, which is a more specific identifier.
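Swapping the tag for the digest is simple string manipulation. A minimal sketch; the helper name wdl_docker_line is hypothetical, and it assumes the reference you pass in ends in ":tag". It prints a docker attribute that you would paste inside a task's runtime section.

```shell
# Hypothetical helper: replace the tag in an image reference with a digest
# and print the WDL runtime attribute that pins it.
# ASSUMPTION: the reference ends in ":tag"; "${1%:*}" strips the text after
# its last colon, so a registry port such as localhost:5000 is kept.
wdl_docker_line() {
    # $1 = image with tag, e.g. my_repo/my_image:tag
    # $2 = digest, e.g. sha256:96bf...
    printf 'docker: "%s@%s"\n' "${1%:*}" "$2"
}

# Usage:
#   wdl_docker_line my_repo/my_image:tag sha256:96bf2261d3ac54c30f38935d46f541b16af7af6ee3232806a2910cf19f9611ce
```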

 


Comments


  • jamesp

    Can we use a gcr.io repository instead of DockerHub?

  • Anton Kovalsky

    Hi James, thanks for your questions! You can use gcr.io: the custom images field accepts images from both DockerHub and GCR.

  • Denis Loginov

    And does it have to be based on the Terra notebook image, or could it be another image (e.g. RStudio) that listens on port 8080?

  • Anton Kovalsky

    Hi Denis Loginov,

    If you want to use RStudio, you should look into extending this base instead: https://github.com/anvilproject/anvil-docker/tree/master/anvil-rstudio-base

    The ports we use are 8000 for Jupyter and 8001 for RStudio. It's possible that an arbitrary image listening on one of those ports would work, but we can't guarantee it, since we have other configurations besides opening the ports.
