This is a step-by-step guide for 1) building and publishing a custom Docker image by modifying one of Terra's base images to include additional packages and 2) running a Jupyter Notebook on Terra using that custom image.
Creating custom Docker images for notebooks is not easy! Because of the size and complexity of the current Terra base image, this is an advanced option that only power users should attempt. For Terra base images and step-by-step instructions on GitHub, see https://github.com/DataBiosphere/terra-docker#terra-base-images. Improving this functionality is currently on the Terra Roadmap; see "Notebooks users can easily build, customize, and reuse their Docker containers."
Prerequisites
Before you get started, make sure that your computer is set up to follow this tutorial:
1. Register for GitHub
2. Register for DockerHub
3. Install Docker Desktop
4. Install a text editor (e.g., Sublime)
Step 1. Clone the Git repository with the base images
First, download all of our base images at once by cloning the terra-docker GitHub repository (https://github.com/DataBiosphere/terra-docker).
1.1. Select the green Code button on the GitHub repo page and copy the URL.
1.2. Open a local terminal and run the following command, replacing LINK with the URL you copied in the previous step:
git clone LINK
Git prints its progress as it downloads the repository.
Now you should have our entire collection of Docker base images on your local machine in a new directory called terra-docker. Inside this directory, you should see a folder called terra-jupyter-r. This is the image we will modify in this tutorial.
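If you prefer to script this step, here is a minimal sketch assuming a bash-compatible shell with git installed. The function name clone_terra_docker and its check for terra-jupyter-r are illustrative only (they are not part of the terra-docker repository); the clone URL is the repository linked above with .git appended, which is the HTTPS URL GitHub's Code button copies.

```shell
# Hypothetical helper (not part of terra-docker): clone Terra's base-image
# repository into the current directory unless it is already there, then
# confirm that the terra-jupyter-r folder used in this tutorial exists.
TERRA_DOCKER_URL="https://github.com/DataBiosphere/terra-docker.git"

clone_terra_docker() {
  if [ ! -d terra-docker ]; then
    git clone "$TERRA_DOCKER_URL" || return 1
  fi
  if [ -d terra-docker/terra-jupyter-r ]; then
    echo "Found terra-docker/terra-jupyter-r"
  else
    echo "terra-docker/terra-jupyter-r is missing; check the clone" >&2
    return 1
  fi
}
```

Run clone_terra_docker from the directory where you want the terra-docker folder to appear.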
Step 2. Modify a Docker file to meet your needs
The next step is to modify one of the base Docker files to include an additional package (edgeR).
2.1. Find the terra-jupyter-r folder inside terra-docker (for example, by typing "terra-jupyter-r" in your Finder search bar) and open its Docker file (conveniently called Dockerfile) in your favorite text editor.
Near the bottom of this file, you should see a list of R packages, mostly installed with BiocManager. This is where you will add a new package to create your custom Docker image.
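2.2. Under the line containing
R -e 'BiocManager::install(c( \
add a new line:
    "edgeR", \
Format it like the neighboring package lines: the comma separates it from the next package name, and the trailing backslash continues the command onto the next line. This adds edgeR, a popular Bioconductor package for the analysis of digital gene expression data, to your Docker image.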
Use spaces, not tabs, in Dockerfiles. The Dockerfile you're modifying most likely uses 4 spaces rather than tabs to indent lines, and mixing tabs and spaces to indent lines in a Dockerfile can cause errors. To avoid this, indent any new lines you add using 4 spaces.
2.3. Once you've added the line, just click Save. Don't use Save As; you shouldn't rename the file in any way.
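Before building, you can double-check your edit from the terminal. The sketch below is a hypothetical helper (check_dockerfile is not a Docker or terra-docker command) that confirms the quoted package name appears in the file and flags tab characters, following the spaces-not-tabs tip above. Note that it flags every tab in the file, not only the lines you added.

```shell
# Hypothetical helper: sanity-check an edited Dockerfile.
# check_dockerfile FILE PACKAGE
#   - fails if "PACKAGE" (with the quotes) does not appear in FILE
#   - fails if FILE contains any tab character, listing the offending lines
check_dockerfile() {
  local file=$1 pkg=$2
  local tab
  tab=$(printf '\t')
  if ! grep -q "\"$pkg\"" "$file"; then
    echo "\"$pkg\" not found in $file" >&2
    return 1
  fi
  if grep -n "$tab" "$file" >&2; then
    echo "$file contains tab characters (line numbers above); use spaces" >&2
    return 1
  fi
  echo "OK: \"$pkg\" found and no tabs in $file"
}
```

For example: check_dockerfile terra-docker/terra-jupyter-r/Dockerfile edgeR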
Step 3. Remove half-finished Docker builds on your machine
Before you build! You are almost ready to build and push your custom Docker image. Before you run the build command, you may need to remove any half-finished Docker builds from your machine and set up a repository on DockerHub or Google Container Registry (GCR).
We'll walk you through these steps, but if you have some Docker experience, you might not need them and can skip to Step 5. Build and push your custom Docker image (assuming you already have a Docker repository with a name and tag matching the image you are about to build).
If you have never used Docker on the machine you're using for this exercise, you probably don't need this part. But if you have experimented with Docker, you may need to follow the pruning steps below. If you skip these steps and run into trouble later, come back here when troubleshooting.
3.1. Open Docker Desktop and log into your Docker account.
3.2. In your local terminal, execute the following command to see if there are any other images on your machine. Conveniently, this command can be executed while in any directory.
docker image ls
If you come up with an empty list, skip to Step 4. Set a destination for your Docker image.
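3.3. If your list ISN'T empty (and you don't need the images listed), run the following command. It removes all unused images, as well as stopped containers, unused networks, and the build cache, and asks you to confirm before deleting anything.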
docker system prune -a
3.4. Run docker image ls again to check that the pruning worked. Now the list should be empty.
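The same checks can be wrapped in two small shell functions. This is a sketch that assumes Docker Desktop is running and you are logged in; count_local_images and prune_if_needed are hypothetical names. docker image ls -q prints only image IDs (an image with several tags can appear more than once), and docker system prune -a still asks for confirmation before deleting anything.

```shell
# Hypothetical helpers wrapping the Step 3 commands.

# count_local_images: number of entries `docker image ls` lists
# (-q prints only the image IDs, one per line).
count_local_images() {
  docker image ls -q | wc -l | tr -d ' '
}

# prune_if_needed: skip pruning when there are no local images; otherwise run
# `docker system prune -a`, which asks for confirmation before deleting
# stopped containers, unused networks, unused images, and build cache.
prune_if_needed() {
  local n
  n=$(count_local_images)
  if [ "$n" -eq 0 ]; then
    echo "No local images; skip to Step 4."
    return 0
  fi
  echo "$n local image(s) found."
  docker system prune -a
}
```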
Step 4. Set a destination for your Docker image
You must set up a destination in the Cloud for your Docker image, so there is a place to push it to.
Where to store your image: Terra accepts Docker images stored in the following registries:
- Google Cloud Container Registry (GCR)
- GitHub Container Registry (GHCR)
- DockerHub
The advantage of using GCR is that your image can stay private. Terra can only use public DockerHub repositories, while GCR (which stores images in Cloud Storage buckets) can give Terra convenient access to private images.
At this time, Quay is not a supported registry for custom cloud environments. You can, however, use Quay images for workflow submissions.
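Each registry expects a slightly different image name. The hypothetical image_ref helper below sketches the usual formats: ACCOUNT/REPOSITORY:TAG on DockerHub, gcr.io/PROJECT_ID/REPOSITORY:TAG on GCR (assuming the default gcr.io host), and ghcr.io/OWNER/REPOSITORY:TAG on GHCR. Check your registry's own documentation for the exact name before pushing.

```shell
# Hypothetical helper: print the image reference to pass to `docker build -t`
# and `docker push` for each registry Terra accepts.
# image_ref REGISTRY ACCOUNT REPOSITORY [TAG]
# TAG defaults to "latest", which is also what Docker assumes when no tag is given.
image_ref() {
  local registry=$1 account=$2 repo=$3 tag=${4:-latest}
  case "$registry" in
    dockerhub) echo "$account/$repo:$tag" ;;
    gcr)       echo "gcr.io/$account/$repo:$tag" ;;   # account = Google Cloud project ID
    ghcr)      echo "ghcr.io/$account/$repo:$tag" ;;  # account = GitHub user or organization
    *)
      echo "unsupported registry: $registry (use dockerhub, gcr, or ghcr)" >&2
      return 1
      ;;
  esac
}
```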
Follow the instructions below to set up a DockerHub repository as the destination for your image.
4.1. Log into DockerHub and click Create Repository.
4.2. Give your repository a Name and make sure the Visibility is set to Public so Terra can access your Docker image.
4.3. Create your repository by clicking on the blue Create Repository button.
Step 5. Build and push your custom Docker image
Follow the instructions below to build and push your custom Docker image.
5.1. If you haven't already, open Docker Desktop and log into your Docker account.
5.2. From inside the terra-docker directory, change into the terra-jupyter-r directory using the following command.
cd terra-jupyter-r
The build command must be executed from within the directory with the modified Docker file.
How to adapt these instructions for your own Docker structure: If you're following this tutorial exactly, the terra-jupyter-r directory should have everything you need. However, if you're adapting these instructions for your own Docker projects, use the ls (Mac) or dir (Windows) command to list the contents of the directory and make sure the necessary Dockerfile is present.
Note that Dockerfiles should not have any file extension: name the file Dockerfile, not Dockerfile.txt or Dockerfile.py. If you made your own Dockerfile from scratch and are having trouble getting rid of an extension, rename the file with this command (on Windows, use ren instead of mv):
mv Dockerfile.txt Dockerfile
5.3. Run the build command below.
docker build -t DOCKER_ACCOUNT_NAME/REPOSITORY_NAME:OPTIONAL_TAG_NAME .
The building process should take about 10 minutes.
5.4. Run the push command to upload your custom image to your repository.
docker push DOCKER_ACCOUNT_NAME/REPOSITORY_NAME:OPTIONAL_TAG_NAME
This step may also take up to 10 minutes.
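Steps 5.2 through 5.4 can be combined into one function, sketched below under the assumption that you are already logged in (through Docker Desktop, or with docker login in the terminal). build_and_push is a hypothetical name, and IMAGE stands for DOCKER_ACCOUNT_NAME/REPOSITORY_NAME:OPTIONAL_TAG_NAME. The && means the push only runs if the build succeeds.

```shell
# Hypothetical helper: build from the Dockerfile in the current directory
# (the trailing "." is the build context) and push the result.
# build_and_push IMAGE   e.g. DOCKER_ACCOUNT_NAME/REPOSITORY_NAME:OPTIONAL_TAG_NAME
build_and_push() {
  local image=$1
  if [ -z "$image" ]; then
    echo "usage: build_and_push ACCOUNT/REPOSITORY[:TAG]" >&2
    return 1
  fi
  if [ ! -f Dockerfile ]; then
    echo "No Dockerfile in $(pwd); cd into terra-jupyter-r first" >&2
    return 1
  fi
  # Push only if the build succeeded.
  docker build -t "$image" . && docker push "$image"
}
```

For example, from inside terra-jupyter-r: build_and_push DOCKER_ACCOUNT_NAME/REPOSITORY_NAME:OPTIONAL_TAG_NAME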
Troubleshooting tips
1. Make sure the account, repository, and (optional) tag names all match the names you used in DockerHub.
2. Docker builds the image from the Dockerfile in the present directory, so don't forget the period (".") at the end of the build command!
3. You MUST run your command from the directory containing the Dockerfile. By default, Docker only looks for a file named simply Dockerfile (no extension), so you can have as many Dockerfiles as you want on your computer, but they need to be in separate folders, with only one Dockerfile per folder. When you run docker build, it looks for a Dockerfile in the directory your terminal is currently in. That directory must contain a file named exactly Dockerfile, or the command will fail.
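To catch the missing-Dockerfile and stray-extension problems above before building, here is a hypothetical ensure_dockerfile helper (not a Docker command). It assumes a Mac or Linux shell, where mv is available, and only looks for the two extensions mentioned in this article.

```shell
# Hypothetical helper: make sure the current directory has a file named
# exactly "Dockerfile", renaming Dockerfile.txt or Dockerfile.py if that is
# what your editor produced. Returns non-zero if no candidate is found.
ensure_dockerfile() {
  local candidate
  if [ -f Dockerfile ]; then
    echo "Dockerfile found in $(pwd)"
    return 0
  fi
  for candidate in Dockerfile.txt Dockerfile.py; do
    if [ -f "$candidate" ]; then
      mv "$candidate" Dockerfile
      echo "Renamed $candidate to Dockerfile"
      return 0
    fi
  done
  echo "No Dockerfile in $(pwd)" >&2
  return 1
}
```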
Sometimes you need to know a Docker image's digest, a unique content-addressable identifier, to be certain that all nodes are running the correct version of the container.
There are two ways to get the digest, depending on where your image is stored. In both cases, look for something with the format
sha256:SOMETHING_LONG
where the SOMETHING_LONG bit is the digest. Follow the instructions below, depending on whether or not your image is stored on your local machine.
- If the image is on your local machine: in the terminal, type
docker inspect MY_REPO/MY_IMAGE:TAG
at the prompt. Note that the output is more complicated: there are two things that look like sha256:SOMETHING_LONG. The one you want is the "RepoDigests" one, not the "Id":
~ $ docker inspect MY_REPO/MY_IMAGE:TAG
[
    {
        "Id": "sha256:a98acb9802cbf46eb71e28c652f58026c027d9580ff390c6fa9ae4dec07ae13d",
        "RepoTags": [
            "MY_REPO/MY_IMAGE:TAG"
        ],
        "RepoDigests": [
            "MY_REPO/MY_IMAGE@sha256:96bf2261d3ac54c30f38935d46f541b16af7af6ee3232806a2910cf19f9611ce"
        ],
        ...and a lot of other details we don't care about right now.
- If the image is not on your local machine: in the terminal, type
docker pull MY_REPO/MY_IMAGE:TAG
at the prompt. The digest will be displayed in the output as:
Digest: sha256:96bf2261d3ac54c30f38935d46f541b16af7af6ee3232806a2910cf19f9611ce
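Both lookups can be scripted. The sketch below assumes a bash-compatible shell; repo_digest and digest_from_pull are hypothetical names. docker inspect --format accepts a Go template, and {{index .RepoDigests 0}} picks the first RepoDigests entry. RepoDigests is typically empty for an image you built but have not yet pushed, in which case the template lookup returns an error, so push first (or use the pull method).

```shell
# Hypothetical helper: print the first RepoDigests entry of a local image,
# e.g. MY_REPO/MY_IMAGE@sha256:SOMETHING_LONG (the "Id" field is ignored).
repo_digest() {
  docker inspect --format '{{index .RepoDigests 0}}' "$1"
}

# Hypothetical helper: pull an image and print only the sha256:... value from
# the "Digest:" line of the pull output (prints nothing if there is no such line).
digest_from_pull() {
  docker pull "$1" | sed -n 's/^Digest: \(sha256:[0-9a-f]*\).*/\1/p'
}
```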
Step 6. Launch a Notebook with your custom Docker image
You should now be ready to launch a Notebook Cloud Environment based on your custom Docker image!
1. Navigate to the workspace Analyses page with the notebook you want to run using the custom Docker image.
2. Select the Environment Configuration button (cloud icon) in the side panel on the right side of the screen.
3. Select Environment Settings under the Jupyter section of the panel that opens up.
4. If you already have a Jupyter Cloud Environment in the workspace, select the Custom Environment option at the bottom of the Application Configuration dropdown.
If you are creating a new Cloud Environment, select the option to Customize the Cloud Environment, and select the Custom Environment option at the bottom of the Application Configuration dropdown.
5. Fill in the required field with the name and location of the image in your repository (for a DockerHub image, DOCKER_ACCOUNT_NAME/REPOSITORY_NAME:OPTIONAL_TAG_NAME).
6. Select Create/Replace at the bottom right of the form to create or update the Environment, depending on whether one already existed for your workspace. It will take about 10 minutes for the new virtual machine (VM) to spin up.
7. Open any notebook (or create a new one) in the same workspace.
8. Test whether the new package is installed on your virtual machine, for example by running library(edgeR) in an R notebook cell.
9. Don't forget to save the image identifier and URL right in your notebook to keep track of which image the notebook is intended to use.
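For step 8, you can also check from a terminal where R is available, such as a terminal on the Terra VM if your environment offers one. check_r_package is a hypothetical helper; it relies only on Rscript and base R's packageVersion(), which raises an error (making Rscript exit with a non-zero status) if the package isn't installed.

```shell
# Hypothetical helper: report whether an R package is installed, and its version.
# check_r_package PACKAGE
check_r_package() {
  local pkg=$1 version
  if version=$(Rscript -e "cat(as.character(packageVersion('$pkg')))" 2>/dev/null); then
    echo "$pkg $version is installed"
  else
    echo "$pkg is NOT installed" >&2
    return 1
  fi
}
```

For example: check_r_package edgeR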
Next steps: Add a custom Docker image to your WDL
In your WDL, you should include MY_REPO/MY_IMAGE@sha256:SOMETHING_LONG (typically as the docker value in a task's runtime section). Note: The tag isn't there at all; it's been replaced by the digest, which is a more specific identifier.
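If you already have the tagged name and the digest (for example, from docker pull), the hypothetical pin_image helper below sketches how to swap the tag for the digest. It assumes a tag, if present, follows the last colon in the final path segment, so a registry port such as localhost:5000 is left alone.

```shell
# Hypothetical helper: replace the tag in an image name with a digest.
# pin_image IMAGE DIGEST   e.g. pin_image MY_REPO/MY_IMAGE:TAG sha256:SOMETHING_LONG
pin_image() {
  local image=$1 digest=$2 name
  case "$digest" in
    sha256:*) ;;
    *) echo "expected a digest like sha256:SOMETHING_LONG, got: $digest" >&2; return 1 ;;
  esac
  name=${image%@*}              # drop an existing @sha256:... suffix, if any
  case "${name##*/}" in         # only the last path segment can carry a tag,
    *:*) name=${name%:*} ;;     # so a registry port (host:5000/...) is kept
  esac
  echo "$name@$digest"
}
```

For example, pin_image MY_REPO/MY_IMAGE:TAG sha256:SOMETHING_LONG prints MY_REPO/MY_IMAGE@sha256:SOMETHING_LONG, the form to use in your WDL.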