Make a Docker (container) image the easy way: using a base image

This step-by-step tutorial walks through building, testing, and sharing a Docker image, starting from a base image. A Docker is a "container" - a software package that includes code and all its dependencies - designed to run applications quickly and reliably across computing environments. By eliminating problems that arise when different people use different hardware and software, Docker containers help make analyses easier to reproduce and share with collaborators.

Creating custom Dockers for Notebooks is not easy! Because of the size and complexity of the current Terra base image, this is an advanced option that only power users should attempt. For Terra base images and step-by-step instructions on GitHub, see https://github.com/DataBiosphere/terra-docker#terra-base-images. Improving this functionality is currently on the Terra Roadmap; see "Notebooks users can easily build, customize, and reuse their Docker containers."

Basic Docker concepts

Docker: Often called a "container," a Docker is like a new laptop with exactly what you need preinstalled - no more and no less. When you run an analysis within that container, it will only use the software, packages, and environment variables that are included in the container -- so it's important to include everything you need to run your analysis.
Docker image: When you make a Docker "image", you're setting up a computer system in a virtual "container" that you can copy and run on any other machine - without worrying about what kind of system that machine is running. Running the image creates the container.

Minimum Docker requirements

An operating system
Accessory software like Java, Python and/or R, with dependent libraries installed (usually)

How do I build a Docker without knowing a lot about system configuration?

When you get a new laptop, it already has an operating system and software preinstalled. You can add more by copying programs (e.g., Java JAR files), or installing them (if they have to be compiled in place - e.g., Samtools). Similarly, when you build a Docker you can start with a base image that has some of what you want and add anything else that you need.

There are all sorts of base images available, designed for different purposes with more - or less - software already bundled in. If you don't find exactly what you need in an existing base image, you can modify any Docker to include the software, libraries, and packages that you need to run your analysis.

No really, how do I build it? How do I install things on a machine that doesn't exist?

That's where the Dockerfile comes in. The Dockerfile is a recipe for building the Docker that outlines every step that your Docker program follows to build and run your Docker container.

Enough talk -- let's make a Docker!

Step 1. Set up preliminaries (one time only)

For this exercise, let's make a Docker that has Java 8, Picard tools and R with the ggplot2 library installed.

1.1. Install Docker Desktop - You need to have the Docker Desktop program itself installed on your local machine (laptop, etc.); see Install Docker and test that it works for guidance.

1.2. Download the Picard toolkit - We'll use this Java command-line program as an example of a piece of software you might want to "Dockerize", i.e., for which you might want to build a container image.

Locate the picard.jar file in the Assets section of the latest release of the Picard toolkit and download it to your computer.

1.3. Install a text editor program - For example, Sublime.
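Before moving on, it can help to confirm from a terminal that the Docker program is actually available. The following is a minimal sketch for a Mac or Linux shell, not an official check; the helper name have_tool is made up for this example.

```shell
# Return success if the named program is on the PATH.
# (have_tool is a hypothetical helper name used only in this sketch.)
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool docker; then
    # Print the installed Docker version.
    docker --version
    # Optional smoke test: pulls and runs Docker's small "hello-world" image.
    # Uncomment to try it (requires Docker Desktop to be running).
    # docker run --rm hello-world
else
    echo "docker not found: install Docker Desktop first (step 1.1)"
fi
```

If the version prints, the command-line tool is installed; the commented-out hello-world run additionally checks that Docker itself is running.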

Step 2. Make a Dockerfile

2.1. On your computer, create a directory for this tutorial project.

2.2. Copy the picard.jar file (see prerequisites above) to this directory.

2.3. Open a terminal program. These typically come pre-installed on your computer -- look for a program called Terminal on a Mac, or cmd or PowerShell on a PC.

2.4. In the terminal, navigate to the directory that you just made for this tutorial.

- If you're working on a Mac, use ls to list the contents of your current directory and cd <directory name> to move into a different directory.

- If you're working on a PC, use dir to list the contents of your current directory and cd <directory name> to move into a different directory.

2.5. Open a text editor program (such as Sublime) and create a new file called Dockerfile. Save it in the directory that you created for this tutorial, without any file extension.

2.6. Copy and paste the text below into your Dockerfile:

# Specify the base image -- here we're using one that bundles the OpenJDK version of Java 8 on top of a generic Debian Linux OS
FROM openjdk:8-jdk-slim

# Set the working directory to be used when the docker gets run
WORKDIR /usr

# Do a few updates of the base system and install R (via the r-base package)
RUN apt-get update && \
        apt-get upgrade -y && \
        apt-get install -y r-base

# Install the ggplot2 library and a few other dependencies we want to have available
RUN echo "r <- getOption('repos'); r['CRAN'] <- 'http://cran.us.r-project.org'; options(repos = r);" > ~/.Rprofile
RUN Rscript -e "install.packages('reshape')"
RUN Rscript -e "install.packages('gplots')"
RUN Rscript -e "install.packages('ggplot2')"

# Add the Picard jar file (assumes the jar file is in the same directory as the Dockerfile, but you could provide a path to another location)
COPY picard.jar /usr/picard.jar
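Since the build needs both the Dockerfile and picard.jar in the same directory, a quick pre-build check can catch a misplaced file early. This is a rough sketch for a Mac or Linux shell; the function name missing_build_files is an assumption made for illustration.

```shell
# Print the name of each file the build needs that is missing from the
# given directory. Prints nothing when both files are present.
# (missing_build_files is a hypothetical helper name.)
missing_build_files() {
    for f in Dockerfile picard.jar; do
        [ -f "$1/$f" ] || echo "$f"
    done
}

# Check the current directory, which should be your tutorial directory.
missing="$(missing_build_files .)"
if [ -z "$missing" ]; then
    echo "Ready to build: Dockerfile and picard.jar found"
else
    echo "Missing from $(pwd):"
    echo "$missing"
fi
```

A common cause of a missing Dockerfile is a text editor that silently saved it as Dockerfile.txt.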

Step 3. Build the Docker

3.1. In the terminal, run the following command to log into Docker: docker login

3.2. If prompted, enter your Docker login credentials.

3.3. In the terminal, run the following command after filling in your username and the name of a repository, to build your Docker image according to the instructions in your Dockerfile: docker build -t <username>/<repo>:<tag> .

For example: docker build -t ltarhan/tutorial-docker:v1 .

Command parameters

<username> is your Docker Hub username.
<repo> is the name of the repository where you will store the Docker (which can be an existing repo or the name of a repo that is created when you push the Docker to Docker Hub).
<tag> is an optional keyword or version number that helps identify a specific image.

Don't forget the `.` at the end of the command! This tells Docker to use the current working directory as the build context, which is where it looks for the Dockerfile and any files to copy.

Be sure to run the Docker commands from the right directory! You must run the build command from within the directory that contains your Dockerfile and the picard.jar file.
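To make the pieces of the build command concrete, here is a small sketch that assembles the <username>/<repo>:<tag> reference and then builds only if the prerequisites are in place. The helper name image_ref is made up for this example, and the username and repo are the example values used above.

```shell
# Compose an image reference of the form <username>/<repo>:<tag>.
# The tag is optional; Docker falls back to "latest" when none is given.
# (image_ref is a hypothetical helper name.)
image_ref() {
    printf '%s/%s:%s\n' "$1" "$2" "${3:-latest}"
}

IMAGE="$(image_ref ltarhan tutorial-docker v1)"
echo "Image reference: $IMAGE"

# Build only when docker is installed and a Dockerfile is present in the
# current directory; the trailing "." is the build context.
if command -v docker >/dev/null 2>&1 && [ -f Dockerfile ]; then
    docker build -t "$IMAGE" .
else
    echo "Skipping build: need docker on the PATH and a Dockerfile in $(pwd)"
fi
```

Replace ltarhan and tutorial-docker with your own Docker Hub username and repository name.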

What to expect

This will run for a few minutes and output a lot of logs to the terminal. Most of the output is what you'd see if you were to install a Linux operating system on your machine, followed by R, and the libraries specified in the Dockerfile. Eventually, the mad scrolling gobbledygook will stop and you should see something like "Successfully built 084e949b60cb" (the exact wording depends on your Docker version).
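If the success message scrolled past or looks different on your system, you can also ask Docker directly whether the image now exists locally. This is a sketch under the assumption that you used the example reference above; image_exists is a hypothetical helper name.

```shell
# Succeeds only if the named image exists in the local image store.
# (image_exists is a hypothetical helper name.)
image_exists() {
    docker image inspect "$1" >/dev/null 2>&1
}

IMAGE="ltarhan/tutorial-docker:v1"   # example reference from this tutorial
if image_exists "$IMAGE"; then
    # List the image with its ID, creation time, and size.
    docker images "$IMAGE"
else
    echo "Image $IMAGE not found locally; check the build output for errors"
fi
```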

Step 4. Test that it works

Technically, you achieved your goal of building a Docker image -- but let's test that it works before celebrating, by running this command: docker run -it <username>/<repo>:<tag>

For example: docker run -it ltarhan/tutorial-docker:v1

Don't forget the `-it` flag! It runs the container in interactive mode; if you omit it, nothing will appear to happen because your Docker isn't set to do anything by default.

What to expect

If everything works, your terminal command prompt will change to something like root@ca9af9b92f3d:/usr# (but with a different number after the @). At this point, you're at the helm of your shiny new virtual laptop! You can see the contents of your Docker by running the ls or dir commands and navigate within it with cd <directory name>.

How to verify your Docker

To test that you can use Picard tools in your Docker, run this command inside your Docker session:

java -jar picard.jar

This should output the list of tools available in that release of Picard. When you're done, shut it down by running exit. For more detailed instructions on how to use tools inside a Docker container (including mounting a volume to access the filesystem), see Run GATK in a Docker container.
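The same kind of check can be run without opening an interactive session, which is handy if you rebuild the image often. The sketch below is one possible approach, not part of the original tutorial: it passes a command to docker run for each component installed by the Dockerfile. The function name verify_image is made up, and the image reference is the example one.

```shell
# Run a few non-interactive checks against an image; stops at the first
# failure and returns non-zero. --rm removes each temporary container
# when it exits. (verify_image is a hypothetical helper name.)
verify_image() {
    image="$1"
    docker run --rm "$image" java -version &&
    docker run --rm "$image" test -f /usr/picard.jar &&
    docker run --rm "$image" Rscript -e 'library(ggplot2)'
}

IMAGE="ltarhan/tutorial-docker:v1"   # example reference from this tutorial
if command -v docker >/dev/null 2>&1; then
    if verify_image "$IMAGE"; then
        echo "Java, picard.jar, and ggplot2 are all present in $IMAGE"
    else
        echo "At least one check failed for $IMAGE"
    fi
else
    echo "docker not found; skipping checks"
fi
```

The Picard check only tests that the jar file is where the Dockerfile copied it, rather than running it.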

Step 5. Share your Docker image

You've now built a Docker image, but it only exists on your local computer - nobody else can use it to run their analysis. To allow others to use your image to run their own Docker containers, you must push your image to Docker Hub, using the following command:

docker push <username>/<repo>:<tag>

What to expect

Again, this may take a few minutes, as the size of the Docker is around 400 MB. If you make more Dockers with some of the same components, Docker reuses the components that you've already pushed instead of uploading them again, so updating Dockers is usually faster than the initial push.

See this tutorial for instructions to push the image to the Google Container Registry (GCR) instead.
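As a rough illustration of how the two destinations differ, the sketch below pushes to Docker Hub and then re-tags the same image with a GCR-style name before pushing again. It assumes GCR references take the form gcr.io/<project-id>/<repo>:<tag>; the project ID my-gcp-project and the helper names gcr_ref and push_both are placeholders, and pushing to GCR also requires Google Cloud authentication that is not shown here.

```shell
# Compose a GCR-style reference: gcr.io/<project-id>/<repo>:<tag>.
# (gcr_ref is a hypothetical helper name.)
gcr_ref() {
    printf 'gcr.io/%s/%s:%s\n' "$1" "$2" "${3:-latest}"
}

LOCAL_IMAGE="ltarhan/tutorial-docker:v1"                      # example reference
REMOTE_IMAGE="$(gcr_ref my-gcp-project tutorial-docker v1)"   # placeholder project ID

# Push to Docker Hub, then give the same image a second name and push that.
# Defined as a function so that nothing is uploaded until you call it.
push_both() {
    docker push "$LOCAL_IMAGE" &&
    docker tag "$LOCAL_IMAGE" "$REMOTE_IMAGE" &&
    docker push "$REMOTE_IMAGE"
}

echo "Docker Hub reference: $LOCAL_IMAGE"
echo "GCR reference:        $REMOTE_IMAGE"
# When you are logged in to both registries, run: push_both
```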

Tips for Docker development

Start with an official base image - look for base images to adapt for your customized Docker among Docker Hub's official images. These are curated images that include popular tools like Python, SQL, and Ubuntu. They're also more secure than unofficial images.
Docker responsibly - remember that when you put software on a Docker and publish it, you're responsible for checking that you comply with the licenses of everything included in your image! To learn more about creating safe and secure Docker containers, see Creating safe and secure Docker images.
Read the Docker documentation - There are many options to refine your Docker's setup (such as adding labels, setting environment variables, and making it run commands when it starts up), which you can read about in the Docker documentation.
You can see the Docker made for this tutorial, vdauwera/tutorial_example:picard-2.9.0, at https://hub.docker.com/r/vdauwera/tutorial_example/.
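To give a feel for the refinements mentioned in the tips above, here is a sketch that writes a shortened variant of the tutorial Dockerfile using the LABEL, ENV, and CMD instructions. The file name Dockerfile.extras, the label values, and the PICARD_JAR variable are all placeholders chosen for this example, and the R installation steps are left out for brevity.

```shell
# Write the variant to a separate file so the tutorial's Dockerfile is
# left untouched. (Dockerfile.extras is a hypothetical file name.)
cat > Dockerfile.extras <<'EOF'
FROM openjdk:8-jdk-slim
WORKDIR /usr
COPY picard.jar /usr/picard.jar

# Metadata shown by "docker image inspect" (values are placeholders)
LABEL maintainer="you@example.org" description="Picard tutorial image"

# Environment variable available to every process in the container
ENV PICARD_JAR=/usr/picard.jar

# Default command: list the Picard tools when the container starts
CMD ["java", "-jar", "/usr/picard.jar"]
EOF

echo "Wrote Dockerfile.extras"
# To build from it (needs docker and picard.jar in this directory):
# docker build -f Dockerfile.extras -t <username>/<repo>:<tag> .
```

With a CMD in place, running the image without extra arguments does something by default; a command given on the docker run line still overrides it.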

Using your Docker on Terra

Now that you've built a Docker, you can use it to run reproducible analyses on the Cloud in Terra.

To learn how to run an interactive notebook within a custom Docker, see Custom cloud environments for Jupyter notebooks.
To learn how to run a workflow within a custom Docker, see Chapter 5 of our free online course, Writing WDL Workflows in Terra.
