A Docker is a "container" (a software package that includes code and all its dependencies) designed so that applications run quickly and reliably when moved from one computing environment to another. By eliminating problems that arise when different people use different hardware and software, Docker containers make analyses easier to reproduce and to share with collaborators.
Terra supports custom Dockers outfitted with the exact libraries and dependencies you need for a particular analysis. Read on for more details about how to ensure your collaborators get the same results by building, testing, and sharing custom Dockers.
Basic Docker concepts
When you make a Docker "image", you're setting up a computer system in a virtual "container" that you can then copy and run on any other machine, without worrying about what kind of system that machine is running. Your Docker needs to include everything required to run your program: at minimum an operating system, and typically accessory software such as Java, Python and/or R, with various libraries installed. Technically, a container is not the same thing as a virtual machine (it's smaller and lighter), but for the purposes of this tutorial that doesn't really matter, so if it helps you to think of it as a VM, go right ahead.
How do I build a Docker without having to know a lot about system configuration?
When you get a new laptop, it already has an operating system and a bunch of software preinstalled. You can add more by either copying programs (e.g. Java JAR files) or installing them (if they have to be compiled in place, e.g. Samtools). A Docker base image is like a new laptop with exactly what you need, no more and no less, preinstalled. Because it doesn't carry extra stuff, it is lightweight, so it runs quickly and inexpensively.
There are all sorts of base images available, designed for different purposes and bundling more or less software. If no existing base image has exactly what you need, any Docker image can serve as a base on which you add your own software, libraries or packages (by running an installation command or copying program files). The Docker has a file system, just like your laptop, so you can create directories and put program files or dependencies in specific locations as appropriate.
No really, how do I build it? How do I install things on a machine that doesn't exist?
That's where the Dockerfile comes in. The Dockerfile is a recipe for building the Docker: it outlines every step a system administrator would take to install and configure this virtual laptop for you. To make a Dockerfile, you write out, line by line, what the virtual sysadmin should do. Then you tell the Docker program (which you do have to install first on your real-world laptop) to build the Docker image as specified. Then you can run it, share it, and bask in the joy of having done something complicated.
Enough talk -- let's make a Docker!
Step 1. Set up preliminaries (one time only)
For this exercise, let's make a Docker that has Java 8, Picard tools and R with the ggplot2 library installed.
1.1. Get and install Docker - You'll need to have the Docker program itself installed on your local machine (laptop etc.); see Install Docker and test that it works for guidance.
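Once Docker is installed, a quick way to confirm that the command-line client is available is sketched below. This is a minimal shell snippet; `DOCKER_OK` is just an illustrative variable name, not anything Docker defines.

```shell
# Check that the docker client is on the PATH before going further.
# DOCKER_OK is a hypothetical helper variable used only in this sketch.
if command -v docker >/dev/null 2>&1; then
  docker --version   # prints the installed client version
  DOCKER_OK=1
else
  echo "docker was not found on your PATH; install Docker first" >&2
  DOCKER_OK=0
fi
```

If the check passes, `docker run --rm hello-world` is a common end-to-end test: it pulls a tiny test image from Docker Hub and prints a greeting if the Docker daemon is working.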
1.2. Get the Picard toolkit - We'll use this Java command-line program as an example of a piece of software you might want to "Dockerize", that is, build a container image for. Get the latest version of the Picard toolkit and save the picard.jar file (anywhere you want; we'll put it in the right place later).
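If you prefer the command line, a sketch like the following downloads the jar with `curl`. It assumes Picard is still distributed as a `picard.jar` asset on its GitHub releases page; the version number is a placeholder, so substitute the latest release.

```shell
# Placeholder version: replace with the latest release on the Picard GitHub releases page.
PICARD_VERSION="2.9.0"
# Assumes the release asset is named picard.jar.
PICARD_URL="https://github.com/broadinstitute/picard/releases/download/${PICARD_VERSION}/picard.jar"

download_picard() {
  # -L follows GitHub's redirect to the actual file; -o sets the output file name.
  curl -L -o picard.jar "$PICARD_URL"
}
```

Calling `download_picard` saves picard.jar into the current directory.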
1.3. Docker responsibly - remember that when you put software on a Docker and publish it, you're responsible for checking that you comply with the licenses of everything you've included in your image! To learn more about creating safe and secure Docker containers, see Creating safe and secure Docker images.
Step 2. "Make a Dockerfile" tutorial
2.1. In a terminal, make a new working directory for this tutorial project, and navigate to that directory.
2.2. Copy the picard.jar file (see prerequisites above) to this directory.
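Steps 2.1 and 2.2 amount to a few shell commands. In this sketch the directory name `docker-tutorial` and the jar location `$HOME/Downloads/picard.jar` are placeholders; adjust both to your setup.

```shell
# Placeholder names: pick any directory name, and point PICARD_JAR at wherever you saved the file.
PROJECT_DIR="docker-tutorial"
PICARD_JAR="$HOME/Downloads/picard.jar"

# 2.1: create the working directory (no error if it already exists) and move into it.
mkdir -p "$PROJECT_DIR"
cd "$PROJECT_DIR" || exit 1

# 2.2: copy picard.jar next to the Dockerfile you are about to write.
if [ -f "$PICARD_JAR" ]; then
  cp "$PICARD_JAR" .
else
  echo "picard.jar not found at $PICARD_JAR; copy it here manually" >&2
fi
```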
2.3. Create the Dockerfile. This is a text file named Dockerfile (no extension), containing the following (you can copy and paste the text below):
# Specify the base image: here we're using one that bundles the OpenJDK version of Java 8 on top of a generic Debian Linux OS
FROM openjdk:8-jdk-slim

# Set the working directory to be used when the docker gets run
WORKDIR /usr

# Do a few updates of the base system and install R (via the r-base package)
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y r-base

# Install the ggplot2 library and a few other dependencies we want to have available
RUN echo "r <- getOption('repos'); r['CRAN'] <- 'http://cran.us.r-project.org'; options(repos = r);" > ~/.Rprofile
RUN Rscript -e "install.packages('reshape')"
RUN Rscript -e "install.packages('gplots')"
RUN Rscript -e "install.packages('ggplot2')"

# Add the Picard jar file (the source path is relative to the build context, here the directory containing the Dockerfile; COPY cannot reach files outside that directory)
COPY picard.jar /usr/picard.jar
Step 3. Build the Docker
In the terminal, run the following command:
docker build -t <username>/<repo>:<tag> .
<username> is your Docker Hub user name
<repo> is the name of the repository where you will store the Docker (which can be an existing repo or the name of a new repo that will be created when you push the Docker to Docker Hub)
<tag> is a keyword or version number that you want to attach to identify a specific image.
Don't forget the `.` at the end of the command. It tells Docker to use the current working directory as the build context, which is where it looks for the Dockerfile and for picard.jar.
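To avoid retyping the placeholders in every command, one option is to keep the image name in a shell variable. The values below are hypothetical (they mirror the example image mentioned at the end of this article); replace them with your own.

```shell
# Hypothetical values: substitute your Docker Hub user name, repository and tag.
USERNAME="yourname"
REPO="tutorial_example"
TAG="picard-2.9.0"

# Full image reference in the <username>/<repo>:<tag> form used by docker build, run and push.
IMAGE="${USERNAME}/${REPO}:${TAG}"

build_image() {
  # The trailing "." makes the current directory the build context,
  # so it must contain both the Dockerfile and picard.jar.
  docker build -t "$IMAGE" .
}
```

Running `build_image` from the tutorial directory is then equivalent to the command above, and later commands can reuse "$IMAGE" in place of <username>/<repo>:<tag>.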
This will run for a few minutes and output a lot of logs to the terminal. Most of the output comes from updating the base Linux system and installing R and the libraries specified in the Dockerfile. Eventually the mad scrolling gobbledygook will stop and you should see something like "Successfully built 084e949b60cb" (newer Docker versions that use the BuildKit builder instead finish with a summary of the build steps, including one that names your image).
Step 4. Test that it works
At this point, you've technically achieved your goal of building a Docker image, but let's test that it works before celebrating. Run this command:
docker run -it <username>/<repo>:<tag>
Don't forget the `i`!! The -i flag runs the container in interactive mode (and -t attaches a terminal); if you omit it, nothing will happen because your Docker isn't set to actually do anything by default.
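The same command can be written with the flags spelled out and commented. This sketch also adds `--rm`, which is not in the command above: it deletes the stopped container when you exit, so repeated test runs don't pile up. The image name is a hypothetical placeholder.

```shell
# Hypothetical image name: use the <username>/<repo>:<tag> you built.
IMAGE="yourname/tutorial_example:picard-2.9.0"

start_session() {
  # -i keeps standard input open so you can type commands
  # -t allocates a pseudo-terminal so you get a normal shell prompt
  # --rm removes the container automatically when you exit
  docker run -i -t --rm "$IMAGE"
}
```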
If everything worked, your terminal command prompt will change to root@ca9af9b92f3d:/usr# (but with a different container ID after the @). At this point you are at the helm of your shiny new virtual laptop!
You can further test that you can use Picard tools in your Docker by running this command inside your Docker session:
java -jar picard.jar
which should output the list of tools available in that release of Picard. When you're done, you can shut it down by running exit. For more detailed instructions on how to use tools inside a Docker container (including mounting a volume so you can access files on your local filesystem), see Run GATK in a Docker container.
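You can also run one-off checks without opening an interactive session, by putting the command to run after the image name. The sketch below checks Picard and the ggplot2 R library, then shows a session with a local directory mounted; the image name and the `/data` mount point are hypothetical choices.

```shell
# Hypothetical image name: use the <username>/<repo>:<tag> you built.
IMAGE="yourname/tutorial_example:picard-2.9.0"

smoke_test() {
  # Picard lists its tools when run without arguments (it may exit with a non-zero status).
  docker run --rm "$IMAGE" java -jar /usr/picard.jar
  # Loading ggplot2 fails with an error if the package was not installed.
  docker run --rm "$IMAGE" Rscript -e 'library(ggplot2)'
}

session_with_data() {
  # -v mounts the current host directory at /data inside the container,
  # so files there are visible to Picard and R.
  docker run -it --rm -v "$PWD":/data "$IMAGE"
}
```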
Step 5. Share your Docker image
Sharing is caring, so let's push your shiny new container image to Docker Hub (log in first with docker login if you haven't already) by running:
docker push <username>/<repo>:<tag>
Again, this may take a few minutes, as the Docker is around 400 MB. The good news is that if you make additional Dockers that share some of the same components, those components (image layers) are reused from what you've already pushed rather than uploaded again. So updating a Docker is usually faster than the initial push.
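To check the image size and see the layers that make this reuse possible, Docker's `image ls` and `history` commands can help. The image name is a hypothetical placeholder.

```shell
# Hypothetical image name: use the <username>/<repo>:<tag> you built.
IMAGE="yourname/tutorial_example:picard-2.9.0"

inspect_image() {
  # One line per matching image, including its total SIZE.
  docker image ls "$IMAGE"
  # One line per layer (roughly one per Dockerfile instruction), with the size each adds.
  docker history "$IMAGE"
}
```

Layers that the registry already has, such as those inherited from the openjdk base image, are skipped on later pushes.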
See this tutorial for instructions on pushing the image to Google Container Registry (GCR) instead.
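For orientation, pushing to GCR generally means giving the image a second name that starts with the registry host and your Google Cloud project, then pushing that name. A minimal sketch, assuming the `gcloud` CLI is installed and signed in; the project ID is hypothetical, and the linked tutorial remains the authoritative guide.

```shell
# Hypothetical image name and Google Cloud project ID.
IMAGE="yourname/tutorial_example:picard-2.9.0"
PROJECT_ID="my-gcp-project"
# Drop the Docker Hub user name and prefix the GCR host and project instead.
GCR_IMAGE="gcr.io/${PROJECT_ID}/${IMAGE#*/}"

push_to_gcr() {
  # Lets docker use your gcloud credentials when talking to gcr.io.
  gcloud auth configure-docker
  # Add a second name to the same local image, then push it.
  docker tag "$IMAGE" "$GCR_IMAGE"
  docker push "$GCR_IMAGE"
}
```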
And that's it! You can see the Docker made for this tutorial, vdauwera/tutorial_example:picard-2.9.0, at https://hub.docker.com/r/vdauwera/tutorial_example/.
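To compare your result with the tutorial's image, you can pull it and open a shell in it, assuming it is still published under that name.

```shell
# The tutorial's published image, as named above.
TUTORIAL_IMAGE="vdauwera/tutorial_example:picard-2.9.0"

try_tutorial_image() {
  # Download the image from Docker Hub, then start an interactive, auto-removed container.
  docker pull "$TUTORIAL_IMAGE"
  docker run -it --rm "$TUTORIAL_IMAGE"
}
```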
There are lots of options to refine your Docker's setup (including adding labels, setting environment variables, and making it run a command when it starts up), which you can read about in the Docker documentation, or you can search for them on Google or Stack Overflow, which I find generally more helpful. As they say, YMMV.
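As an illustration of those options, the sketch below appends a label, an environment variable and a default command to the tutorial's Dockerfile using a shell heredoc. The label value and the `PICARD_JAR` variable name are hypothetical; `LABEL`, `ENV` and `CMD` are standard Dockerfile instructions.

```shell
# Run from the directory containing the tutorial Dockerfile.
# The quoted 'EOF' stops the shell from expanding anything inside the heredoc.
cat >> Dockerfile <<'EOF'

# Hypothetical metadata; any key=value pairs can be used.
LABEL maintainer="you@example.org"
# Environment variable available to every process in the container.
ENV PICARD_JAR=/usr/picard.jar
# Default command when none is given to docker run: print Picard's tool list.
CMD ["java", "-jar", "/usr/picard.jar"]
EOF
```

After rebuilding, `docker run --rm <username>/<repo>:<tag>` would print Picard's tool list without needing -i, and you would add `bash` after the image name in the `docker run -it` command to get a shell instead.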