A Docker container is a software package that bundles code with all of its dependencies, so that an application runs quickly and reliably when it is moved from one computing environment to another. By eliminating problems that arise when different people use different hardware and software, Docker containers help make analyses easier to reproduce and share with collaborators. Read on for more details about how to ensure your collaborators get the same results by building, testing, and sharing custom Docker images in Terra.
Basic Docker concepts
When you make a Docker "image", you're setting up a computer system in a virtual "container" that you can copy and run on any other machine - without worrying about what kind of system it's running. Your Docker needs to include everything required to run your program; at the minimum, it has to have an operating system, and typically, you need accessory software like Java, Python and/or R, with various libraries installed. Technically, it's not the same thing as a virtual machine (VM) (it's smaller and lighter) but for our tutorial that doesn't matter - if it helps to think of it as a VM, go right ahead.
How do I build a Docker without knowing a lot about system configuration?
When you get a new laptop, it already has an operating system and software preinstalled. You can add more by copying programs (e.g., Java JAR files) or by installing them (if they have to be compiled in place, e.g., Samtools). A Docker base image is like a new laptop with exactly what you need - no more and no less - preinstalled. Because it doesn't carry extra stuff, it is lightweight, so it runs quickly and inexpensively.
There are all sorts of base images available, designed for different purposes with more, or less, software already bundled in. If you don't find exactly what you need in an existing base image, any Docker image can serve as a base image to which you add your own software, libraries, or packages (by running an installation command, or copying program files). The Docker has a file system - just like your laptop - so you can create directories and put program files or dependencies in specific locations as appropriate.
No really, how do I build it? How do I install things on a machine that doesn't exist?
That's where the Dockerfile comes in. The Dockerfile is a recipe for building the Docker that outlines every step a system administrator takes to install and configure this virtual laptop. To make a Dockerfile, you write out, line by line, what the virtual sysadmin should do. Then you tell the Docker program (which you have to install first on your real-world laptop) to build the Docker container as specified. Then you can run it, share it, and bask in the joy of having done something complicated.
Enough talk -- let's make a Docker!
Step 1. Set up preliminaries (one time only)
For this exercise, let's make a Docker that has Java 8, Picard tools and R with the ggplot2 library installed.
1.1. Get and install Docker - You need to have the Docker program itself installed on your local machine (laptop, etc.); see Install Docker and test that it works for guidance.
1.2. Get the Picard toolkit - We'll use this Java command-line program as an example of a piece of software you might want to "Dockerize", i.e., for which you might want to build a container image. Get the latest version of the Picard toolkit and save the picard.jar file (anywhere you want -- we'll put it in the right place later).
1.3. Docker responsibly - remember that when you put software in a Docker image and publish it, you're responsible for checking that you comply with the licenses of everything included in your image! Also, to learn more about creating safe and secure Docker containers, see Creating safe and secure Docker images.
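A quick way to confirm step 1.1 from a terminal is sketched below. It assumes a bash-compatible shell; the docker_status variable is a name made up for this example, and hello-world is the small test image that Docker publishes for this purpose. If the Docker program is not installed, or its background service is not running, the script says so instead of failing.

```shell
#!/usr/bin/env bash
# Sketch: check that the Docker program is installed and can run a container.
# docker_status is an illustrative variable, not something Docker defines.

if ! command -v docker >/dev/null 2>&1; then
    docker_status="missing"
    echo "Docker is not on your PATH; install it before continuing." >&2
elif docker run --rm hello-world >/dev/null 2>&1; then
    # The test container ran and was removed again (--rm).
    docker_status="ok"
    docker --version
else
    docker_status="not-running"
    echo "Docker is installed but could not run a container; is it started?" >&2
fi

echo "Docker status: $docker_status"
```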
Step 2. "Make a Dockerfile" tutorial
2.1. In a terminal, make a new working directory for this tutorial project, and navigate to that directory.
2.2. Copy the picard.jar file (see prerequisites above) to this directory.
2.3. Create the Dockerfile. This is a text file named Dockerfile (no extension), containing the following (you can copy and paste the text below):
# Specify the base image -- here we're using one that bundles the OpenJDK version of Java 8 on top of a generic Debian Linux OS
FROM openjdk:8-jdk-slim

# Set the working directory to be used when the docker gets run
WORKDIR /usr

# Do a few updates of the base system and install R (via the r-base package)
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y r-base

# Install the ggplot2 library and a few other dependencies we want to have available
RUN echo "r <- getOption('repos'); r['CRAN'] <- 'http://cran.us.r-project.org'; options(repos = r);" > ~/.Rprofile
RUN Rscript -e "install.packages('reshape')"
RUN Rscript -e "install.packages('gplots')"
RUN Rscript -e "install.packages('ggplot2')"

# Add the Picard jar file (assumes the jar file is in the same directory as the Dockerfile, but you could provide a path to another location)
COPY picard.jar /usr/picard.jar
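Steps 2.1 and 2.2 can also be scripted. The sketch below assumes a bash-compatible shell; the directory name docker_tutorial and the download location in PICARD_JAR are placeholders, so point them at wherever you actually saved the jar. It finishes by reporting whether the two files the build needs are in place.

```shell
#!/usr/bin/env bash
# Sketch: set up the tutorial directory and check its contents.
# Both values below are placeholders chosen for this example.
PROJECT_DIR="docker_tutorial"
PICARD_JAR="${PICARD_JAR:-$HOME/Downloads/picard.jar}"

# Step 2.1: create the working directory (no error if it already exists).
mkdir -p "$PROJECT_DIR"

# Step 2.2: copy the jar next to where the Dockerfile will live.
if [ -f "$PICARD_JAR" ]; then
    cp "$PICARD_JAR" "$PROJECT_DIR/picard.jar"
fi

# Report which of the files needed for the build are present.
missing=0
for f in Dockerfile picard.jar; do
    if [ -f "$PROJECT_DIR/$f" ]; then
        echo "ok:      $PROJECT_DIR/$f"
    else
        echo "missing: $PROJECT_DIR/$f"
        missing=$((missing + 1))
    fi
done
echo "$missing file(s) still missing"
```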
Step 3. Build the Docker
In the terminal, run the following command:
docker build -t <username>/<repo>:<tag> .
<username> is your Docker Hub user name
<repo> is the name of the repository where you will store the Docker (which can be an existing repo or the name of a repo that is created when you push the Docker to Docker Hub)
<tag> is a keyword or version number that you want to attach to identify a specific image.
Don't forget the `.` at the end of the command. It tells Docker to use the current working directory as the build context, which is where it looks for the Dockerfile and for files to copy in, such as picard.jar.
This will run for a few minutes and output a lot of logs to the terminal. Most of the output is what you'd see if you were to install a Linux operating system on your machine, followed by R, and the libraries specified in the Dockerfile. Eventually, the mad scrolling gobbledygook will stop and you should see something like "Successfully built 084e949b60cb".
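To avoid retyping the image name in the build, test, and push commands, you can assemble it once from its three parts. This is only a sketch: myuser, tutorial_example, and picard-r are made-up values to replace with your own, and the build is skipped unless Docker is installed and a Dockerfile is present in the current directory.

```shell
#!/usr/bin/env bash
# Sketch: compose <username>/<repo>:<tag> once, then build and list the image.
# All three values are placeholders for this example.
DOCKER_USER="myuser"
REPO="tutorial_example"
TAG="picard-r"
IMAGE="${DOCKER_USER}/${REPO}:${TAG}"
echo "Image name: $IMAGE"

if command -v docker >/dev/null 2>&1 && [ -f Dockerfile ]; then
    # The trailing "." is the build context (the current directory).
    if docker build -t "$IMAGE" .; then
        # List the local images in this repository to confirm the tag exists.
        docker images "${DOCKER_USER}/${REPO}"
    fi
else
    echo "Skipping build: need Docker installed and a Dockerfile in $(pwd)"
fi
```

Depending on your Docker version, the last lines of the build log may not match the message quoted above; listing the image afterwards is a check that does not depend on the log format.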
Step 4. Test that it works
Technically, you achieved your goal of building a Docker image -- but let's test that it works before celebrating, by running this command.
docker run -it <username>/<repo>:<tag>
Don't forget the `i`! The `i` keeps the session interactive (and the `t` gives you a terminal to type in); if you omit it, nothing will happen because your Docker isn't set to do anything by default.
If everything works, your terminal command prompt will change to root@ca9af9b92f3d:/usr# (but with a different number after the @). At this point, you're at the helm of your shiny new virtual laptop!
To test that you can use Picard tools in your Docker, run this command inside your Docker session:
java -jar picard.jar
This should output the list of tools available in that release of Picard. When you're done, shut it down by running exit. For more detailed instructions on how to use tools inside a Docker container (including mounting a volume to access the filesystem), see Run GATK in a Docker container.
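The same kind of check can be run without opening an interactive session, by passing a command after the image name. The sketch below is one possible smoke test, not part of the original tutorial: it assumes the image was built from the Dockerfile in Step 2, that IMAGE holds the name you gave it (the default here is a placeholder), and it skips itself if that image is not available locally. smoke_status is an illustrative variable.

```shell
#!/usr/bin/env bash
# Sketch: non-interactive checks of Java, the Picard jar, and ggplot2.
IMAGE="${IMAGE:-myuser/tutorial_example:picard-r}"   # placeholder name
smoke_status="skipped"

if command -v docker >/dev/null 2>&1 && docker image inspect "$IMAGE" >/dev/null 2>&1; then
    # --rm removes each throwaway container once its command finishes.
    if docker run --rm "$IMAGE" java -version \
        && docker run --rm "$IMAGE" test -f /usr/picard.jar \
        && docker run --rm "$IMAGE" Rscript -e 'library(ggplot2)'; then
        smoke_status="passed"
    else
        smoke_status="failed"
    fi
else
    echo "Image $IMAGE not found locally; build it first."
fi
echo "Smoke test: $smoke_status"
```

Loading the library is worth checking explicitly, because install.packages reports a failed installation as a warning rather than an error, so the build can finish even if an R package did not install.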
Step 5. Share your Docker image
Let's push your shiny new container image to Docker Hub by running:
docker push <username>/<repo>:<tag>
Again, this may take a few minutes, as the Docker image is around 400 MB. If you make more Dockers that share components with this one, the layers you've already pushed are reused rather than uploaded again, so updating a Docker is usually faster than the initial push.
See this tutorial for instructions to push the image to the Google Container Registry (GCR) instead.
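In outline, pushing to a different registry means giving the image a second name that starts with that registry's address, then pushing under the new name. The sketch below only illustrates that renaming step for GCR, whose image names take the form gcr.io/<project-id>/<repo>:<tag>. The project ID my-project and the image name are placeholders, and the authentication setup covered in the linked tutorial is assumed to be done already. It does nothing unless the source image exists locally.

```shell
#!/usr/bin/env bash
# Sketch: retag a local image for GCR and push it under the new name.
# Both values are placeholders for this example.
IMAGE="myuser/tutorial_example:picard-r"
GCP_PROJECT="my-project"

# Strip the Docker Hub user name ("myuser/") and prepend the registry address.
GCR_IMAGE="gcr.io/${GCP_PROJECT}/${IMAGE#*/}"
echo "Registry name: $GCR_IMAGE"

if command -v docker >/dev/null 2>&1 && docker image inspect "$IMAGE" >/dev/null 2>&1; then
    # docker tag adds a second name for the same image; nothing is copied.
    docker tag "$IMAGE" "$GCR_IMAGE"
    docker push "$GCR_IMAGE"
else
    echo "Image $IMAGE not found locally; nothing to push."
fi
```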
And that's it! You can see the Docker made for this tutorial, vdauwera/tutorial_example:picard-2.9.0, at https://hub.docker.com/r/vdauwera/tutorial_example/.
There are many options to refine your Docker's setup (including adding labels, environment variables, making it run commands when it boots up) which you can read about in the Docker documentation -- or search for them on Google or Stack Overflow. As they say, YMMV.
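As a small illustration of those options, the sketch below appends three common instructions to the Dockerfile from Step 2: LABEL for metadata, ENV for an environment variable, and CMD for a default command. The label text and the PICARD_JAR variable name are invented for this example. Run it from the directory that contains your Dockerfile, then rebuild the image.

```shell
#!/usr/bin/env bash
# Sketch: append optional instructions to the Dockerfile in this directory.
# The quoted 'EOF' keeps the shell from expanding anything in the text.
cat >> Dockerfile <<'EOF'

# Metadata stored with the image (the text is a placeholder)
LABEL description="Java 8, Picard, and R with ggplot2"

# Environment variable available inside the container
ENV PICARD_JAR=/usr/picard.jar

# Default command: list the Picard tools when no other command is given
CMD ["java", "-jar", "/usr/picard.jar"]
EOF

# Show the instructions that are now in the file.
grep -E '^(LABEL|ENV|CMD)' Dockerfile
```

With a default command set, docker run -it <username>/<repo>:<tag> runs Picard instead of opening a shell; to get the interactive prompt from Step 4, add bash at the end of that command.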