This step-by-step tutorial walks through building, testing, and sharing a Docker image, starting from a base image. A Docker is a "container" - a software package that includes code and all its dependencies - designed to run applications quickly and reliably from one computing environment to another. By eliminating problems that arise when different people use different hardware and software, Docker containers help make analyses easier to reproduce and share with collaborators.
Basic Docker concepts
A Docker image is like a new laptop with exactly what you need - no more and no less - preinstalled. Because it doesn't carry extra stuff, it is lightweight, so it runs quickly and inexpensively. Technically, it's not the same thing as a virtual machine (VM) (it's smaller and lighter), but for our tutorial that doesn't matter - if it helps to think of it as a VM, go right ahead.
When you make a Docker "image", you're setting up a computer system in a virtual "container" that you can copy and run on any other machine - without worrying about what kind of system it's running.
Your Docker needs to include everything required to run your program.
Minimum Docker requirements
- An operating system
- Accessory software like Java, Python and/or R, with various libraries installed (usually)
How do I build a Docker without knowing a lot about system configuration?
When you get a new laptop, it already has an operating system and software preinstalled. You can add more by copying programs (e.g., Java JAR files), or installing them (if they have to be compiled in place - e.g., Samtools). When you build a Docker, you can start with a base image that has some of what you want and add anything else that you need.
There are all sorts of base images available, designed for different purposes with more - or less - software already bundled in. If you don't find exactly what you need in an existing base image, any Docker can serve as a base image to which you can add your own software, libraries, or packages (by running an installation command, or copying program files). The Docker has a file system - just like your laptop - so you can create directories and put program files or dependencies in specific locations as appropriate.
No really, how do I build it? How do I install things on a machine that doesn't exist?
That's where the Dockerfile comes in. The Dockerfile is a recipe for building the Docker that outlines every step a system administrator takes to install and configure this virtual laptop.
To make a Dockerfile, you write out, line by line, what the virtual sysadmin should do. Then you tell the Docker program (which you have to install first on your real-world laptop) to build the Docker container as specified. Then you can run it, share it, and bask in the joy of having done something complicated.
Enough talk -- let's make a Docker!
Step 1. Set up preliminaries (one time only)
For this exercise, let's make a Docker that has Java 8, Picard tools and R with the ggplot2 library installed.
1.1. Get and install Docker - You need to have the Docker program itself installed on your local machine (laptop, etc.); see Install Docker and test that it works for guidance.
1.2. Get the Picard toolkit - We'll use this Java command-line program as an example of a piece of software you might want to "Dockerize", i.e., for which you might want to build a container image. Get the latest version of the Picard toolkit and save the picard.jar file (anywhere you want -- we'll put it in the right place later).
1.3. Docker responsibly - remember that when you put software on a Docker and publish it, you're responsible for checking that you comply with the licenses of everything included in your image!
To learn more about creating safe and secure Docker containers, see Creating safe and secure Docker images.
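Before moving on, it can help to confirm that both prerequisites are in place. The sketch below is one way to do that from a terminal; the have_cmd helper and the $HOME/Downloads/picard.jar location are assumptions for illustration, so adjust the path to wherever you saved the jar.

```shell
# Hypothetical helper: succeeds if the named command is on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# 1.1: is the docker command-line client installed?
if have_cmd docker; then
    docker --version    # prints the installed client version
else
    echo "docker not found on PATH; install Docker first" >&2
fi

# 1.2: has the Picard jar been downloaded? (placeholder path)
PICARD_JAR="${PICARD_JAR:-$HOME/Downloads/picard.jar}"
if [ -f "$PICARD_JAR" ]; then
    echo "Found $PICARD_JAR"
else
    echo "No picard.jar at $PICARD_JAR yet" >&2
fi
```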
Step 2. "Make a Dockerfile" tutorial
2.1. In a terminal, make a new working directory for this tutorial project, and navigate to that directory.
2.2. Copy the picard.jar file (see prerequisites above) to this directory.
2.3. Create the Dockerfile. This is a text file named Dockerfile (no extension), containing the following (you can copy and paste the text below):
# Specify the base image -- here we're using one that bundles the OpenJDK version of Java 8 on top of a generic Debian Linux OS
FROM openjdk:8-jdk-slim

# Set the working directory to be used when the docker gets run
WORKDIR /usr

# Do a few updates of the base system and install R (via the r-base package)
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y r-base

# Install the ggplot2 library and a few other dependencies we want to have available
RUN echo "r <- getOption('repos'); r['CRAN'] <- 'http://cran.us.r-project.org'; options(repos = r);" > ~/.Rprofile
RUN Rscript -e "install.packages('reshape')"
RUN Rscript -e "install.packages('gplots')"
RUN Rscript -e "install.packages('ggplot2')"

# Add the Picard jar file (assumes the jar file is in the same directory as the Dockerfile, but you could provide a path to another location)
COPY picard.jar /usr/picard.jar
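Steps 2.1 to 2.3 can be sketched as a few shell commands. The directory name docker-tutorial, the download location of the jar, and the check_build_context helper are all placeholders rather than part of the tutorial itself; the helper only confirms that the two files the build needs sit side by side.

```shell
# Hypothetical helper: succeeds only if DIR holds both files the build needs.
check_build_context() {
    ctx_dir="$1"
    ctx_missing=0
    for ctx_file in Dockerfile picard.jar; do
        if [ ! -f "$ctx_dir/$ctx_file" ]; then
            echo "missing: $ctx_dir/$ctx_file" >&2
            ctx_missing=1
        fi
    done
    return "$ctx_missing"
}

# Typical use (placeholder paths):
#   mkdir -p ~/docker-tutorial && cd ~/docker-tutorial      # step 2.1
#   cp ~/Downloads/picard.jar .                             # step 2.2
#   (create the Dockerfile with a text editor)              # step 2.3
#   check_build_context . && echo "ready to build"
```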
Step 3. Build the Docker
In the terminal, run the following command:
docker build -t <username>/<repo>:<tag> .
<username> is your Docker Hub username.
<repo> is the name of the repository where you will store the Docker (which can be an existing repo or the name of a repo that is created when you push the Docker to Docker Hub).
<tag> is a keyword or version number that helps identify a specific image.
Don't forget the `.` at the end of the command. This tells Docker to use the current working directory as the build context, which is where it looks for the Dockerfile and the files to copy in.
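As a concrete illustration, here is the build command with the placeholders filled in. myusername is a made-up Docker Hub account; the repo and tag mirror the example image named at the end of this tutorial. The guard simply skips the build when Docker or the Dockerfile is not available.

```shell
# Placeholder values; substitute your own Docker Hub username, repo, and tag.
DOCKER_USER="myusername"
REPO="tutorial_example"
TAG="picard-2.9.0"
IMAGE="${DOCKER_USER}/${REPO}:${TAG}"

echo "Image reference: ${IMAGE}"

# Build only when docker is installed and a Dockerfile is present here.
if command -v docker >/dev/null 2>&1 && [ -f Dockerfile ]; then
    docker build -t "${IMAGE}" .
    docker images "${DOCKER_USER}/${REPO}"   # list local images in this repo
else
    echo "Skipping build: docker or ./Dockerfile not found"
fi
```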
What to expect
This will run for a few minutes and output a lot of logs to the terminal. Most of the output is what you'd see if you were to install a Linux operating system on your machine, followed by R, and the libraries specified in the Dockerfile. Eventually, the mad scrolling gobbledygook will stop and you should see something like "Successfully built 084e949b60cb".
Step 4. Test that it works
Technically, you achieved your goal of building a Docker image -- but let's test that it works before celebrating, by running this command.
docker run -it <username>/<repo>:<tag>
Don't forget the `-it` flag!! It runs the container in interactive mode (`-i` keeps standard input open and `-t` allocates a terminal); if you omit it, nothing will happen because your Docker isn't set to do anything by default.
What to expect
If everything works, your terminal command prompt will change to root@ca9af9b92f3d:/usr# (but with a different number after the @). At this point, you're at the helm of your shiny new virtual laptop!
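If you only want a quick check without opening an interactive session, you can also pass a command after the image name; the container runs that command and then exits. This is a sketch under assumptions: the image reference is a placeholder, and `--rm` (which removes the stopped container afterwards) is an optional convenience not used elsewhere in this tutorial.

```shell
IMAGE="myusername/tutorial_example:picard-2.9.0"   # placeholder image reference

# Only attempt the checks if docker is installed and the image exists locally.
if command -v docker >/dev/null 2>&1 && docker image inspect "${IMAGE}" >/dev/null 2>&1; then
    docker run --rm "${IMAGE}" java -version                   # Java from the base image
    docker run --rm "${IMAGE}" Rscript -e 'library(ggplot2)'   # errors if ggplot2 is missing
    docker run --rm "${IMAGE}" ls -l /usr/picard.jar           # the jar copied in by the Dockerfile
    CHECKS_RAN=yes
else
    echo "Skipping checks: docker or ${IMAGE} not available locally"
    CHECKS_RAN=no
fi
```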
How to verify your Docker
To test that you can use Picard tools in your Docker, run this command inside your Docker session:
java -jar picard.jar
This should output the list of tools available in that release of Picard. When you're done, shut it down by running exit. For more detailed instructions on how to use tools inside a Docker container (including mounting a volume to access the filesystem), see Run GATK in a Docker container.
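Mounting a volume, mentioned above, is done with the `-v` option of docker run, which maps a directory on your machine to a path inside the container. In this sketch the image reference and the /data mount point are placeholders chosen for illustration.

```shell
IMAGE="myusername/tutorial_example:picard-2.9.0"   # placeholder image reference
HOST_DIR="$(pwd)"                                  # directory on your machine to share
MOUNT_SPEC="${HOST_DIR}:/data"                     # host path : path inside the container

# Print the command rather than running it, since it opens an interactive session.
echo "docker run -it -v \"${MOUNT_SPEC}\" \"${IMAGE}\""
```

Inside the session, files from the host directory would then appear under /data, and anything written there stays on your machine after you exit.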
Step 5. Share your Docker image
To push your shiny new container image to Docker Hub, run the following.
docker push <username>/<repo>:<tag>
What to expect
Again, this may take a few minutes as the size of the Docker is around 400 MB. If you make more Dockers with some of the same components, those components (stored as image layers) are reused from what you've already pushed rather than uploaded again. So updating Dockers is usually faster than the initial process.
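Pushing to Docker Hub requires being logged in, so a typical sequence looks something like the sketch below. The image reference is a placeholder, and the guard skips everything when Docker or the image is not available locally; docker history is included only as one way to see the layers that get reused between pushes.

```shell
IMAGE="myusername/tutorial_example:picard-2.9.0"   # placeholder image reference

if command -v docker >/dev/null 2>&1 && docker image inspect "${IMAGE}" >/dev/null 2>&1; then
    docker login                 # prompts for your Docker Hub credentials
    docker push "${IMAGE}"       # uploads only layers the registry does not already have
    docker history "${IMAGE}"    # lists the image's layers, one row per build step
    PUSH_ATTEMPTED=yes
else
    echo "Skipping push: docker or ${IMAGE} not available locally"
    PUSH_ATTEMPTED=no
fi
```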
See this tutorial for instructions to push the image to the Google Container Registry (GCR) instead.
You can see the Docker made for this tutorial, vdauwera/tutorial_example:picard-2.9.0, at https://hub.docker.com/r/vdauwera/tutorial_example/.
There are many options to refine your Docker's setup (including adding labels, setting environment variables, and making it run commands when it boots up), which you can read about in the Docker documentation -- or search for them on Google or Stack Overflow. As they say, YMMV.
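For a flavor of those options, the sketch below writes three extra Dockerfile instructions to a scratch file: LABEL attaches metadata, ENV sets an environment variable, and CMD sets the default command the container runs. The scratch file, the label value, and the PICARD_JAR variable name are illustrative assumptions; in practice you would add lines like these to the end of the Dockerfile from Step 2 and rebuild.

```shell
EXTRAS_FILE="$(mktemp)"    # scratch file; stands in for the end of your Dockerfile

cat > "${EXTRAS_FILE}" <<'EOF'
# Metadata attached to the image (example value)
LABEL description="Java 8, Picard and R with ggplot2"
# Environment variable available inside the container (hypothetical name)
ENV PICARD_JAR=/usr/picard.jar
# Default command, used when docker run is given no command
CMD ["java", "-jar", "/usr/picard.jar"]
EOF

cat "${EXTRAS_FILE}"
```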