Docker/container overview

Allie Hajian

A container is similar to a virtual machine (VM) and can be used to package and execute all the software required to run a particular program or set of programs. The container includes an operating system (typically some flavor of Linux) plus any required software installed on top of the OS. Because it runs as a self-contained virtual environment, you can reproduce the same analysis on any infrastructure that supports running the container. You don't have to go through the pain of identifying and installing all the software dependencies yourself on your laptop, cluster, or cloud environment.

Cartoon diagram of three different Docker containers, each with a different set of installed packages and libraries. Text says 'A container encapsulates all of the software dependencies associated with running a program. Takes the guesswork out of running on different platforms.' 

Docker: a branded container

Docker is one of several brands of container systems. There are others, such as Singularity, but Docker is the most popular and widely used. People sometimes say "a docker" instead of "a container," much as "xerox" became a verb meaning "to copy" because of the Xerox company's dominance. Note that docker with a lowercase "d" is also the name of the command-line program you install on your machine to run Docker containers.

How to build and store containers with images and registries

A container is packaged as an image. Note: this has nothing to do with pictures. Here the word "image" refers to a special type of file. When you install new software on your computer, the download file is sometimes called a "disk image." That's because your operating system treats the downloaded file as if it were a physical disk on your machine. A Docker image works on the same idea. Another way to distinguish an image from a container is to think of the image as a stored snapshot that isn't running. A container is what you get when you start that image, and you can start many containers from the same image.

An image can be distributed through one or more registries, which are repositories where users can store images privately or publicly in the cloud. Docker Hub is one such registry, and teams from the Broad Institute publish most of their Docker images there. There are other resources as well, like Dockstore, which is geared specifically toward sharing bioinformatics tools and workflows. Another is Google's Artifact Registry, Google's general-purpose container registry for use on Google Cloud.
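To make the role of registries more concrete, here is a minimal Python sketch that splits an image reference into its parts: the registry that stores the image, the repository path, and an optional tag or digest. These are the names you would give the docker program or a workflow. The sketch follows Docker's usual defaults. When no host is given, the registry is Docker Hub (`docker.io`). Single-name images get the `library/` namespace. If neither a tag nor a digest is given, the tag is `latest`. It is a simplified assumption, not Docker's full reference grammar, and it does no validation. The names `ImageRef` and `parse_image_ref` are hypothetical, and the example references (including the Artifact Registry project and repository) are placeholders.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: Docker Hub is the default registry when a reference names no host.
DEFAULT_REGISTRY = "docker.io"


@dataclass(frozen=True)
class ImageRef:
    registry: str          # host that stores the image, e.g. "docker.io"
    repository: str        # path within the registry, e.g. "broadinstitute/gatk"
    tag: Optional[str]     # human-readable version label, e.g. "4.5.0.0"
    digest: Optional[str]  # content hash that identifies exact contents, e.g. "sha256:..."


def parse_image_ref(ref: str) -> ImageRef:
    """Split an image reference into registry, repository, tag, and digest (simplified)."""
    # An optional "@sha256:..." suffix names the exact image contents.
    digest = None
    if "@" in ref:
        ref, digest = ref.split("@", 1)

    # A tag follows the last ":" only if that colon comes after the last "/";
    # otherwise the colon belongs to a registry port such as "localhost:5000".
    tag = None
    last_colon = ref.rfind(":")
    if last_colon > ref.rfind("/"):
        ref, tag = ref[:last_colon], ref[last_colon + 1:]

    # The first path component is a registry host if it looks like one
    # (contains "." or ":", or is "localhost"); otherwise assume Docker Hub.
    first, _, rest = ref.partition("/")
    if rest and ("." in first or ":" in first or first == "localhost"):
        registry, repository = first, rest
    else:
        registry, repository = DEFAULT_REGISTRY, ref
        # Single-name Docker Hub images live under the "library" namespace.
        if "/" not in repository:
            repository = "library/" + repository

    # With neither a tag nor a digest, fall back to "latest".
    if tag is None and digest is None:
        tag = "latest"
    return ImageRef(registry, repository, tag, digest)


# Illustrative references only.
print(parse_image_ref("ubuntu"))
# ImageRef(registry='docker.io', repository='library/ubuntu', tag='latest', digest=None)
print(parse_image_ref("us-central1-docker.pkg.dev/my-project/my-repo/my-tool:1.0"))
# ImageRef(registry='us-central1-docker.pkg.dev', repository='my-project/my-repo/my-tool', tag='1.0', digest=None)
```

The same image name can therefore point to different registries depending on whether a host is included. This is why workflows usually spell out the full reference.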

Using Docker

On a local machine

One way to use Docker is on your laptop. First, you tell the docker program to download a container image (that is, a file) from a registry such as Docker Hub.

Then you tell it to start a container from that image, which is roughly analogous to booting up a virtual machine. Once the container is running, you can run any software that is installed inside it. For a concrete example, see this tutorial.
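The two local steps map onto two real docker subcommands, `docker pull` and `docker run`. The Python sketch below assumes the docker command-line program is installed and on your PATH. It builds those commands and shells out to them. The helper names are hypothetical. In practice, `docker run` also pulls a missing image automatically, so the explicit pull step only mirrors the description above.

```python
import shutil
import subprocess
from typing import List, Sequence


def pull_command(image: str) -> List[str]:
    # Step 1: download the image (a file) from its registry.
    return ["docker", "pull", image]


def run_command(image: str, command: Sequence[str]) -> List[str]:
    # Step 2: start a container from the image and run a program inside it.
    # --rm removes the stopped container afterward; the image stays cached locally.
    return ["docker", "run", "--rm", image, *command]


def run_in_container(image: str, command: Sequence[str]) -> str:
    """Pull `image`, run `command` in a fresh container, and return its stdout."""
    if shutil.which("docker") is None:
        raise RuntimeError("the docker command-line program is not installed")
    subprocess.run(pull_command(image), check=True)
    result = subprocess.run(
        run_command(image, command), check=True, capture_output=True, text=True
    )
    return result.stdout
```

For example, `run_in_container("ubuntu:22.04", ["cat", "/etc/os-release"])` should report Ubuntu's release information even on a macOS or Windows laptop. The command reads the container's own Linux files, not the host's.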

On a cloud-based machine

The other way to use Docker is on a cloud-based platform like Terra. Workflows in Terra use Docker to distribute tools and applications. Because a workflow configuration references specific Docker images, anyone in the workspace can launch the same analysis in the same software environment. They don't need to worry about downloading or installing the right applications themselves.
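That guarantee is only as strong as the image reference in the workflow. A reference with no tag resolves to `latest`, which an image's maintainers can move to a newer build at any time. The Python sketch below flags such references. It is a hypothetical helper, not part of Terra. It takes a mapping from task names to images. In a WDL workflow, each task typically names its image in the `docker` attribute of its `runtime` section. Even a specific tag can be overwritten by a new push, so only a digest (`@sha256:...`) is fully immutable. The task names and the all-zeros digest below are placeholders.

```python
from typing import Dict, List


def unpinned_images(task_images: Dict[str, str]) -> List[str]:
    """Return the tasks whose image reference may resolve differently over time.

    Assumption: a reference counts as pinned if it has a digest ("@sha256:...")
    or an explicit tag other than "latest".
    """
    flagged = []
    for task, ref in task_images.items():
        if "@" in ref:
            continue  # a digest always identifies the same image contents
        # Look only at the final path component so a registry port
        # (e.g. "localhost:5000/tool") is not mistaken for a tag.
        name = ref.rsplit("/", 1)[-1]
        tag = name.split(":", 1)[1] if ":" in name else "latest"
        if tag == "latest":
            flagged.append(task)
    return flagged


# Hypothetical task-to-image mapping; the digest is a placeholder, not a real image.
images = {
    "align": "broadinstitute/gatk:4.5.0.0",
    "qc": "ubuntu",
    "report": "ubuntu:latest",
    "sort": "ubuntu@sha256:" + "0" * 64,
}
print(unpinned_images(images))  # ['qc', 'report']
```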

Ensuring security and privacy when working in the cloud

If you're concerned about privacy, you can control access to Docker images through the registry. For example, to use private images stored in Docker Hub, add firecloud as a Collaborator on the repository so that Terra can pull the private image.

Next steps

To learn how to use Docker containers to analyze data in Terra, refer to the other articles in the Working with Containers section of the Support site. For example:
