Using the Bioconductor Docker image in Terra

Anton Kovalsky

The Bioconductor Docker image is one of the base images available when running interactive analysis apps in Terra. This guide introduces the Terra Bioconductor image and shows how to use it in a Jupyter Notebook analysis.

Introduction to Terra's Bioconductor Image

Bioconductor is a suite of open source tools, primarily R packages, developed for the statistical analysis of high-throughput genomic data. The Terra-Jupyter-Bioconductor image (a specific set of software installed on a virtual machine) is an extension of the Terra-Jupyter-R image that contains preloaded Bioconductor packages.

For a list of all packages and software dependencies, see the Terra-Docker GitHub repository.

What is an image? An image is a file that acts as a set of instructions (a template) to build a Docker container in which your notebook will run. The image includes all software and dependencies that will be preinstalled.

What's in the Bioconductor image?

Bioconductor packages in the Terra-Jupyter-Bioconductor image include:

AnnotationHub: a client for the AnnotationHub web resource, which lets you search genomic files from other common web resources (UCSC, Ensembl, etc.).

DESeq2: an RNA-seq analysis package that tests differential gene expression using a negative binomial generalized linear model.

ensembldb: a package that fetches transcript-centric annotations from Ensembl.

ExperimentHub: a client for the ExperimentHub web resource, which lets you search curated experiments, publications, etc.

GenomicAlignments: a package that provides containers for storing and manipulating short genomic alignments.

GenomicFeatures: a suite of tools that helps manipulate and track transcript-related annotations. This tool lets you download the genomic locations of transcripts, exons, and coding sequences (CDS) from the UCSC Genome Browser or BioMart.

scRNAseq: a package that contains gene counts data from a collection of public scRNAseq datasets.

ShortRead: a suite of tools that allow you to manipulate and assess the quality of FASTQ files.

SingleCellExperiment: a package that defines an S4 class (a lightweight container for genomics data) for storing single-cell data, including dimensionality reduction results and alternative features such as spike-in transcripts or antibody tags.

scran: a collection of functions for single-cell analyses.
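To see which of the packages listed above are actually present in your running environment, a short check like the following can help. This is a minimal sketch in base R: the package names come from the list above, while the variable names (`expected`, `installed`, `status`) are chosen here only for illustration.

```r
# Packages named in the list above.
expected <- c("AnnotationHub", "DESeq2", "ensembldb", "ExperimentHub",
              "GenomicAlignments", "GenomicFeatures", "scRNAseq",
              "ShortRead", "SingleCellExperiment", "scran")

# system.file(package = ...) returns "" when a package is not installed,
# so nzchar() turns the result into TRUE/FALSE without loading anything.
installed <- vapply(
  expected,
  function(pkg) nzchar(system.file(package = pkg)),
  logical(1)
)

# One row per package: name and whether it was found.
status <- data.frame(package = expected, installed = installed,
                     row.names = NULL)
print(status)
```

If any row shows FALSE, the Cloud Environment is probably not using the R / Bioconductor image, or the image contents differ from the list in this article.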

How to access the Terra Bioconductor Image

To use Bioconductor in a Jupyter Notebook, you need to set the Application configuration in the workspace Jupyter Cloud Environment to the R/Bioconductor Docker image. Using this image preinstalls the R/Bioconductor software in your Jupyter app.

If you're interested in looking inside the Dockerfile (or building your own custom Dockerfile), you can read more details in this README in the Terra-Docker GitHub repository.

1. To set up the Bioconductor environment (when Jupyter is running or paused), click on the Jupyter logo in the sidebar of your workspace screen. If Jupyter isn't running, click on the cloud icon.

2. Click on the gear icon (Environment Settings) to surface the Jupyter Cloud Environment details. 

3. Select the Customize button at the bottom of the Jupyter Cloud Environment pane.

4. From the Application configuration dropdown, select R / Bioconductor.
[Screenshot: R / Bioconductor selected in the Application configuration dropdown of the Jupyter Cloud Environment pane]

5. Customize the Cloud Compute Profile and Persistent Disk.

6. When you're done configuring your Jupyter Cloud Environment, click the Update button (or Create, if you're starting Jupyter from scratch) at the bottom right of the form. 

After setting your Jupyter Cloud Environment, launch your R Jupyter Notebook or create a new one using the instructions in Starting and customizing your Jupyter app.

Make sure you've got the image

Try one of the following to make sure your Cloud Environment is using the R/Bioconductor image.

Access R help page

Once you’ve launched the notebook, you can run a quick check by opening the R help page of a Bioconductor function with the “?” syntax: type ? followed by the function name in a code cell (for example, ?DESeq2::DESeq) and run the cell.
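As an illustration of the help-page check, the sketch below looks up `DESeq()` from DESeq2, one of the packages listed earlier. The choice of function is an assumption made for this example (any function from an installed package works), and the guard around it is only there so that the cell prints a message instead of failing when the package is missing.

```r
# Illustrative target only: DESeq() from the DESeq2 package.
help_available <- nzchar(system.file(package = "DESeq2"))

if (help_available) {
  # Same lookup as typing ?DESeq2::DESeq in a notebook cell.
  print(help("DESeq", package = "DESeq2"))
} else {
  message("DESeq2 is not installed; check that the R / Bioconductor ",
          "image is selected in the Cloud Environment.")
}
```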

Confirm available packages

You can also try loading a package you expect to be available, such as GenomicAlignments, by running library(GenomicAlignments) in a code cell. If the package loads without an error, it is installed.
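The load check can also be written so that it reports the result instead of stopping with an error. This is a small sketch using base R's `require()`, which returns TRUE or FALSE; the variable names `pkg` and `loaded` are hypothetical.

```r
# Package to test; GenomicAlignments is the example used in this article.
pkg <- "GenomicAlignments"

# require() attaches the package and returns TRUE on success, FALSE otherwise.
# Startup messages and the "no package" warning are suppressed for readability.
loaded <- suppressWarnings(suppressPackageStartupMessages(
  require(pkg, character.only = TRUE, quietly = TRUE)
))

if (loaded) {
  message(pkg, " ", as.character(packageVersion(pkg)), " loaded successfully.")
} else {
  message(pkg, " could not be loaded; check that the R / Bioconductor ",
          "image is selected in the Cloud Environment.")
}
```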

How to install more Bioconductor packages

If your research requires additional Bioconductor packages, you can install them in your R Jupyter Notebook using Bioconductor’s BiocManager package, which comes preinstalled with the Terra Bioconductor image.

Use the command BiocManager::install().

Example: Install the edgeR Bioconductor package

1. From the Analyses tab, navigate to your Jupyter Notebook or create a new Notebook with the language set to R.

2. In a code cell of the Jupyter Notebook, execute the following command: BiocManager::install("edgeR")

3. To check that your Notebook installed the package correctly, execute the command library("edgeR") in a code cell. If no error message appears, you've successfully installed the package in the Jupyter Cloud Environment.
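If you rerun a notebook often, you may prefer to install a package only when it is missing. The sketch below wraps `BiocManager::install()` in a hypothetical helper, `install_if_missing()`, which is not part of BiocManager or Terra. The `ask = FALSE` and `update = FALSE` arguments are existing `BiocManager::install()` options, used here on the assumption that you want a non-interactive install that leaves other packages untouched.

```r
# Hypothetical helper: install a Bioconductor package only if it is absent.
# Returns TRUE (invisibly) if an install was attempted, FALSE otherwise.
install_if_missing <- function(pkg) {
  # Already installed: nothing to do.
  if (nzchar(system.file(package = pkg))) {
    return(invisible(FALSE))
  }
  # BiocManager comes preinstalled with the Terra Bioconductor image,
  # but fail with a clear message if it is not there.
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    stop("BiocManager is not installed in this environment.")
  }
  # ask = FALSE avoids interactive prompts; update = FALSE skips
  # updating packages that are already installed.
  BiocManager::install(pkg, ask = FALSE, update = FALSE)
  invisible(TRUE)
}
```

Calling install_if_missing("edgeR") followed by library("edgeR") would then reproduce the example above while skipping the install on later runs.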

Additional Docker resources

You can read more about Docker and customizing Docker images in the following articles:

Creating safe and secure custom Docker images

Docker tutorial: Custom cloud environments for Jupyter notebooks

Working with project-specific environments in Terra
