Biomedical research in the cloud on Terra

Allie Hajian

Terra's mission is “to help accelerate research by integrating data, analysis tools, and built-in security components to deliver frictionless research flows from data to results”. How can Terra help you with your research? This article summarizes the types of analyses you can do, how to access datasets in the Terra Data Library, and how to get started on Terra.

Project data and tools together in one workspace 

Whether you're interested in running pipelines, performing statistics, or visualizing your data, you can access and manage all the tools and data you need in a Terra workspace dedicated to your research project. A workspace helps you:

  • Keep track of data from multiple sources in the cloud in one place.
  • Access and store bulk analysis workflow tools from Dockstore and the Broad Methods Repository.
  • Analyze data with both batch and interactive analysis modes.
  • Collaborate in a shared space with built-in security features.

Learn more about workspaces in Working with workspaces.

Analyses you can do in a Terra workspace
Pipelining (workflows) |  Interactive analysis  |  Jupyter Notebooks  | RStudio  |  Galaxy

To learn more, and for additional resources, scroll down for an overview.
Or click the links above for more detailed documentation.

If you have additional software needs, join our community to request a feature.

Pipelining with workflows

Run whole pipelines on Terra, from preprocessing and trimming sequencing data to alignment and downstream analyses, using workflows. Workflows are written in the human-readable Workflow Description Language (WDL), and you can search for and import them into your workspace from Dockstore or the Broad Methods Repository.

You’ll find a range of workflows for analyzing and processing different types of sequencing data in Terra’s Showcase and Tutorials Library. Check out some of the available workflows in these curated workspaces to identify tools that match your research interests.

Note: You will need to be registered on Terra to view Terra workspaces.
If you haven't registered yet, follow the registration steps below.

Genomic analyses (GATK4 Best Practices workflows)

Single-cell RNA-seq analyses

Epigenomic analyses

Ready for guided hands-on practice? Try the Workflows-QuickStart tutorial workspace to learn how to set up, launch, and monitor workflows in Terra. Its three hands-on exercises progress from pre-packaged samples to a process closer to a real-world analysis.

How long will it take to run? How much will it cost?
If you use the suggested data samples for analysis, it should take around 10 to 20 minutes per exercise. Total charges (Google Cloud service costs) for all three exercises are much less than $1 USD.

Real-time analysis and visualization - Jupyter notebooks, RStudio and Galaxy

Integrated tools let you run complex statistics and visualizations in real time on large amounts of data and see the results immediately. Click the title links below for more detailed documentation.

Jupyter notebooks
Document and share your analyses with collaborators inside the Terra platform. Integrated Jupyter Notebooks contain code cells to run interactive analysis (in R or Python) and markdown cells to enable detailed documentation of your analyses and data. 

RStudio
If you're looking for a richer IDE experience for R development than Jupyter notebooks offer, RStudio on Terra includes:
- A variable explorer, R Markdown editor, debugger, and terminal
- Support for launching RShiny apps
- First-class Bioconductor support
- Git integration

Galaxy on Terra
Looking for additional tools that are accessible, reproducible, transparent, and community-centered? See Galaxy interactive environments to learn more about:
  - How to launch a Galaxy instance
  - How to navigate the Galaxy interface
  - How to import data to your Galaxy instance
  - How to install additional tools in the tools panel

Customize with the VM software you need

Interactive analysis tools run on a Cloud Environment, which includes a virtual machine (VM) and storage (VM memory plus a Detachable Persistent Disk). You can customize the software installed on your VM by selecting one of Terra's preinstalled Cloud Environments, or by choosing a custom environment that you define with a Docker container ("Docker") or a startup script. Docker containers help ensure that you and your colleagues analyze with the same software, which makes your results reproducible.

All Jupyter Notebook environments are preloaded with standard R and Python packages, but you can also choose an environment with common biomedical packages, such as:

  • GATK4: Command-line tools for genomic analyses focused on variant discovery
  • Hail: Python-based library for interacting with genomic data
  • Bioconductor: R-based packages for analysis and visualization of genomic data
  • Pegasus: A Python package for single-cell analyses
  • Custom software packages: Create your own environment following these step-by-step instructions

To learn more about virtual Cloud Environment options, read this guide.

Choose the right performance at a cost that's right for you

To balance compute efficiency and cost, Terra lets you choose the compute power of your VM. Choose from three preconfigured compute profiles, or select a custom option. Running especially large computations? Choose a Spark Cluster under “Custom” and run in parallel on the machines you specify.

Example notebooks in Showcase workspaces

Explore Jupyter Notebooks-based analyses in Terra's Showcase workspaces. To see a read-only copy, select a workspace below and click the workspace Notebooks tab.

To run the notebook, make your own copy (clone) of the workspace.

  • Hail-Notebook-Tutorials: Practice genomic analysis with Hail
  • Bioconductor: Explore two Notebooks dedicated to RNA-seq Bioconductor packages 
  • Cumulus: Try a Notebook featuring Pegasus software for single-cell analysis

Ready for a guided Notebook tutorial? Try the Terra-Notebooks-QuickStart. In this hands-on Notebooks QuickStart tutorial workspace, you'll learn how to:
    - Set up your notebook Cloud Environment VM
    - Run a Jupyter notebook (Jupyter 101 tutorial)
    - Import data from different locations into the notebook VM for analysis (four optional notebooks)
       - A data table
       - The workspace bucket
       - The Terra Data Library
       - Google Cloud Storage or BigQuery
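
As a rough illustration of the "workspace bucket" option above, the sketch below builds a path inside the workspace bucket from within a notebook. It assumes the notebook environment exposes the bucket location in an environment variable named WORKSPACE_BUCKET; the fallback bucket name and the file path are hypothetical, and the tutorial notebooks themselves may take a different approach.

```python
import os


def workspace_path(relative_path, default_bucket="gs://example-workspace-bucket"):
    """Return a gs:// path for a file inside the workspace bucket.

    Assumption: the notebook environment sets WORKSPACE_BUCKET to a value
    such as "gs://<bucket-name>". The default bucket is a placeholder used
    only when that variable is not set.
    """
    bucket = os.environ.get("WORKSPACE_BUCKET", default_bucket).rstrip("/")
    return f"{bucket}/{relative_path.lstrip('/')}"


# Hypothetical file name, for illustration only
print(workspace_path("notebooks/results.tsv"))
```

The resulting path can then be handed to whichever tool you use to copy or read Google Cloud Storage objects.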

How long will it take to run? How much will it cost?
The interactive tour takes about 15 minutes, and the Jupyter notebooks 101 tutorial takes 25 to 30 minutes, depending on how many of the exercises you do. Each of the optional analysis notebooks takes only a few minutes to "run all". With the default Notebook configuration, the cloud environment costs $0.19/hour in Google Cloud service charges, so running the Notebooks should cost much less than a dollar.
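
To see why the total stays well under a dollar, here is a back-of-the-envelope estimate using the $0.19/hour rate quoted above. The five minutes per optional notebook is an assumption (the article says only "a few minutes"), and the sketch assumes the cloud environment is running for the entire time, including the tour.

```python
HOURLY_RATE_USD = 0.19  # default notebook configuration rate quoted in this article


def estimate_cost(minutes, hourly_rate=HOURLY_RATE_USD):
    """Estimate the cloud environment charge for a given runtime in minutes."""
    return round(minutes / 60 * hourly_rate, 2)


# 15 min tour + 30 min Jupyter 101 + four optional notebooks
# at an assumed 5 minutes each
total_minutes = 15 + 30 + 4 * 5
print(f"{total_minutes} minutes costs about ${estimate_cost(total_minutes):.2f}")
```

Under these assumptions the whole QuickStart comes to roughly 20 cents. Actual charges depend on how long the environment stays running, so pausing it when you finish keeps the cost down.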

Take advantage of datasets in Terra's Data Library

Access to many large datasets is one of the advantages of working in Terra. You can use Terra to search and access many public and controlled-access datasets. While you can upload any data to your workspace, you can also save money on data storage and egress by analyzing data in an existing repository without copying it.

Where's the data? In Terra, "importing" data is a bit of a misnomer. When you "import" data from an existing repository, you are importing links to the data in the cloud, not the actual data.

This metadata tells your workspace tools where that data is located. You don't actually need to copy the raw data into your workspace to analyze it.
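
A minimal sketch of the idea, using made-up table rows: each "imported" sample is just a small record holding a link to a file that stays in its original bucket. The bucket name, file names, and column names below are hypothetical and not taken from any real dataset.

```python
from urllib.parse import urlparse

# Hypothetical rows of a workspace data table: metadata plus links, no raw data
samples = [
    {"sample_id": "sample_01", "cram": "gs://example-public-bucket/crams/sample_01.cram"},
    {"sample_id": "sample_02", "cram": "gs://example-public-bucket/crams/sample_02.cram"},
]


def split_gs_uri(uri):
    """Split a gs:// link into (bucket, object path)."""
    parsed = urlparse(uri)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"not a gs:// URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


for row in samples:
    bucket, object_path = split_gs_uri(row["cram"])
    print(f"{row['sample_id']}: bucket={bucket}, object={object_path}")
```

An analysis tool follows these links to read the files where they already live, which is why nothing needs to be copied into the workspace first.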

Understanding data storage options in Terra
To learn more about the Terra ecosystem and where your data are stored, see this article.

Practice with these data-focused Showcase workspaces

Four steps to get started on Terra

1. Register your account

Register your Google or institutional account at app.terra.bio (this part is free). If you have a Google account, just click on the menu at the top left, or follow these step-by-step instructions. If you do not have a Google account, see how to set up a Google account with a non-Gmail address.

2. Claim $300 in Google credits 

$300 in Google Cloud credits helps you explore Terra before committing your own grant dollars. See this step-by-step guide!

3. Explore showcase and tutorial workspaces

You aren’t limited to the workspaces suggested in this overview! Look through the entire Showcase and Tutorials Library of more than 30 examples for a variety of curated use cases. Showcase workspaces include descriptions, downsampled data, and cost estimates so you can try different tools and gain the confidence to run on your own data.

4. Join our community

Join our community and see how a cloud-native platform, built by the Broad Institute of MIT and Harvard Data Sciences Platform and Verily Life Sciences, can transform the way you do bioinformatics research. Use the forum to post questions or search Terra Support for tutorials.
