Terra's mission is “to help accelerate research by integrating data, analysis tools, and built-in security components to deliver frictionless research flows from data to results”. How can Terra help you with your research? This article summarizes the types of analyses you can do, how to access datasets in the Terra Data Library, and how to get started on Terra.
Project data and tools together in one workspace
Whether you're interested in running pipelines, performing statistics, or visualizing your data, you can access and manage all the tools and data you need in a Terra workspace dedicated to your research project. A workspace helps you:
- Keep track of data from multiple sources in the cloud in one place.
- Access and store bulk analysis workflow tools from Dockstore and the Broad Methods Repository.
- Analyze data with both batch and interactive analysis modes.
- Collaborate in a shared space with built-in security features.
Learn more about workspaces in Working with workspaces.
Analyses you can do in a Terra workspace
Pipelining (workflows) | Interactive analysis | Jupyter Notebooks | RStudio | Galaxy
To learn more - and for additional resources - scroll down for an overview.
Or click on the links above for more detailed documentation.
If you have additional software needs, join our community to request a feature.
Pipelining with workflows
Run whole pipelines on Terra - from preprocessing and trimming sequencing data to alignment and downstream analyses - using workflows. Workflows are written in the human-readable Workflow Description Language (WDL), and you can search for and import them into your workspace from Dockstore or the Broad Methods Repository.
You’ll find a range of workflows for analyzing and processing different types of sequencing data in Terra’s Showcase and Tutorials Library. Check out some of the available workflows in these curated workspaces to identify tools that match your research interests.
Note: You will need to be registered on Terra to view Terra workspaces.
If you haven't registered yet, follow the registration steps below.
Genomic analyses (GATK4 Best Practices workflows)
- GATK4 Exome-Analysis-Pipeline
- GATK4 Whole-Genome-Analysis-Pipeline
- GATK4 Mitochondria-SNPs-Indels-hg38
Single-cell RNA-seq analyses
- HCA_Optimus_Pipeline: Processing workflow for 10x Genomics datasets
- HCA_Smart-seq2_Multi_Sample_Pipeline: Processing workflow for Smart-seq2 datasets
- Cumulus: Workflows for large-scale single-cell and single-nuclei datasets
- DNA-Methylation-Preprocessing: Workflow for conducting methylation analyses
- ENCODE Tutorial: Workflow for ChIP-seq signal enrichment analyses
Ready for guided hands-on practice? Try the Workflows-QuickStart! To learn how to set up, launch, and monitor workflows in Terra, try the Workflows tutorial workspace. Three hands-on exercises take you through increasingly realistic pipelines, from pre-packaged samples to a process that resembles a real-world analysis.
How long will it take to run? How much will it cost?
If you use the suggested data samples for analysis, each exercise should take around 10 to 20 minutes. Total charges (Google Cloud service costs) for all three exercises are well under $1 (USD).
Real-time analysis and visualization - Jupyter notebooks, RStudio and Galaxy
Integrated tools allow you to run complex statistics and visualizations in real time on large amounts of data and view the results immediately. Click the title links below for more detailed documentation.
Document and share your analyses with collaborators inside the Terra platform. Integrated Jupyter Notebooks contain code cells to run interactive analysis (in R or Python) and markdown cells to enable detailed documentation of your analyses and data.
If you're looking for a richer IDE experience for R development than Jupyter notebooks provide, RStudio offers:
- Includes variable explorer, R Markdown editor, debugger, terminal
- Support for launching RShiny apps
- First-class Bioconductor support
- Git integration
Galaxy on Terra
Looking for additional tools that are accessible, reproducible, transparent, and community-centered? See Galaxy interactive environments to learn more about:
- How to launch a Galaxy instance
- Navigating the Galaxy interface
- How to import data to your Galaxy instance
- How to install additional tools in the tools panel
Customize with the VM software you need
Interactive analysis tools run on a Cloud Environment, which includes a virtual machine (VM) and storage (VM memory plus a Detachable Persistent Disk). You can customize the software installed on your VM by selecting one of Terra's preinstalled Cloud Environments, or by choosing a custom environment defined by a Docker container or a startup script. Docker containers ensure that you and your colleagues analyze with the same software, making your results reproducible.
All Jupyter Notebook environments are preloaded with standard R and Python packages, but you can also choose an environment with common biomedical packages, such as:
- GATK4: Command-line tools for genomic analyses focused on variant discovery
- Hail: Python-based library for interacting with genomic data
- Bioconductor: R-based packages for analysis and visualization of genomic data
- Pegasus: A Python package for single-cell analyses
- Custom software packages: Create your own environment following these step-by-step instructions
To learn more about virtual Cloud Environment options, read this guide.
Choose the right performance at a cost that's right for you
To balance compute efficiency and cost, Terra lets you choose the compute power of your VM. Choose from three preconfigured options, or select a custom one. Running especially large computations? Choose a Spark Cluster under “Custom” and run in parallel on the machines you specify.
Example notebooks in Showcase workspaces
Explore Jupyter Notebook-based analyses in Terra's Showcase workspaces. To see a read-only copy, select a workspace below and click the workspace Notebooks tab.
To run the notebook, make your own copy (clone) of the workspace.
- Hail-Notebook-Tutorials: Practice genomic analysis with Hail
- Bioconductor: Explore two Notebooks dedicated to RNA-seq Bioconductor packages
- Cumulus: Try a Notebook featuring Pegasus software for single-cell analysis
Ready for a guided Notebook tutorial? Try the Terra-Notebooks-QuickStart. In this hands-on Notebooks QuickStart tutorial workspace, you'll learn how to:
- Browse the Terra Data Library and specify a subset of data (cohort) for a study
- Import the cohort to a workspace data table
- Set up a Notebook cloud environment
- Analyze data in an interactive Jupyter Notebook
How long will it take to run? How much will it cost?
It will take around 5-10 minutes to explore and access data and 15-30 minutes to run each Notebook. With the default Notebook configuration, the Terra Notebook cloud environment costs $0.19/hour in Google Cloud service charges. It should cost much less than a dollar to run the Notebooks.
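As a rough check on the estimate above, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name `estimate_cost` is hypothetical, and it assumes that charges scale linearly with the time the cloud environment is running at the quoted $0.19/hour rate.

```python
def estimate_cost(hourly_rate_usd: float, minutes: float) -> float:
    """Return the estimated charge in USD for `minutes` of running time.

    Assumption: the charge is simply rate x time, with no minimum fee.
    """
    return hourly_rate_usd * minutes / 60.0


# Rate quoted in this article for the default Notebook configuration.
HOURLY_RATE_USD = 0.19

# Each Notebook takes roughly 15 to 30 minutes to run.
low = estimate_cost(HOURLY_RATE_USD, 15)
high = estimate_cost(HOURLY_RATE_USD, 30)

print(f"Estimated cost per Notebook: between ${low:.4f} and ${high:.4f}")
```

Under these assumptions, a single Notebook run costs a few cents at most, which is consistent with the "much less than a dollar" figure.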
Take advantage of datasets in Terra's Data Library
Access to many large datasets is one of the advantages of working in Terra. You can use Terra to search and access many public and controlled-access datasets. While you can upload any data to your workspace, you can also save money on data storage and egress by analyzing data from an existing repository without re-copying it.
Where's the data? In Terra, "importing" data is a bit of a misnomer. When you "import" data from an existing repository, you are importing links to the data in the cloud, not the actual data.
This metadata tells your workspace tools where that data is located. You don't actually need to copy the raw data into your workspace to analyze it.
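The idea of importing links rather than data can be illustrated with a small Python sketch. Everything here is hypothetical: the row contents, the bucket name `example-public-bucket`, and the helper `split_gcs_uri` are made up for illustration and are not part of Terra's API. The sketch assumes the link is a Google Cloud Storage URI of the form `gs://bucket/path`.

```python
from typing import Tuple
from urllib.parse import urlparse

# Hypothetical data table row: the "cram" entry is a link to a file in
# the cloud, not the file's contents.
sample_row = {
    "sample_id": "sample-001",
    "cram": "gs://example-public-bucket/sample-001.cram",
}


def split_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split a gs:// URI into (bucket, object_path)."""
    parsed = urlparse(uri)
    if parsed.scheme != "gs":
        raise ValueError(f"Not a Google Cloud Storage URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


bucket, object_path = split_gcs_uri(sample_row["cram"])
print(f"Bucket: {bucket}")
print(f"Object: {object_path}")
```

The row stores only a short string, so a table can reference very large files without copying them; a tool follows the link to read the data where it already lives.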
Understanding data storage options in Terra
To learn more about the Terra ecosystem and where your data are stored, see this article.
Practice with these data-focused Showcase workspaces
- Terra-Notebooks-QuickStart: Import public-access 1000 Genomes data from BigQuery, a cloud data warehouse with built-in machine learning
- Terra-Data-Tables-QuickStart: Learn how to use data tables to organize, access, and analyze data - including sets of data - in the cloud
- Introduction to TCGA Dataset: Explore controlled-access TCGA data
- ENCODE Tutorial: Import an ENCODE ChIP-seq dataset
Four steps to get started on Terra
1. Register your account
Register your Google or institutional account at app.terra.bio (this part is free). If you have a Google account, just click on the menu at the top left, or follow these step-by-step instructions. If you do not have a Google account, see how to set up a Google account with a non-Gmail address.
2. Claim $300 in Google credits
$300 in Google Cloud credits lets you explore Terra before committing your own grant dollars. See this step-by-step guide!
3. Explore showcase and tutorial workspaces
You aren’t limited to the workspaces suggested in this overview! Look through the entire Showcase and Tutorials Library, which has more than 30 examples covering a variety of curated use cases. Showcase workspaces include descriptions, downsampled data, and cost estimates so you can try different tools and gain the confidence to run on your own data.
4. Join our community
Join our community and see how a cloud-native platform - built by the Broad Institute of MIT and Harvard Data Sciences Platform and Verily Life Sciences - can transform the way you do bioinformatics research. Use the forum to post questions or search Terra Support for tutorials.