Biomedical research in the cloud on Terra
Terra's mission is “to help accelerate research by integrating data, analysis tools, and built-in security components to deliver frictionless research flows from data to results”. This sounds great, but how does it help you with your research? In this article, we’ll go over the types of analyses you can do, and how to get started on Terra.
Contents
Analyses you can do in Terra
- Pipelining with workflows
- Real-time analysis with Jupyter Notebooks
Take advantage of datasets in Terra's Data Library
Four steps to get started!
1. Register your account
2. Claim Google Credits
3. Explore additional workspaces
4. Join our community
Analyses you can do in a Terra workspace
Whether you're interested in running pipelines, performing statistics, or visualizing your data, you can access the tools you need in Terra. It all starts in the Terra workspace, a secure space in the cloud dedicated to your research project. A workspace helps you:
- Keep track of data from multiple sources in the cloud in one place
- Access and store a variety of analysis tools from Dockstore and the Broad Methods Repository
- Analyze project data with both batch and interactive modes
- Collaborate in a shared space with built-in security features
Learn more about workspaces in this guide.
Currently, Terra workspaces have two modes of analysis: pipelining (workflows) and interactive analysis (Jupyter Notebooks).
**Coming soon:** Terra will have additional applications like Galaxy and RStudio! If you have additional software needs, join our community to request a feature.
Pipelining with workflows
You can run whole pipelines - from preprocessing and trimming sequencing data to alignment and downstream analyses - using Terra workflows. Workflows are written in the human-readable Workflow Description Language (WDL), and you can search for them in Dockstore or the Broad Methods Repository and import them into your workspace.
You’ll find a range of workflows for analyzing and processing different types of sequencing data in Terra’s Showcase and Tutorials Library. Check out some of the available workflows in these curated workspaces to identify tools that match your research interests.
**Note:** You will need to be registered on Terra to view Terra workspaces; if you haven't registered yet, follow the registration steps below.
Genomic analyses (GATK4 Best Practices workflows)
Single-cell RNA-seq analyses
- HCA_Optimus_Pipeline: Processing workflow for 10x Genomics datasets
- HCA_Smart-seq2_Multi_Sample_Pipeline: Processing workflow for Smart-seq2 datasets
- Cumulus: Workflows for large-scale single-cell and single-nuclei datasets
Epigenomic analyses
- DNA-Methylation-Preprocessing: Workflow for conducting methylation analyses
- ENCODE Tutorial: Workflow for ChIP-seq signal enrichment analyses
Ready for guided hands-on practice with workflows? Try the Workflows-QuickStart!
To learn how to set up, launch, and monitor workflows in Terra, try this tutorial workspace. Three hands-on exercises let you experience increasing amounts of complexity, from pre-packaged samples to a process more like a real-world analysis.
How long will it take to run? How much will it cost?
If you use the suggested data samples for analysis, it should take around 15-30 minutes per exercise. Total charges (Google Cloud service costs) for all three exercises are much less than $1 USD.
Real-time analysis and visualization with Jupyter Notebooks
Integrated Jupyter Notebooks contain code cells to run interactive analysis (in R or Python) and markdown cells to enable detailed documentation of your analyses and data.
Jupyter Notebooks enable you to:
- Run complex statistics and visualization interactively on large amounts of data
- Visualize results immediately
- Document and share your analyses with collaborators
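For a sense of what a code cell might contain, here is a minimal Python sketch that summarizes a handful of values using only the standard library. The sample names and coverage numbers are made up for illustration; in a real Notebook you would load values from your own workspace data.

```python
import statistics

# Hypothetical per-sample mean coverage values (illustrative only).
coverage = {"sample_1": 31.2, "sample_2": 27.9, "sample_3": 35.1, "sample_4": 30.4}
values = list(coverage.values())

# Basic summary statistics across all samples.
mean_cov = statistics.mean(values)
stdev_cov = statistics.stdev(values)
print(f"mean coverage: {mean_cov:.2f}, standard deviation: {stdev_cov:.2f}")

# Flag samples more than one standard deviation below the mean.
low_samples = [name for name, cov in coverage.items() if cov < mean_cov - stdev_cov]
print("low-coverage samples:", low_samples)
```

Because each cell runs on its own, you could change the threshold and rerun only this cell to see the effect immediately.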
Choose the virtual Notebook software you need
Jupyter Notebooks run on a virtual machine (VM). You can customize your VM’s installed software by selecting one of Terra's preinstalled Notebook cloud environments or choosing a custom environment by specifying a Docker container ("Docker"). Dockers ensure you and your colleagues analyze with the same software, making your results reproducible.
All Jupyter Notebook environments are preloaded with standard R and Python packages, but you can also choose an environment with:
- GATK4: Command-line tools for genomic analyses focused on variant discovery
- Hail: Python-based library for interacting with genomic data
- Bioconductor: R-based packages for analysis and visualization of genomic data
- Pegasus: A Python package for single-cell analyses
- Custom software packages: Create your own environment following these step-by-step instructions
Choose the right performance at a cost that's right for you
To balance compute efficiency and cost, Terra lets you choose the compute power of your VM. Choose from three preconfigured powers, or select a custom option. Running especially large computations? Choose a Spark Cluster under “Custom” and run in parallel on the machines you specify.
To learn more about virtual runtime options, read this guide.
Example notebooks in Showcase workspaces
Explore Jupyter Notebooks-based analyses in Terra's Showcase workspaces. Select a workspace below and click the workspace Notebooks tab.
- Hail-Notebook-Tutorials: Practice genomic analysis with Hail
- Bioconductor: Explore two Notebooks dedicated to RNA-seq Bioconductor packages
- Cumulus: Try a Notebook featuring Pegasus software for single-cell analysis
Ready for a guided Notebook experience? Try the Terra-Notebooks-QuickStart
In this hands-on Notebooks QuickStart tutorial workspace, you'll learn how to:
- Browse the Terra Data Library and specify a subset of data (cohort) for a study
- Import the cohort to a workspace data table
- Set up a Notebook cloud environment
- Analyze data in an interactive Jupyter Notebook
How long will it take to run? How much will it cost?
It will take around 5-10 minutes to explore and access data and 15-30 minutes to run each Notebook. Using the default Notebook configuration, the Terra Notebook cloud environment charges are $0.19/hour for Google Cloud service costs. It should cost much less than a dollar to run the Notebooks.
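As a rough illustration of the arithmetic behind that estimate, the sketch below multiplies the quoted $0.19/hour rate by a Notebook's running time. The function name is hypothetical, and the calculation assumes cost scales linearly with runtime; it is not Terra's billing formula and ignores storage and any other Google Cloud charges.

```python
def estimate_notebook_cost(minutes, rate_per_hour=0.19):
    """Return an approximate compute cost in USD for a running cloud environment.

    Assumes cost scales linearly with runtime at the quoted hourly rate;
    storage and other Google Cloud charges are not included.
    """
    return rate_per_hour * (minutes / 60)


# Lower and upper bounds for one Notebook, using the 15-30 minute range above.
low = estimate_notebook_cost(15)
high = estimate_notebook_cost(30)
print(f"One Notebook: roughly ${low:.3f} to ${high:.3f}")
```

Under these assumptions, even several Notebooks run back to back stay well under a dollar, which matches the estimate above.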
Take advantage of datasets in Terra's Data Library
Use Terra to search and access many public and controlled-access datasets. While you can upload any data to your workspace, you can also save money on data storage and egress by analyzing data from an existing repository without re-copying it.
In Terra, "importing" data is a bit of a misnomer. When you "import" data from an existing repository, you are importing links to the data in the cloud, not the actual data. This metadata tells your workspace tools where that data is located. You don't actually need to copy the raw data into your workspace to analyze it.
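To make the idea concrete, here is a minimal Python sketch of what a data table row conceptually holds: a cloud path to a file rather than the file's contents. The bucket, file, and column names are hypothetical, and `split_gs_uri` is an illustrative helper, not part of any Terra API.

```python
from urllib.parse import urlparse

# Hypothetical data table rows: each stores a link (a gs:// URI), not file contents.
sample_table = [
    {"sample_id": "sample_1", "cram_path": "gs://example-public-bucket/crams/sample_1.cram"},
    {"sample_id": "sample_2", "cram_path": "gs://example-public-bucket/crams/sample_2.cram"},
]


def split_gs_uri(uri):
    """Split a gs:// URI into (bucket, object_name)."""
    parsed = urlparse(uri)
    if parsed.scheme != "gs":
        raise ValueError(f"Not a Google Cloud Storage URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


# A tool only needs the location to read the data where it already lives.
for row in sample_table:
    bucket, object_name = split_gs_uri(row["cram_path"])
    print(row["sample_id"], "->", bucket, object_name)
```

The table stays small no matter how large the underlying files are, because only the paths are stored.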
Practice with these data-focused Showcase workspaces
- Terra-Notebooks-QuickStart: Import public-access 1000 Genomes data from BigQuery, a cloud data warehouse with built-in machine learning
- Introduction to TCGA Dataset: Explore controlled-access TCGA data
- ENCODE Tutorial: Import an ENCODE ChIP-seq dataset
Four steps to get started!
1. Register your account
Register your Google or institutional account at app.terra.bio (this part is free). If you have a Google account, just click on the menu at the top left, or follow these step-by-step instructions. If you do not have a Google account, see how to set up a Google account with a non-gmail address.
2. Claim $300 in Google credits
$300 in Google Cloud credits helps you explore Terra before committing your own grant dollars. See this step-by-step guide!
3. Explore additional workspaces
You aren’t limited to the workspaces suggested in this overview! Look through the entire Showcase and Tutorials Library of more than 30 examples for a variety of curated use-cases. Showcase workspaces include descriptions, downsampled data, and cost estimates so you can try different tools and gain the confidence to run on your own data.
4. Join our community
Join our community and see how a cloud-native platform - built by the Broad Institute of MIT and Harvard Data Sciences Platform and Verily Life Sciences - can transform the way you do bioinformatics research. Use the forum to post questions or search Terra Support for tutorials.
Want to see additional tools and software? Request a feature.