On May 22, 2019, members of the Broad Institute community participated in a Terra workshop as part of the BroadE workshop series. The goal of the workshop was to empower members of the Broad community, including those who may be entirely new to cloud computing, to use Terra to access data, run analysis tools, and collaborate, all in a secure and scalable environment.
Workshop synopsis
Focus on your science: How to work and collaborate easily and securely on the cloud with Terra
This workshop will get you up and running with Terra, the new scalable platform for biomedical research developed at the Broad Institute in collaboration with Verily Life Sciences. (If you’ve seen or used FireCloud, think of Terra as the new and improved user interface that makes doing research easier than before!) We will first cover the basics of doing research on the cloud and introduce you to Terra and the Data Biosphere.
You’ll then get hands-on with Terra, learning how to access research data, run analysis tools, and seamlessly collaborate, all in a secure and scalable environment. We will guide you through running example workflows, interacting with Jupyter Notebooks, and working with the BigQuery data warehouse to help accelerate your own project in Terra.
We will also show you how to manage data, security, and billing in Terra so that you can relax and focus on your science.
Workshop sessions and materials
0. Introduction to the workshop, Terra, and the Data Biosphere
What is the Data Biosphere, and how does Terra fit into this ecosystem? We set the stage with the story behind Terra before you get hands-on in the application.
Instructor: Robert Majovski, Lead Educator, Data Sciences Platform, Broad Institute
Materials: Slides; Video
1. Interactive analysis with Jupyter Notebooks in Terra
We give you a crash course on interactive analysis in Terra with Jupyter Notebooks.
Instructor: Anton Kovalsky, Science Writer, Data Sciences Platform, Broad Institute
Materials: Slides; Video
2. Hands-on with Jupyter Notebooks in Terra
Now is your chance to try out notebooks in Terra as we guide you through how to install the R environment and software packages you need for the next lesson in this workshop.
Instructor: Anton Kovalsky, Science Writer, Data Sciences Platform, Broad Institute
Materials: Video; Jupyter Notebook
3. BigQuery: Interacting with structured data sets
We teach you how to analyze data in real time in Terra using Google BigQuery and Jupyter notebooks.
Instructor: Allie Hajian, Science Writer, Data Sciences Platform, Broad Institute
Materials: Slides; Video; Terra workspace
4. Billing, cost management, and security in Terra
Setting up billing accounts, managing costs, and collaborating securely may not be as exciting as your research, but they are important to know.
Instructor: Alex Baumann, Senior Principal Software Engineer, Data Sciences Platform, Broad Institute
Materials: Slides; Video
5. Running a real workflow in Terra (CRAM to BAM)
This exercise on how to run genomic data workflows in Terra gives you a chance to try out a quick and easy step in genomic data processing, converting the format of sequencing data files.
Instructor: Kate Noblett, Senior Project Coordinator, Data Sciences Platform, Broad Institute
Materials: Video (we're working on it; a technical issue with the recording needs to be resolved); Terra workspace
6. Understanding and using Docker
Knowing how to use Docker containers in Terra can help you scale your compute and enable more reproducible research. Here we introduce you to Docker and tell you a bit about how containers are used in running both workflows and notebooks in Terra.
Instructor: Adelaide Rhodes, Senior Computational Associate, Data Sciences Platform, Broad Institute
Materials: Slides; Video
7. Terra case study: Tetralogy of Fallot
We take you in depth through a real study that was reproduced on Terra. We show how the data, workflows, and analysis were organized to reproduce the scientific findings, illustrating how you can perform an end-to-end analysis in Terra.
Instructor: Anton Kovalsky, Science Writer, Data Sciences Platform, Broad Institute
Materials: Slides; Video; Terra workspace
Additional resources
The Data Biosphere
Terra Resources
- Documentation https://support.terra.bio/hc/en-us
- Ask questions through the button in the upper left hamburger menu, or on the community forum
- Make a feature request here
Running workflows on Terra
Terra's Jupyter Notebooks Environment
- Part I - Key Components
- Part II - Key Operations
- Dos and Don'ts - How not to lose data output files or collaborator edits in a notebook
Jupyter Notebooks Resources
- Jupyter Notebooks 101
- Jupyter Notebooks for Data Science (extensions, widgets, and more!)
- Jupyter notebooks cheat sheet
- Mastering markdown
- Markdown cheat sheet
R Resources
Data wrangling, visualization, and analysis
Developing (and finding the best) R packages
- Advanced R
- R packages
- Finding the best R package amongst the available options: https://www.rdocumentation.org/
BigQuery Resources
- Comprehensive BigQuery documentation
- BigQuery best practices (controlling costs, optimizing query performance, optimizing storage)
- See the giant list of analytical functions on the right-hand side nav bar here.
- Using client libraries (and your favorite programming language) with BigQuery
- BigQuery YouTube videos (from the Google Cloud Platform developers)
Google Cloud Platform
Chrome
- Setting up Chrome Profiles