Workspaces are the building blocks of Terra: computational sandboxes where you can access and organize data and analysis tools, and collaborate with others. How do you fill them with the data and analysis tools you need? How do you share them and collaborate securely and seamlessly? This article walks you through the features and functions of your workspace and outlines how to build the workspace you need with resources in Terra's Library.
- Intro to Terra workspaces
- Building workspaces using the Terra Library
  - Data Library
    - Data accessible through Terra
    - Using data explorer to make custom datasets
  - Showcase & Tutorials Library
    - Example Workspaces
    - Featured Workspaces
  - Code & Workflows Library
    - GATK4 Best Practices Workflows
Intro to Workspaces
A workspace is a dedicated space that keeps all the components of your project together: data, metadata, and analysis tools, as well as documentation and provenance. Components are organized in tabs in the workspace (see screenshot below):
Showcase workspace landing page
Dashboard: Documentation and workspace information
This is where you describe your research project: the questions you are trying to answer, the kinds of data and analyses you will use, and so on. Documentation is important! Well-documented workspaces are easier to share and collaborate on. This tab also shows information about the workspace owner and creation date.
Data: Organize and access data-in-the-cloud
Terra uses tables, which work like built-in spreadsheets, to help you access and organize the data you will use. Data can live in your workspace bucket, in other Google Cloud Storage buckets, or in BigQuery. Tables connect workspace tools to the data through metadata links that point to each file's actual location in the cloud.
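To make the idea of metadata links concrete, here is a minimal sketch of a data table written as a tab-separated load file, the kind of file Terra accepts when you upload a table. The first column header follows Terra's `entity:<table>_id` convention; the table name `sample`, the column `cram_path`, and the `gs://` paths are hypothetical placeholders. Each row stores only a pointer to a file in cloud storage, not the file itself.

```python
import csv
import io

# Hypothetical rows: each sample ID points to a file that stays in cloud storage.
SAMPLES = [
    ("sample_01", "gs://example-bucket/crams/sample_01.cram"),
    ("sample_02", "gs://example-bucket/crams/sample_02.cram"),
]


def build_load_file(rows):
    """Return a TSV string whose first header is 'entity:sample_id' (Terra load-file convention)."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    # 'cram_path' is a hypothetical column name chosen for this example.
    writer.writerow(["entity:sample_id", "cram_path"])
    writer.writerows(rows)
    return buf.getvalue()


def links_by_sample(tsv_text):
    """Parse the TSV back and map each sample ID to its cloud location."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["entity:sample_id"]: row["cram_path"] for row in reader}


if __name__ == "__main__":
    tsv = build_load_file(SAMPLES)
    print(tsv, end="")
    print(links_by_sample(tsv)["sample_02"])
```

Because the table holds only links, the same data can be referenced from several workspaces without copying the underlying files.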
Notebooks: Interactive analysis
In the Notebooks tab you can launch Jupyter notebooks within Terra and interact with your data using Python or R.
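As a small illustration of working with workspace data from a notebook, the sketch below builds a `gs://` path to a file in the workspace bucket. It assumes the notebook runtime exposes the bucket through a `WORKSPACE_BUCKET` environment variable, as Terra notebook environments typically do; the helper name `workspace_path` and the bucket value in the demo are hypothetical.

```python
import os


def workspace_path(relative_path, env=None):
    """Build a gs:// URI for a file inside the workspace bucket.

    Assumes the runtime sets WORKSPACE_BUCKET (e.g. 'gs://fc-...');
    raises a clear error when it is missing, such as outside Terra.
    """
    env = os.environ if env is None else env
    bucket = env.get("WORKSPACE_BUCKET")
    if not bucket:
        raise RuntimeError("WORKSPACE_BUCKET is not set; are you running inside a Terra notebook?")
    # Normalize slashes so 'gs://bucket/' + '/data/x.csv' does not produce '//'.
    return bucket.rstrip("/") + "/" + relative_path.lstrip("/")


if __name__ == "__main__":
    # Hypothetical bucket value used only for illustration.
    demo_env = {"WORKSPACE_BUCKET": "gs://fc-example-bucket"}
    print(workspace_path("notebooks/results/summary.csv", env=demo_env))
```

The resulting URI could then be passed to a tool such as `gsutil` or a Cloud Storage client library to read or write the file.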
Workflows: Pipelining analysis
In the Workflows tab you will find workflows for batch analysis. These are the kinds of repetitive analyses that can be automated, such as aligning sequencer reads. Workflows in Terra are written in Workflow Description Language (WDL) and are often called WDLs.
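Conceptually, a batch submission runs one instance of a workflow for each selected row of a data table. The sketch below models that fan-out in plain Python; `align_reads` is a hypothetical stand-in for a real workflow such as a WDL alignment pipeline, the status labels are chosen only to resemble those you see in Job History, and nothing here calls Terra's API.

```python
from dataclasses import dataclass


@dataclass
class WorkflowRun:
    """One workflow instance launched for a single table row (illustrative model only)."""
    sample_id: str
    status: str
    output: str


def align_reads(input_path):
    """Hypothetical stand-in for an alignment workflow; returns a made-up output path."""
    if not input_path.endswith(".fastq"):
        raise ValueError(f"unexpected input: {input_path}")
    return input_path[: -len(".fastq")] + ".bam"


def submit_batch(table):
    """Run the stand-in workflow once per row; a failure affects only that row's run."""
    runs = []
    for sample_id, fastq_path in table.items():
        try:
            runs.append(WorkflowRun(sample_id, "Succeeded", align_reads(fastq_path)))
        except ValueError as err:
            # Keep the error message, much as Job History keeps logs for troubleshooting.
            runs.append(WorkflowRun(sample_id, "Failed", str(err)))
    return runs


if __name__ == "__main__":
    samples = {
        "sample_01": "gs://example-bucket/reads/sample_01.fastq",
        "sample_02": "gs://example-bucket/reads/sample_02.txt",
    }
    for run in submit_batch(samples):
        print(run.sample_id, run.status, run.output)
```

Modeling each row as an independent run is what makes batch analysis scale: one bad input fails its own run without stopping the others.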
Job History: Provenance and troubleshooting
In this tab you can check the status of workflow submissions and dig into error logs to troubleshoot failed workflows or submissions. Your workspace also keeps a record of all batch submissions, for reproducibility and provenance.
Building workspaces using the Terra Library
Terra has three distinct libraries you can draw on to build the workspace you need for your analysis. To access them, click the main menu icon (three horizontal lines) at the top left of any page and open the "Library" submenu. The Data, Showcase & Tutorials, and Code & Workflows libraries are described in more detail below.
Data Library
Terra hosts both open- and controlled-access datasets. Selected datasets have built-in tools ("Data Explorers") for exploring the data and creating customized subsets from it. The number of available datasets, and the amount and type of data in each, continues to grow, including public-access datasets.
Registering for Terra does not automatically grant access to all available data. To access restricted data, you must be added to the access policy for that resource. If you try to view data you don't have access to, you'll be prompted to request access, as shown below.
Datasets we are hosting as part of various awards include:
- AnVIL: CCDG, CMG, GTEx, eMERGE
  - The Genotype-Tissue Expression (GTEx) Program established a data resource and tissue bank to study the relationship between genetic variation and gene expression in multiple human tissues.
- STAGE: TOPMed
  - Trans-Omics for Precision Medicine (TOPMed), sponsored by the National Institutes of Health's National Heart, Lung, and Blood Institute (NHLBI), is a program to generate scientific resources to enhance our understanding of fundamental biological processes that underlie heart, lung, blood, and sleep disorders (HLBS).
- The Nurses' Health Study and Nurses' Health Study II are among the largest investigations into the risk factors for major chronic diseases in women.
- The Human Cell Atlas is made up of comprehensive reference maps of all human cells, the fundamental units of life, as a basis for understanding fundamental human biological processes and diagnosing, monitoring, and treating disease.
- The Encyclopedia of DNA Elements (ENCODE) project aims to delineate all functional elements encoded in the human genome. To this end, ENCODE has systematically mapped regions of transcription, transcription factor association, chromatin structure, and histone modification.
- TCGA, TARGET
  - The Cancer Genome Atlas (TCGA) is a dataset of over two petabytes of genomic data, produced in a collaboration between the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI).
Using the Data Explorer
If you have permission to view a dataset, clicking "browse data" takes you to the data explorer, where you can create your own cohort by choosing the subsets of data relevant to your research and then exporting it as a custom cohort entity. The cohort will appear in a "cohort" table in the Data tab.
In the clip above, we create a unique cohort by selecting patients with a few types of cancer and then limiting the cohort by gender. Clicking "export" and then "send" takes you to the import data screen, where you can select the workspace to send your custom cohort to:
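The data explorer builds a cohort through its point-and-click interface. To show the underlying logic, the sketch below expresses the same kind of selection in plain Python over hypothetical participant records: keep participants whose cancer type is in a chosen set, then narrow by gender. The field names and records are invented for illustration and do not reflect any dataset's actual schema or the explorer's implementation.

```python
from typing import Iterable

# Hypothetical participant records; real datasets use their own schemas.
PARTICIPANTS = [
    {"participant_id": "P1", "cancer_type": "BRCA", "gender": "female"},
    {"participant_id": "P2", "cancer_type": "LUAD", "gender": "male"},
    {"participant_id": "P3", "cancer_type": "LUAD", "gender": "female"},
    {"participant_id": "P4", "cancer_type": "COAD", "gender": "female"},
]


def build_cohort(records: Iterable[dict], cancer_types: set, gender: str) -> list:
    """Return IDs of participants matching any selected cancer type AND the chosen gender."""
    return [
        r["participant_id"]
        for r in records
        if r["cancer_type"] in cancer_types and r["gender"] == gender
    ]


if __name__ == "__main__":
    cohort = build_cohort(PARTICIPANTS, {"BRCA", "LUAD"}, "female")
    print(cohort)  # ['P1', 'P3']
```

Selections within one criterion (cancer types) combine with OR, while separate criteria (cancer type, gender) combine with AND, which is how successive filters narrow a cohort.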
For step-by-step instructions on how to use custom cohorts with SQL and BigQuery, see this article.
Showcase & Tutorials (Template Workspaces) Library
For curated, publicly available workspaces that highlight a variety of use-cases, try the Showcase and Tutorials section in the Library. These are intended to enable users to reproduce instructive results and learn established methodologies.
- Example (GATK) workspaces
- Showcase reproducible examples of GATK workflows and tools for general use
- Many contain tools developed at and supported by Broad
- Featured workspaces
- Tutorial workspaces
- Specific use cases based on published work
- Give users a chance to understand their peers' experimental design
Code & Workflows Library
The Code & Workflows section contains tools and tasks that serve as building blocks of workflows. Users familiar with running workflows can use this repository to find components to run individually or to string together into workflows written in Workflow Description Language (WDL).
This section also includes links to other helpful open source workflow repositories. Look to the right of the Code & Workflows tab under "FIND ADDITIONAL WORKFLOWS."