The Library lets users browse the resources available through Terra. These resources are divided into three categories: Datasets, Showcase & Tutorials, and Code & Workflows. This article covers the following topics:
- Data accessible through Terra
- Using Data Explorer to make custom datasets
- Showcase & Tutorials
- Example Workspaces
- Featured Workspaces
- Code & Workflows
- GATK4 Best Practices Workflows
Terra has built-in functionality for exploring available data and creating customized subsets from it. To browse the data available to you, open the menu at the top left of the front page and click "Datasets" under "Library".
Registering for Terra does not automatically grant access to all available data. To view a dataset, a user must be added to the access policy for that resource. A user who attempts to view data they do not have access to is prompted to request access, as shown below.
Datasets we are hosting as part of various awards:
- AnVIL - CCDG, CMG, GTEx, eMERGE
- The Genotype-Tissue Expression (GTEx) Program established a data resource and tissue bank to study the relationship between genetic variation and gene expression in multiple human tissues.
- STAGE - TOPMed
- Trans-Omics for Precision Medicine (TOPMed), sponsored by the National Institutes of Health's National Heart, Lung, and Blood Institute (NHLBI), is a program to generate scientific resources to enhance our understanding of fundamental biological processes that underlie heart, lung, blood, and sleep disorders (HLBS).
- The Nurses' Health Study and Nurses' Health Study II are among the largest investigations into the risk factors for major chronic diseases in women.
- The Human Cell Atlas is made up of comprehensive reference maps of all human cells — the fundamental units of life — as a basis for understanding fundamental human biological processes and diagnosing, monitoring, and treating disease.
- The Encyclopedia Of DNA Elements (ENCODE) project aims to delineate all functional elements encoded in the human genome. To this end, ENCODE has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification.
- TCGA, TARGET
- The Cancer Genome Atlas (TCGA) is a dataset comprising over two petabytes of genomic data, produced in a collaboration between the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI).
Using Data Explorer
If you have permission to view a dataset, clicking "browse data" takes you to the "data explorer". There you can build your own cohort by choosing the subsets of data that are relevant to your research, then export that cohort as a customized set of data.
In the clip above, we create a cohort by selecting patients with a few types of cancer and then narrowing the cohort by gender. Clicking "export" and then "send" takes you to the import data screen, where you can choose which of your own workspaces will receive the custom cohort.
For information on how to use custom cohorts with SQL and BigQuery, and how to set up your own Data Explorer, see this article.
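Conceptually, building a cohort as described above means filtering participant records by the values you choose for each attribute. The Python sketch below only illustrates that idea. It is not how Data Explorer is implemented, and the `Participant` class, the `build_cohort` function, and the sample records are all hypothetical names and made-up values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    """Hypothetical participant record; the field names are assumptions."""
    participant_id: str
    cancer_type: str
    gender: str


def build_cohort(participants, cancer_types, gender):
    """Keep participants whose cancer type is one of `cancer_types`
    and whose gender equals `gender`."""
    wanted = set(cancer_types)
    return [
        p for p in participants
        if p.cancer_type in wanted and p.gender == gender
    ]


# Made-up example records, not real data.
participants = [
    Participant("P001", "BRCA", "female"),
    Participant("P002", "LUAD", "male"),
    Participant("P003", "LUAD", "female"),
    Participant("P004", "SKCM", "female"),
]

# Select a few cancer types, then narrow the cohort by gender.
cohort = build_cohort(participants, cancer_types=["BRCA", "LUAD"], gender="female")
cohort_ids = [p.participant_id for p in cohort]
print(cohort_ids)  # ['P001', 'P003']
```

In Data Explorer the same selection is made by clicking facet values, and the resulting cohort is what gets exported to your workspace.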
Showcase & Tutorials (Template Workspaces)
The Showcase & Tutorials section of the Library is where users can find publicly available workspaces. These workspaces are published to show users the many ways the platform can be used, and to foster collaboration by letting users reproduce instructive results and learn established methodologies. They fall broadly into two categories:
- Example workspaces
- Used to showcase workflows and tools for general use
- Many contain tools developed at and supported by Broad
- Featured workspaces
- Specific use cases based on published work
- Give users a chance to understand their peers' experimental design
Code & Workflows
The Code & Workflows section of the Library contains the tools and tasks that serve as components of workflows. Users familiar with running workflows can come to this repository to find components that can be run individually or strung together using the Workflow Description Language (WDL).
This section also contains links to other helpful open source workflow repositories. Look to the right of the Code & Workflows tab under "FIND ADDITIONAL WORKFLOWS."