Build a workspace using Terra Library data, showcase workspaces, and tools

Allie Cliffe

Learn how to access Terra-hosted datasets, curated showcase workspaces, and a workflow repository to build your workspace.

How to access the Terra Libraries

Click the main navigation menu (the icon with three horizontal lines) at the top left of any page and open the "Library" submenu.

Screenshot of the Terra Libraries in Terra's main navigation menu. The Library menu is highlighted with an orange arrow, and the library's sections (Data, Featured Workspaces, and Code & Workflows) are highlighted with an orange box.

Data Library

Terra hosts both open- and controlled-access datasets. Some datasets have built-in data explorers for browsing, filtering, and searching the data. Explorers also let you apply selection criteria to create custom subsets of data (cohorts). Note: The number of available datasets, and the amount and type of data in each, is growing, including public-access datasets.

Accessing controlled data from the Terra Data Library

Registering for Terra does not mean you can access all the hosted data. To access restricted data, you must be added to an access policy for that resource. When you try to view data you cannot access, you'll be prompted to request access.

Screen capture of going to the Data Library from the main navigation menu and clicking an example controlled-access dataset. When the cursor clicks the Browse Dataset button, a warning message appears saying you don't have access.

Terra Library datasets

Different datasets have different explorers for filtering data to create custom cohorts. Note: Because the Data Library is always expanding, you may find additional datasets available on Terra beyond those listed below.

AnVIL - CCDG, CMG, GTEx, eMERGE

The Centers for Common Disease Genomics (CCDG) dataset includes sequencing results from over 65,000 participants, with the aim of comprehensively identifying rare risk and protective variants that contribute to multiple common disease phenotypes. Genotype-Tissue Expression (GTEx) Program data in AnVIL are also available through the public GTEx workspace.

Trans-Omics for Precision Medicine (TOPMed)

Sponsored by the National Institutes of Health's National Heart, Lung, and Blood Institute (NHLBI), TOPMed is a program to generate scientific resources to enhance our understanding of fundamental biological processes that underlie heart, lung, blood, and sleep disorders (HLBS).

Human Cell Atlas (HCA)

The Human Cell Atlas aggregates comprehensive reference maps of all human cells (the fundamental units of life) as a basis for understanding fundamental human biological processes and for diagnosing, monitoring, and treating disease.

The Encyclopedia of DNA Elements (ENCODE)

The ENCODE project aims to delineate all functional elements encoded in the human genome. To this end, ENCODE has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification.

TCGA, TARGET

The Cancer Genome Atlas (TCGA) is a dataset comprising over two petabytes of genomic data, produced through a collaboration between the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI).

Using a Data Explorer

For some datasets in the Terra Library, clicking Browse Data opens a data explorer if you have permission to view the dataset. There you can create your own subset of data (a cohort) by choosing criteria relevant to your research. When you export the subset to your workspace, it appears as a Cohort table in the Data tab.

Screen capture of the Terra explorer for the 1000 Genomes dataset. Clicking different facets (for example, 'female' and 'African') filters the data to isolate the samples that match those criteria. To save the samples in the filtered dataset, click 'create cohort' and provide a unique name. On the next screen, you can choose to save the cohort information to an existing workspace or a new workspace.

In the clip above, we create a unique cohort by filtering for female participants from the African super-population.
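To make the idea of faceted filtering concrete, here is a minimal Python sketch of what the filter in the clip does conceptually. It assumes, as is common in faceted search but not confirmed here for Terra's explorer, that selected values within one facet are combined with OR and selections across different facets are combined with AND. The sample records, the field names (sample_id, sex, super_population), and the build_cohort function are all hypothetical; this is not Terra's implementation or API.

```python
# Hypothetical sketch of faceted cohort filtering (not Terra's API).
# Assumption: values within one facet are OR-ed; different facets are AND-ed.
from typing import Dict, List, Set

Sample = Dict[str, str]


def build_cohort(samples: List[Sample], selections: Dict[str, Set[str]]) -> List[str]:
    """Return the IDs of samples that match every selected facet.

    A sample matches a facet if its value for that facet is any one of the
    selected values. With no selections, every sample matches.
    """
    cohort = []
    for sample in samples:
        if all(sample.get(facet) in values for facet, values in selections.items()):
            cohort.append(sample["sample_id"])
    return cohort


# Hypothetical records, loosely modeled on the facets shown in the clip.
samples = [
    {"sample_id": "S1", "sex": "female", "super_population": "AFR"},
    {"sample_id": "S2", "sex": "male", "super_population": "AFR"},
    {"sample_id": "S3", "sex": "female", "super_population": "EUR"},
    {"sample_id": "S4", "sex": "female", "super_population": "AFR"},
]

# Female participants from the African super-population.
selections = {"sex": {"female"}, "super_population": {"AFR"}}
print(build_cohort(samples, selections))  # ['S1', 'S4']
```

Under these assumptions, adding a second value to one facet (for example, both "female" and "male") widens the cohort, while adding another facet narrows it.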

Clicking Save Cohort > Save opens an import data screen where you can select the workspace to send your custom cohort to.

Screenshot of the import data menu, with options to import cohort information into an existing or new workspace.

For step-by-step instructions on how to build custom cohorts with SQL and BigQuery, see Accessing and analyzing custom cohorts with Data Explorer.

Featured Workspaces and Tutorials Library

One of the best ways to get started in Terra is to explore the curated Featured Workspaces in the Library (access from the dropdown in the main navigation menu at the top left). These workspaces span a variety of use cases to show how peers design experiments, a useful reference when designing your own project. They are standardized for completeness and ease of use.

Classes of Showcase Workspaces

  • Tutorial workspaces
  • Specific analysis tools (e.g., WDLs, Jupyter Notebooks, Hail, Bioconductor)
  • Experimental strategies (e.g., GWAS, exome analysis, RNA-seq)
  • Scientific domains (e.g., cancer, infectious diseases, single-cell, immunology)

They're great as templates, for reproducing instructive results, and for learning established methodologies. You don't have to be logged in to Terra to view them, though you will have to log in to make your own copy so you can work in it.

Using Showcase workspaces as templates

The workspace description should give you enough detail to analyze the included sample data. If you'd rather run the analysis on your own data, the cost and time estimates help you do so with confidence.

Screenshot of the Featured Workspace library with the GATK4 Germline Variant Calling Best Practices workspace, the Peat demo workspace, and the inferCNV workspace in the main section, and the menu for filtering showcase workspaces on the left, with categories for getting started, analysis tools, experimental strategy, data generation technology, scientific domain, datasets, utilities, and projects.

Code and Workflows Library

The Code and Workflows Library contains GATK Best Practices workflows and links to both Dockstore and the Broad Methods Repository (in the grey panel on the far right of the page). In these workflow repositories you'll find workflow components that you can run individually or string together.

Screenshot of the Code and Workflows Library with several GATK Best Practices workflow cards in the main section and an option to find additional workflows from Dockstore and the Broad Methods Repository in a card on the right.

How to import workflows to your workspace

All workflows in Terra use the Workflow Description Language (WDL).
