Overview: Terra Data Catalog

Allie Cliffe

You'll discover diverse biomedical datasets with hundreds of thousands of subjects in the cloud. But is all of this data useful? The Terra Data Catalog can help you quickly and easily find and analyze the data you need for your study.

Terra's integrated Data Catalog is designed to make it easier to find and analyze relevant data:

  • Quickly search and filter datasets hosted by Terra.
  • Once you find interesting data, export it directly to your Terra workspace to use in a workflow or interactive analysis.

To access the Data Catalog

Go to the main navigation menu > Library > Datasets and toggle New Catalog ON.

Streamlining the exploratory process

  • Search and filter on dataset metadata rather than on specific research fields within a dataset
  • Access and search datasets that reside in different systems (currently Data Repo and workspaces, and eventually external repositories) in one place
  • Request access to controlled data (coming soon)
  • Get a similar experience regardless of dataset, consortium, or source

Data-Catalog_Landing-page_Screen_shot.png

How to search and filter datasets 

Searches target dataset metadata rather than specific research fields within a dataset. In this example, searching for "leukemia" surfaces one dataset.
Data-Catalog_Search-datasets_Screen_shot.png

Example filters 

  • Access type
  • Data use policy
  • Data modality
  • File type
  • Disease

How to filter

The oval beside each filter shows the number of datasets that match that filter. Click the oval to apply the filter (the oval will turn green).
Data-Catalog_Filter-datasets_Screen_shot.png
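
As a rough illustration of where the number in each oval could come from, the sketch below tallies how many datasets carry each value of one filter category. The records, field names, and the `filter_counts` helper are hypothetical and do not reflect the catalog's actual schema or API.

```python
from collections import Counter

# Hypothetical dataset records; field names and values are illustrative only.
datasets = [
    {"name": "Dataset A", "file_type": "bam"},
    {"name": "Dataset B", "file_type": "vcf"},
    {"name": "Dataset C", "file_type": "bam"},
]

def filter_counts(records, category):
    """Count how many records carry each value of the given filter category."""
    return Counter(r[category] for r in records if category in r)

counts = filter_counts(datasets, "file_type")
print(counts["bam"], counts["vcf"])
```

In this toy data, the "bam" oval would show 2 and the "vcf" oval would show 1.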

All relevant datasets are listed in the All datasets column, with some basic information:

dataset name | consortium | number of subjects | data modality | last updated

Filtering datasets example

Filter-datasets-in-TDC_Screen-capture.gif

When using multiple filters 

Note: Filters can use either "and" or "or" logic. 

Selecting filters across categories enforces “and” logic (datasets must satisfy every condition):

If you select “Granted” as the "Access type" and the consortium "ClinVar Annotations", you will only see ClinVar datasets that you have permission to access.

Selecting filters within a category enforces “or” logic (datasets satisfy at least one condition):

If you select the diseases “Adrenal carcinoma” and “Bladder”, your results will include every dataset tagged with either disease.
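
The combined behavior can be sketched in a few lines of Python: values selected within one category are combined with "or", and the categories themselves are combined with "and". This is only a minimal model of the logic described above. The records, field names, and the `apply_filters` helper are hypothetical and are not part of Terra or its API.

```python
# Hypothetical dataset records; field names and values are illustrative only.
datasets = [
    {"name": "Dataset A", "access_type": "Granted", "disease": "Adrenal carcinoma"},
    {"name": "Dataset B", "access_type": "Controlled", "disease": "Bladder"},
    {"name": "Dataset C", "access_type": "Granted", "disease": "Leukemia"},
]

def apply_filters(records, selected):
    """Keep records that match every category ("and"), where matching a
    category means having any one of its selected values ("or")."""
    return [
        r for r in records
        if all(
            r.get(category) in values
            for category, values in selected.items()
            if values  # a category with nothing selected does not restrict results
        )
    ]

# Two values within one category: "or" logic.
within = apply_filters(datasets, {"disease": {"Adrenal carcinoma", "Bladder"}})

# Adding a second category: "and" logic across categories.
across = apply_filters(
    datasets,
    {"access_type": {"Granted"}, "disease": {"Adrenal carcinoma", "Bladder"}},
)

print([r["name"] for r in within])
print([r["name"] for r in across])
```

With this toy data, the first selection keeps Dataset A and Dataset B, and adding the "Granted" access type narrows the result to Dataset A.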

How to explore datasets quickly

Clicking on the dataset name will surface the following.

  1. Dataset overview
    Includes access type, donor size, sample size, data modality, data type, and file counts. It also lists contact information and data contributors, as well as the cloud infrastructure and region where the primary data files are stored.
  2. Data preview
    Lets you drill down into the specifics of the data included in the dataset and request access to controlled data.
  3. Export to Terra
    Lets you export the data to a new or existing workspace for analysis.

1. Screenshot of TARGET Acute Myeloid Leukemia (AML) Project in the Terra Data Catalog

Terra-Data-Catalog_TARGET-dataset-details-to-preview-or-export_Screenshot.png

2. Screenshot of dataset preview example (participant table)
Terra-Data-Catalog_TARGET-dataset-preview_Screenshot.png

3. Screenshot of export to Terra destination example
Terra-Data-Catalog_Export-TARGET-dataset_prepared-data-destination_Screenshot.png

What to expect

Data will be delivered as one or more tables on the workspace data page.
