Overview: Terra Data Catalog

Allie Cliffe

Diverse biomedical datasets with hundreds of thousands of subjects have the potential to transform health and medicine. But all the data in the world (in the cloud) is only useful if you can find and analyze the data you need for your study quickly and easily. 

Terra's integrated Data Catalog is designed to make it easier to find and analyze relevant data:

  • Quickly search and filter datasets hosted by Terra.
  • Once you find the data you’re interested in, export it directly to your Terra workspace to use in a workflow or interactive analysis.

To access the Data Catalog

Go to the main navigation menu > Library > Data and toggle on the new Beta Data Catalog.

Streamlining the exploratory process

  • Use search and filters to target dataset metadata rather than specific research fields within a dataset
  • Access and search datasets that reside in different systems (currently Data Repo and workspaces, and eventually external repositories) in one place
  • Request access to controlled data (coming soon)
  • Get a similar experience regardless of dataset, consortium, or source

Data-Catalog_Landing-page_Screen_shot.png

How to search and filter datasets 

Search targets dataset metadata rather than specific research fields within a dataset. In this example, searching for "leukemia" surfaces one dataset.
Data-Catalog_Search-datasets_Screen_shot.png
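
To make the idea of searching metadata concrete, here is a minimal Python sketch. It is not how the Data Catalog is implemented: the dataset records, field names, and the `search_metadata` helper are all hypothetical, and the sketch assumes a simple case-insensitive text match over dataset-level metadata only.

```python
# Hypothetical dataset-level metadata; names and values are illustrative only.
DATASETS = [
    {"name": "Example Leukemia Cohort", "consortium": "Consortium X",
     "disease": "Leukemia", "data_modality": "Genomic"},
    {"name": "Example Bladder Study", "consortium": "Consortium Y",
     "disease": "Bladder", "data_modality": "Proteomic"},
]


def search_metadata(datasets, query):
    """Return names of datasets whose metadata contains the query text.

    Only the dataset-level metadata values are inspected, never the
    individual research fields stored inside a dataset.
    """
    needle = query.casefold()
    return [
        d["name"]
        for d in datasets
        if any(needle in str(value).casefold() for value in d.values())
    ]


hits = search_metadata(DATASETS, "leukemia")
print(hits)
```

With this sample data, the query matches one record, mirroring the single result in the example above.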

Filters 

  • Access type
  • Data use policy
  • Data modality
  • File type
  • Disease

How to filter

The oval beside each filter includes the number of datasets with that filter. Click on the oval to filter (the oval will turn green). 
Data-Catalog_Filter-datasets_Screen_shot.png
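
The number in each oval can be thought of as a per-value count of datasets. The following Python sketch illustrates that idea under stated assumptions: the records, field names, and the `filter_counts` helper are hypothetical and do not reflect the Data Catalog's actual code.

```python
from collections import Counter

# Hypothetical dataset records; each filter field holds a set of values.
DATASETS = [
    {"name": "Dataset A", "data_modality": {"Epigenomic"}},
    {"name": "Dataset B", "data_modality": {"Proteomic"}},
    {"name": "Dataset C", "data_modality": {"Epigenomic", "Proteomic"}},
]


def filter_counts(datasets, field):
    """Count how many datasets carry each value of the given filter field."""
    counts = Counter()
    for dataset in datasets:
        # A set holds each value once, so a dataset is counted once per value.
        counts.update(dataset.get(field, set()))
    return counts


modality_counts = filter_counts(DATASETS, "data_modality")
print(dict(modality_counts))
```

A dataset with several values for one filter (Dataset C here) contributes to the count of each of those values.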

All relevant datasets are listed in the All datasets column, with some basic information:

dataset name | consortium | number of subjects | data modality | last updated

Filtering datasets example

Filter-datasets-in-TDC_Screen-capture.gif

When using multiple filters 

Note that filters can use either "and" or "or" logic. 

Filters that use “and” logic (datasets must satisfy every condition):

(e.g., if you select “Granted” as the "Access type", you will only see datasets for which you have permission)

  • Access type

Filters that use “or” logic (datasets satisfy at least one condition):

(e.g., if you select the diseases “Adrenal carcinoma” and “Bladder”, you will see datasets that include either disease)

  • Data use policy
  • Data modality
  • File type
  • Disease
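
The difference between the two kinds of filter can be illustrated with a short Python sketch. This is not Terra's implementation: the dataset records, field names, and the `matches` and `apply_filters` helpers are hypothetical. The sketch also assumes that separate filters combine with each other using "and", which this article does not state.

```python
# Hypothetical dataset records; field names and values are illustrative only.
DATASETS = [
    {"name": "Dataset A", "access_type": {"Granted"},
     "disease": {"Adrenal carcinoma"}},
    {"name": "Dataset B", "access_type": {"Controlled"},
     "disease": {"Bladder"}},
    {"name": "Dataset C", "access_type": {"Granted"},
     "disease": {"Leukemia"}},
]

# Filters listed in the article as using "and" logic; all others use "or".
AND_FILTERS = {"access_type"}


def matches(dataset, selections):
    """Return True if the dataset passes every active filter."""
    for field, selected in selections.items():
        if not selected:
            continue  # nothing selected, so this filter is inactive
        values = dataset.get(field, set())
        if field in AND_FILTERS:
            passed = selected <= values        # every selected value present
        else:
            passed = bool(selected & values)   # at least one value present
        if not passed:
            return False  # assumption: separate filters combine with "and"
    return True


def apply_filters(datasets, selections):
    return [d["name"] for d in datasets if matches(d, selections)]


or_result = apply_filters(
    DATASETS, {"disease": {"Adrenal carcinoma", "Bladder"}})
and_result = apply_filters(DATASETS, {"access_type": {"Granted"}})
combined = apply_filters(
    DATASETS,
    {"access_type": {"Granted"},
     "disease": {"Adrenal carcinoma", "Bladder"}})
print(or_result, and_result, combined)
```

In this sample, selecting two diseases returns Dataset A and Dataset B, selecting the "Granted" access type returns Dataset A and Dataset C, and applying both selections together returns only Dataset A.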

How to explore datasets quickly

Clicking on the dataset name will surface the following:

  1. Dataset overview
    Includes access type, donor size, sample size, data modality, data type, and file counts, along with contact information, data contributors, and the cloud infrastructure and region where the primary data files are stored.

  2. The option to preview the data (coming soon)
    Lets you drill down into the specifics of the data included in the dataset and request access to controlled data.

When you find data that interests you, link it to a new or existing workspace (3) for analysis (not yet available).

Data-Catalog_Dataset-details_Screen_shot.png

Preview data (not yet available)
Data-Catalog_Preview-dataset_Screen_shot.png

Link to a workspace (not yet available)
Data-Catalog_Link-to-workspace_Screen_shot.png

Data will be delivered as one or more tables in the workspace data page.
