Accessing and analyzing custom cohorts with Data Explorer

melchang

Many of the datasets hosted by Terra include integrated Data Explorer interfaces, which are useful for generating and exporting custom cohorts. For datasets that have a Data Explorer, you can open it from the Dataset Library by clicking "Browse data". If you have data hosted in BigQuery, see the last section of this article for a link to instructions on creating a Data Explorer in Terra.

Note: You may need to request permission for a dataset before you can see its Data Explorer. The 1000 Genomes Low Coverage Data Explorer is visible to everyone.

Making and importing a custom cohort

Within Data Explorer, you create a cohort by selecting the parameters you want to include from the parameter cards. Note: The number of participants that satisfy the conditions you select is listed at the top right.

Save the cohort to your workspace by clicking on the "Save cohort" button at the top right and following the prompts.
Screen capture short video clip of selecting criteria for a cohort in the 1,000 genomes data explorer, clicking 'save cohort' button, filling in form with name of cohort, and clicking the 'save' button to save cohort to the workspace.

Using a custom cohort

After importing the cohort, your Terra workspace will contain two new data tables: cohort and BigQuery_table. Each cohort entity contains a SQL query representing the cohort, and has the same name you gave it when saving. This SQL query returns a list of participant IDs.

Hint: Triple-click the SQL query table cell to select the entire SQL query.
Screenshot of expanded cohort table with seven cohorts. The full SQL query for each cohort is visible in the right-most 'Query' column.

The BigQuery_table entity lists the BigQuery tables in your dataset. The cohort SQL query can be joined with these BigQuery tables for analysis.
Screenshot of Data page with cohort table highlighted (left column) and three BigQuery data sources in the table
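
As an illustration of how the cohort SQL query can be combined with one of the listed BigQuery tables, here is a minimal Python sketch. It is not the code Terra generates. The function name, the table name, the participant_id column, and the sample cohort query are all hypothetical placeholders; substitute the query from your cohort table and a table from your BigQuery_table list. The sketch uses an IN subquery (a semi-join) rather than an explicit JOIN, so it does not need to know the name of the column the cohort query returns.

```python
def build_cohort_join_query(cohort_sql, table, id_column="participant_id"):
    """Return SQL selecting rows of `table` whose id is in the cohort.

    cohort_sql: the SQL query copied from the cohort table (returns participant ids).
    table:      fully qualified BigQuery table name (hypothetical in this example).
    id_column:  participant id column in `table` (an assumption; check your schema).
    """
    # Drop surrounding whitespace and any trailing semicolon so the
    # cohort query can be embedded as a subquery.
    cohort_sql = cohort_sql.strip().rstrip(";").strip()
    return (
        "SELECT t.*\n"
        f"FROM `{table}` AS t\n"
        f"WHERE t.{id_column} IN (\n{cohort_sql}\n)"
    )


# Hypothetical values, for illustration only.
example_cohort_sql = (
    "SELECT DISTINCT participant_id "
    "FROM `my-project.my_dataset.participants` WHERE age > 50;"
)
example_query = build_cohort_join_query(
    example_cohort_sql, "my-project.my_dataset.samples"
)
print(example_query)
```

The resulting string can be pasted into the BigQuery console or passed to a BigQuery client in a notebook.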

To reopen the cohort in Data Explorer, select the cohort in the cohort table, click "Open with" at the top right of the table, and choose Data Explorer.
Screen capture video clip of selecting a cohort from the cohort table, clicking the 'open with' button at the top right of the table, and selecting 'Data Explorer' from the menu

To create a Python 2 or 3 notebook with this cohort, select the cohort in the cohort table, click "Open with" at the top right of the table, and choose Notebook.
Screen capture video clip of selecting a cohort from the cohort table, clicking the 'open with' button at the top right of the table, and selecting 'Notebook' from the menu

Once you run all cells in the notebook, the BigQuery data for your cohort will be accessible in the notebook for analysis. 
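
If you want to run the cohort query yourself in a notebook cell, the following is a minimal sketch, not the contents of the generated notebook. It assumes the google-cloud-bigquery package is available and that you create the client with your own billing project; the function name fetch_participant_ids is hypothetical. It also assumes the participant id is the first column of each result row.

```python
def fetch_participant_ids(client, cohort_sql):
    """Run the cohort SQL query and return the participant ids as a list.

    client:     a BigQuery client object, e.g. google.cloud.bigquery.Client.
    cohort_sql: the SQL query copied from the cohort table.
    """
    # query() submits the job; result() waits for it and yields rows.
    rows = client.query(cohort_sql).result()
    # Assumption: the participant id is the first column of each row.
    return [row[0] for row in rows]


# Usage in a notebook (project name is a placeholder):
#   from google.cloud import bigquery
#   client = bigquery.Client(project="my-billing-project")
#   participant_ids = fetch_participant_ids(client, cohort_sql)
#   print(len(participant_ids))
```

Taking the client as a parameter keeps the function independent of how you authenticate and which project is billed for the query.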

Creating a Data Explorer for your dataset

If your dataset lives in BigQuery, you can create a Data Explorer for it. See https://github.com/DataBiosphere/data-explorer for more information.
