Many of the datasets hosted by Terra include integrated Data Explorer interfaces, useful for generating and exporting custom cohorts. You can access the Data Explorers (for datasets that have them) from the Dataset Library by clicking "Browse data".
You may need to request access to a dataset before you can open its Data Explorer. The 1000 Genomes Low Coverage Data Explorer is the exception: anyone can see it.
This article walks through the steps to use a Data Explorer to generate and export a custom cohort for further analysis. If you have data hosted in BigQuery, see the last section for a link to instructions on creating a Data Explorer for it in Terra.
Making and importing a custom cohort
Within Data Explorer, you create a cohort by selecting the parameters you want to include from the parameter cards. The number of participants that satisfy the conditions you select is listed at the top right.
Save the cohort to your workspace by clicking the "Save cohort" button at the top right and following the prompts.
Using a custom cohort
After importing the cohort, your Terra workspace will contain two new data tables: cohort and BigQuery_table. The cohort entity contains a SQL query representing the cohort, and will have the same name you gave it when saving. This SQL query returns a list of participant IDs.
Hint: Triple-click the SQL query table cell to select the entire SQL query.
The BigQuery_table entity lists the BigQuery tables in your dataset. The cohort SQL query can be joined with these BigQuery tables for analysis.
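As a rough illustration of combining the cohort query with one of the listed tables, the sketch below wraps the cohort SQL in a subquery and keeps only the rows whose participant ID appears in the cohort. The function name build_cohort_query, the column name participant_id, and the project, dataset, and table names are all hypothetical; substitute the values from your own cohort and BigQuery_table entries.

```python
def build_cohort_query(cohort_sql, table, id_column="participant_id"):
    """Return SQL that selects the rows of `table` belonging to the cohort.

    Assumes (hypothetically) that `cohort_sql` returns a single column of
    participant IDs and that `table` has a matching `id_column`.
    """
    # Drop surrounding whitespace and any trailing semicolon so the cohort
    # query can be embedded as a subquery.
    cohort_sql = cohort_sql.strip().rstrip(";").strip()
    return (
        "SELECT t.*\n"
        f"FROM `{table}` AS t\n"
        f"WHERE t.{id_column} IN (\n"
        f"{cohort_sql}\n"
        ")"
    )


# Placeholder values; copy the real query from the cohort table and the
# real table name from the BigQuery_table table in your workspace.
example_cohort_sql = (
    "SELECT DISTINCT participant_id "
    "FROM `my-project.my_dataset.participants` WHERE age > 50;"
)
example_sql = build_cohort_query(example_cohort_sql, "my-project.my_dataset.samples")
print(example_sql)
```

The resulting string can be pasted into the BigQuery console or run from a notebook. Filtering with an IN subquery rather than a plain JOIN avoids duplicating rows if the cohort query happens to return the same ID more than once.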
To reopen the cohort in Data Explorer, select the cohort table and "Open with" the Data Explorer tool.
To create a Python 2 or 3 notebook with this cohort, select the cohort table and "Open with" the Notebook tool.
Once you run all cells in the notebook, the BigQuery data for your cohort will be accessible in the notebook for analysis.
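If you want to run further queries of your own from the notebook, something like the following minimal sketch may help. It is not the code that Terra generates; it assumes the google-cloud-bigquery package is installed in the notebook environment, and the helper name query_to_dataframe and the billing project name are hypothetical.

```python
def query_to_dataframe(client, sql):
    """Run `sql` with a BigQuery client and return a pandas DataFrame.

    `client` is expected to behave like google.cloud.bigquery.Client:
    client.query(sql) returns a job object that has a to_dataframe() method.
    """
    job = client.query(sql)    # submit the query
    return job.to_dataframe()  # wait for completion and load the results


# Example use inside a notebook (left commented out because it needs
# credentials and a billing project; "my-billing-project" is a placeholder):
#
#   from google.cloud import bigquery
#   client = bigquery.Client(project="my-billing-project")
#   df = query_to_dataframe(client, "SELECT 1 AS x")
#   df.head()
```

Passing the client in as an argument keeps the helper easy to reuse across cells and to exercise with a stand-in object when no BigQuery access is available.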
Creating a Data Explorer for your dataset
If your dataset lives in BigQuery, you can create a Data Explorer for your dataset. Please see https://github.com/DataBiosphere/data-explorer for more information.