Learn how to link entity tables in Terra for the analysis of paired tumor and normal samples.
Overview: Pair tables for tumor-normal samples
Terra has two main types of data tables (entity and set), but it can also link three predefined tables (participant, sample, and pair) for the analysis of tumor-normal samples. These are ordinary entity tables; when they follow the appropriate naming convention, Terra links them to facilitate the analysis of paired tumor-normal samples taken from the same patient. Somatic workflows that require pairs of tumor and normal samples accept data in pair tables by default.
How does a pair table work?
A pair table specifies the case (tumor) and control (normal) samples for a particular participant (HCC1141 below).
For Terra to link a pair table to a participant and its samples in downstream workflows, you need to create both a participant table that lists the participants and a sample table (shown below) that lists the genomic samples (for tumor and for normal) for each participant.
The pair table should reference the participant_ids used in the participant table and the sample_ids used in the sample table. To create a pair table from scratch, follow the example shown below.
Example: pair.tsv in a spreadsheet
- The header entries in red ("entity:pair_id", "case_sample", "control_sample", and "participant", shown above) must be typed exactly as shown.
- Customize the rows with your own pair, sample, and participant IDs.
- Remember that the sample and participant IDs must match the IDs used in the "sample" and "participant" tables!
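As a rough illustration of the layout described above, the following Python sketch builds the text of a pair.tsv file. The four header names come from this article. The pair and sample IDs (`HCC1141_pair`, `HCC1141_tumor`, `HCC1141_normal`) and the helper names `build_pair_tsv` and `write_pair_tsv` are hypothetical and only show the shape of a row.

```python
import csv
import io

# Header names from the article; they must be typed exactly like this.
PAIR_HEADER = ["entity:pair_id", "case_sample", "control_sample", "participant"]


def build_pair_tsv(rows):
    """Return tab-separated pair-table text.

    Each row is a (pair_id, case_sample, control_sample, participant) tuple.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(PAIR_HEADER)
    writer.writerows(rows)
    return buffer.getvalue()


def write_pair_tsv(path, rows):
    """Write the pair table to a file that can be uploaded as a TSV."""
    with open(path, "w", newline="") as handle:
        handle.write(build_pair_tsv(rows))


# Hypothetical IDs, for illustration only. The sample and participant IDs
# would need to exist in your own sample and participant tables.
example_rows = [("HCC1141_pair", "HCC1141_tumor", "HCC1141_normal", "HCC1141")]
pair_tsv_text = build_pair_tsv(example_rows)
```

Calling `write_pair_tsv("pair.tsv", example_rows)` would save the same text to disk. A spreadsheet exported as tab-separated text gives an equivalent file.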
Example: Pair table in Terra
Uploading pair and associated tables
For Terra to link participant, sample, and pair tables, they must be uploaded in the appropriate order.
Required upload order
To associate the tables, upload them in the correct order and select “Create participant, sample, pair associations” in the pop-up window that appears when you upload the files.
Uploading pair tables example: Mutect2
A good example of when to use pair tables is running the Mutect2 workflow. Its input table is a pair table, but the fields in the pair table reference data stored in the sample table, and each sample is in turn associated with a participant ID from the participant table.
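To make the chain of references concrete, here is a minimal Python sketch of how a pair row can be followed to data in the sample table. It is a conceptual model only, not Terra's implementation. The `bam` attribute, the `gs://example-bucket/...` paths, the sample IDs, and the `resolve_pair` helper are all hypothetical.

```python
# Hypothetical in-memory copy of a sample table, keyed by sample ID.
# The "bam" attribute and the bucket paths are made up for illustration.
samples = {
    "HCC1141_tumor": {
        "participant": "HCC1141",
        "bam": "gs://example-bucket/HCC1141_tumor.bam",
    },
    "HCC1141_normal": {
        "participant": "HCC1141",
        "bam": "gs://example-bucket/HCC1141_normal.bam",
    },
}

# One row of the pair table: it stores sample IDs, not the data itself.
pair = {
    "pair_id": "HCC1141_pair",
    "case_sample": "HCC1141_tumor",
    "control_sample": "HCC1141_normal",
    "participant": "HCC1141",
}


def resolve_pair(pair_row, sample_table, attribute):
    """Follow the pair's sample references and return the
    (case, control) values of one sample attribute."""
    case_value = sample_table[pair_row["case_sample"]][attribute]
    control_value = sample_table[pair_row["control_sample"]][attribute]
    return case_value, control_value


tumor_bam, normal_bam = resolve_pair(pair, samples, "bam")
```

If the sample table were missing, the lookups in `resolve_pair` would raise a `KeyError`. This mirrors why a workflow that reads from the pair table also needs the linked sample table.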
Order to upload (Mutect2)
If you upload the tables out of order or without selecting that checkbox, the workspace will give you a table loading error. Additionally, the workflow will fail because it reads from the pair table but has no knowledge of the sample table that the pair table is supposed to reference.
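One way to catch broken references before uploading is to check the three TSV files against each other locally. The sketch below is a hedged example of such a check, not a Terra feature. It uses the pair-table headers given in this article and assumes the sample table has "entity:sample_id" and "participant" columns and the participant table has an "entity:participant_id" column. The function names and the example IDs are hypothetical.

```python
import csv
import io


def read_tsv(text):
    """Parse tab-separated text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def find_broken_references(participant_tsv, sample_tsv, pair_tsv):
    """Return messages for IDs that point at a missing row.

    Assumed headers (not all stated in the article):
      participant table: entity:participant_id
      sample table:      entity:sample_id, participant
      pair table:        entity:pair_id, case_sample, control_sample, participant
    """
    participants = {row["entity:participant_id"] for row in read_tsv(participant_tsv)}
    sample_rows = read_tsv(sample_tsv)
    sample_ids = {row["entity:sample_id"] for row in sample_rows}

    problems = []
    for row in sample_rows:
        if row["participant"] not in participants:
            problems.append(
                f"sample {row['entity:sample_id']}: participant "
                f"'{row['participant']}' is not in the participant table"
            )
    for row in read_tsv(pair_tsv):
        pair_id = row["entity:pair_id"]
        for column in ("case_sample", "control_sample"):
            if row[column] not in sample_ids:
                problems.append(
                    f"pair {pair_id}: {column} '{row[column]}' "
                    "is not in the sample table"
                )
        if row["participant"] not in participants:
            problems.append(
                f"pair {pair_id}: participant '{row['participant']}' "
                "is not in the participant table"
            )
    return problems


# Hypothetical tables; the pair row misspells the normal sample ID on purpose.
participant_text = "entity:participant_id\nHCC1141\n"
sample_text = (
    "entity:sample_id\tparticipant\n"
    "HCC1141_tumor\tHCC1141\n"
    "HCC1141_normal\tHCC1141\n"
)
pair_text = (
    "entity:pair_id\tcase_sample\tcontrol_sample\tparticipant\n"
    "HCC1141_pair\tHCC1141_tumor\tHCC1141_norm\tHCC1141\n"
)

problems = find_broken_references(participant_text, sample_text, pair_text)
```

An empty list means every reference resolves. This check only covers ID consistency; it does not replace uploading in the required order or selecting the association checkbox.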