Understanding Entity Types
You may have seen the term "entity types" in a workspace data table or workflow configuration card. This article explains what entities are and covers some of the more technical details of Terra's five default root entity types.
What is a "root entity"?
What are the default entity types?
How to associate data in different tables
Working with data tables resources
If you're new to Terra, you may first see reference to "entity types" in a workflow configuration form:
What's an "entity"? It's a piece of data, a "thing"
According to the dictionary, an "entity" is "a thing with distinct and independent existence." In Terra, entities are pieces of information - almost like "variables" - traditionally used in workflow analysis. An entity is the type of primary data stored in a data table. It's also
Example: sample entities in a sample table

This sample table includes genomic data (BAM and BAM index files) of various samples. Note that the first column is each sample's unique ID and the fourth column is the participant ID, also found in the participant table.
What's a workflow's "root entity" type?
The "root entity" is the smallest piece of data a workflow can use as input. You select the root entity type of input when setting up a workflow. To learn more about how to configure a workflow, see this article.
What are the default entity types?
In Terra, the five default root entity types are predefined kinds of data with specific relationships. They were originally developed based on the data typically used in cancer research analyses. Using default entity names helps standardize workflows where the end result depends on grouping and associating the data properly.
Default entity types
Sample: The most basic genomic data entity is the nucleotide sequence
Sample set: A grouping of particular samples. You can use this entity
Participant: An individual, such as someone in a study. Participant IDs
Participant set: A grouping of particular participants. This entity type
Pair set: A set of paired tumor and normal samples taken from the same
Origins of the default entity types in Terra
Terra's five default entity types are not native to the Workflow Description
Terra now uses flexible entity types
Default entity types streamline linking data in different tables
In a table, each distinct entity (sample, specimen, or participant) has its own row and its own unique ID. However, data in different tables are often related, such as phenotypic data (in a participant table) and genomic data (in a sample table) for a single participant.
Terra can connect some types of data automatically if (and only if) you use the default entity types and upload the tables in a specific order. The sections below show how to use participant, sample, and pair IDs to associate different kinds of data.
Linking study participants to genomic data (samples)
Imagine a study with thousands of participants. You define the participants' unique IDs in a participant table. The first column includes the unique participant ID and additional columns include other information associated with the participant. Traditionally a participant table only included participant IDs, but today a participant table can also include phenotypic data, or other data related to that participant (as in the screenshot below):

Samples
Traditionally a sample table (or entity) included genomic data files plus additional data. The sample entity table shown below includes 1) the unique ID for each sample, 2) links to the sample data files in the cloud, and 3) the participant_ID (from the participant table). The participant ID links the sample back to the ID (as well as any phenotypic data) in the participant table.
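The participant ID column in the sample table works much like a foreign key. The following is a minimal Python sketch of that idea, not Terra's implementation; the IDs, column names, and bucket paths are all hypothetical and exist only for illustration.

```python
# Hypothetical participant table: one row per participant, keyed by participant ID.
participants = {
    "P001": {"age": 54, "sex": "F"},
    "P002": {"age": 61, "sex": "M"},
}

# Hypothetical sample table: each row has a unique sample ID, a link to a data
# file in the cloud, and the participant ID that ties it to the participant table.
samples = [
    {"sample_id": "S001", "bam": "gs://example-bucket/S001.bam", "participant": "P001"},
    {"sample_id": "S002", "bam": "gs://example-bucket/S002.bam", "participant": "P001"},
    {"sample_id": "S003", "bam": "gs://example-bucket/S003.bam", "participant": "P002"},
]


def phenotype_for_sample(sample_id):
    """Return the participant row linked to a sample, or raise KeyError."""
    for row in samples:
        if row["sample_id"] == sample_id:
            return participants[row["participant"]]
    raise KeyError(f"unknown sample: {sample_id}")


def samples_for_participant(participant_id):
    """Return the IDs of every sample that references the participant."""
    return [row["sample_id"] for row in samples if row["participant"] == participant_id]
```

Because each sample row stores only the participant ID, the phenotypic data lives in one place and any number of samples can point back to it.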
Grouping samples with sets

To group the three samples in the above example together for analysis, you would use a sample_set entity (table):
Notice that the sample_set table only includes the name of the set and the sample IDs. To analyze the genomic data, Terra would reference the metadata in the sample entity table using the participant ID.
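A set table can stay this small because it stores references rather than data. As a rough sketch in Python, assuming (for illustration only) that membership is stored as pairs of set name and sample ID and that members are looked up by sample ID, resolving a set to its files could look like this. The set name, IDs, and paths are hypothetical.

```python
# Hypothetical sample table, keyed by sample ID.
samples = {
    "S001": {"bam": "gs://example-bucket/S001.bam", "participant": "P001"},
    "S002": {"bam": "gs://example-bucket/S002.bam", "participant": "P001"},
    "S003": {"bam": "gs://example-bucket/S003.bam", "participant": "P002"},
}

# Hypothetical sample_set table: only the set name and the member sample IDs.
sample_set_membership = [
    ("cohort_A", "S001"),
    ("cohort_A", "S002"),
    ("cohort_A", "S003"),
]


def resolve_set(set_name):
    """Collect the BAM links of every sample that belongs to the named set."""
    return [
        samples[sample_id]["bam"]
        for name, sample_id in sample_set_membership
        if name == set_name
    ]
```

The file links are never copied into the set table; they are fetched from the sample table whenever the set is used.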
Linking participants and their tumor-normal pairs of sample data
The tables you would include in your workspace are listed below, in the order in which they must be loaded:
1. Participant table (notice this doesn't reference any other table or entity)
2. Sample table (notice this references the participant_id from the first table)
3. Pairs table (notice this references both the participant and sample IDs)
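The load order above follows from which table references which: a table can only be loaded once the tables it points to exist. The sketch below expresses that reasoning with Python's standard-library `graphlib` module. It is an illustration of the dependency logic, not a Terra API; the `references` mapping and both function names are assumptions made for this example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Which tables each table references, following the numbered list above.
references = {
    "participant": set(),
    "sample": {"participant"},
    "pair": {"participant", "sample"},
}


def load_order(refs):
    """Return an order in which every table comes after the tables it references."""
    return list(TopologicalSorter(refs).static_order())


def is_valid_order(order, refs):
    """Check that no table is loaded before a table it references."""
    loaded = set()
    for table in order:
        if not refs[table] <= loaded:
            return False
        loaded.add(table)
    return True
```

With these references, `load_order(references)` yields participant, then sample, then pair, and `is_valid_order` rejects any attempt to load the sample table before the participant table.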
For a detailed explanation of configuring your workflow inputs in the Terra interface, see this article. To learn more about scripting in WDL, you can read our WDL user guide or check out the OpenWDL community that was formed to steward the WDL language specification and advocate its adoption.
Working with data tables resources
How to modify, delete, and create tables
For hands-on practice using and creating data tables
To learn to use data tables for input to a workflow