Making, modifying, and deleting data tables
FollowNeed to modify, delete or generate a data table? Read on to learn how!
Contents
Setting the Stage - What kind of data do you have?
How to modify table entries or delete data rows (in the Terra UI)
How to add data and add/remove metadata to a table (in a spreadsheet editor)
How to make a data table from scratch (in a spreadsheet editor)
- Participant table
- Sample table
- Single entity table
- entity_set table
- Pairs table
- Workspace Data (i.e. workspace-level resources)
A warning about order when uploading tsv files
Setting the stage - What kind of data do you have?
Before you set up or modify your workspace tables, consider what data you have and how you'll use that data for analysis. This will help determine what data tables you want and what they should look like.
- What's the most fundamental piece of data you will be analyzing in a workflow?
This will be the name of your table in Terra and the "root entity type" of the table that includes the workflows input data files. You may categorize your data as specimens, samples, lanes of reads, and so on. To learn more about predefined data table types in Terra, see this article. - Sets of entities versus entities
Will you be analyzing single entities? Groups of single entities? Tumor-normal pairs of samples? Does your workflow take arrays of single files as input? Answers to these questions can affect whether you will use predefined tables like set table or pairs tables, which have specific formatting requirements.
How to modify table entries or delete data rows (in the Terra UI)
If you only need to change a handful of cells for data already in a workspace table
You can do this directly in the Terra interface, starting in the Data tab. For example, if you wanted to change the sample or participant IDs, or links to genomic data - or delete particular samples - in the table below:
To edit data in individual table cells (click for steps and screenshots)

To delete data in a workspace table (click for steps and screenshots)
2) Click the three vertical dots at the top and select "Delete Data"
To copy data in a table to another workspace (click for steps and screenshots)
2) Click the three vertical dots at the top and select "Export to Workspace" to select the workspace to copy to

How to add additional data (rows) or add/delete metadata (columns) in a table
Say you have an existing table, but need need to
- Add rows of data - like additional specimens to a specimen table
- Add or delete columns of metadata - such as adding an additional column with links to a BAM file
You will need to modify the table in a spreadsheet editor outside of Terra. Tables in Terra are formatted as either "tab-separated values" or "tab-delimited text" (depending on what spreadsheet editor you use).
Expand the sections below for more step-by-step instructions and screenshots.
1. Download the existing table
To download all rows
Select the Download all Rows button at the top of the table
To download particular rows
1. Select the rows to download
2. Click the three vertical dots to the right of the Download all Rows button and
select "Download as tsv"
2. Edit and save the tsv file using your favorite spreadsheet editor
2.1. Open in your favorite spreadsheet editor to edit. If you are adding columns of metadata, column headers will be the attribute name of the new column in the updated data table.
2.2. Save the file as "tab-separated values" or "tab-delimited text"
Your editor may give you a warning, but we assure you, it's fine! Also, Terra will completely ignore the name you give the file. It's the root entity in the first column header that determines the table name in the workspace.
|
|
---|---|
Depending on what spreadsheet editor you use, when you save in the proper format your spreadsheet may have either a ".tsv" or a ".txt" extension. Terra will accept either one. |
3. Upload the tsv file to your workspace
Click the blue "+" icon at the top right of the table column in the TABLE page of the workspace and follow the directions.
|
|
---|---|
When your tsv load file has the same entity (name) as a table already in the workspace, you may get an error message when you try to upload about overwriting data (see screenshot).
Note that Terra will only overwrite data rows with the same ID (in the first column). If the tsv (load) file includes different IDs, the data rows will be added to the existing table. |
How to make a data table from scratch (in a spreadsheet editor)
Workspace tables include rows and columns for organizing data, much like spreadsheet built into the data page. So it's no surprise that you can use a spreadsheet editor to create and upload a file - a "load file" - to generate a new table in your workspace.
A load file has to be in "tab-separated values" or "tab-delimited text" format (Terra recognizes both). Each row is a unique entity (participant, sample, etc.) and each column is a distinct attribute corresponding to that entity - (ie. the participant's sex, age, or height; the sample BAM, FASTA, etc.).
1. Make a table "load file" in a spreadsheet editor
Click the table types below for examples and required formatting
Each collapsed section below includes a description of the table type, formatting requirements and a screenshot of what the table will look like when uploaded to your workspace.
Participant table
Download a template participant tsv file here
What is it?
A participant table organizes participants in a study. The minimum participant table is only one column - the participant ID. This ID can be used in additional tables to associate samples, for example, with the right individuals.
Participant table in spreadsheet - Parts in red must be entered exactly as shown. You'll customize by adding your own values for parts in "< >".
entity:participant_id |
<participant1-id> |
<participant2-id> |
<participant3-id> |
Participant table in Terra
Sample table template and formatting
Download a template sample table here
What is it?
A sample, or other entity, table, keeps track of data - historically data used as input for a workflow. The minimum sample table includes an ID column and column for a data file (could be FASTQ, BAM, CRAM, etc. - whatever form your data are).
Sample table in spreadsheet - Parts in red must be entered exactly as shown! You'll customize by adding your own values for parts in "< >".
entity:sample_id | BAM |
<participant1-blood> | <gs://bucket-name/blood_sample_P1.bam> |
<participant1-spit> | <gs://bucket-name/spit_sample_P1.bam> |
<participant2-blood> | <gs://bucket-name/blood_sample_P2.bam> |
<participant2-spit> | <gs://bucket-name/spit_sample_P2.bam> |
Sample table in Terra
Entity table (i.e. specimen table, unicorn table)
Download a template single entity table here
The Terra model is flexible and you can use whatever name you wish for your table, as long as it follows the format entity:your-name_id.
Sample table in spreadsheet - Parts in red must be entered exactly as shown. You'll customize by adding your own values for parts in "< >".
entity:<unicorn>_id | <VCF> |
<Golden> | <gs://magic-bucket/golden.vcf> |
<Lightening> | <gs://magic-bucket/lightening.vcf> |
Sample table in Terra
Sets of data - sample_set table
Download a template entity_set table here
What is it?
A set table defines entities grouped into sets for analysis. An entity_set table always refers to entities in an entity table.
When will you use set tables?
Example 1: Using entity_set tables to analyze subsets of single entities in a dataset
You may have many samples in a data table, but want to do some testing with just three samples. You want to use the same three samples every time, and they're sprinkled throughout the sample table. To avoid having to manually select the exact samples you want every time you run a workflow, you can define subsets of the data in set table.If you choose a subset of input data and run a workflow on it, Terra will automatically generate a set of the entities you ran on. To learn how to generate a set automatically by running a workflow, see the Data Tables QuickStart Part 3.
Example 2: Using set tables for workflows that run on arrays
Set tables can also be used as inputs for some workflows that accept arrays as inputs
|
|
---|---|
The Optimus pipeline takes in multiple lanes of a sample, but outputs one file that corresponds to the sample, not the individual read lanes. Another example, familiar to cancer researchers, is the CNV_Somatic_Panel_workflow, which generates a single Panel of Normals (PoN) from a list of normal samples. The PoN is used in variant calling to filter out systemic errors that occur when reads are processed. |
For hands-on practice using tables to run array data, see the Data Tables QuickStart Part 3 and Part 4.
Example - set table in spreadsheet -The first column is the unique unique ID for each set and the second column is the entity_id (in a different - i.e. entity - table). There is a row for every member of a set.
Requirements
- The parts in red ("membership:" and "set_id") must be typed exactly as shown
- Customize with your own values inside the "< >"
- Your entity table must be "sample" (it's the table that contains links to the input data files)
membership:sample_set_id | sample |
<spit> | <participant1-spit> |
<spit> | <participant2-spit> |
<blood> | <participant1-blood> |
<blood> | <participant2-blood> |
Set table in Terra
To find the samples in each set, click on the link in the "samples" column (see highlighted box for the two samples in the "blood" set):
Pairs table
Download a template pairs table here
What is it?
Pairs tables are a specific type of data table used in cancer research, where somatic workflows input samples corresponding to both tumor and normal tissue. Note that the pairs table references the particpant_id and the sample_ids of the case and the control sample (from the sample table).
Example (in spreadsheet)
entity:pair_id | case_sample | control_sample | participant |
<HCC1143-2020> | <SM-74P4M> | <SM-74NEG> | <HCC1143> |
The header entries (in red) must be typed exactly as shown. Customize with your own values inside the "< >".
Table in Terra
Workspace-level resources - the Workspace Data table
Download a template Workspace Data table here
What is it?
The workspace-wide resource data table (Workspace Data) holds variables you might want to use in multiple workflow analyses - like the genomic reference sequence file, or a Docker container. Using the Workspace Data table lets you configure all at once and point to the resources from within the UI whenever you need them. Not only will you not need to look up reference file paths again, but if you update the resource files, you only need to update in one place.
Workspace Data in spreadsheet
The parts in red must be typed exactly as shown. Customize with your own values inside the "< >". Note that Terra will reorganize the files in alphabetical order.
workspace:<ref_fasta> | <ref_fasta_index> | <ref_dict> |
<gs://public-bucket/Homo_sapiens_assembly.38.fasta> | <gs://public-bucket/Homo_sapiens_assembly.38.fai> | <gs://public-bucket/Homo_sapiens_assembly.38.dict> |
Workspace Data table in Terra
The screenshot below is what you'll see when you upload the spreadsheet above to a Workspace data. The first column (circled column on the left) identifies what the file is. The other (circled on the right) includes a link to the file in a Google bucket:
2. Save file in "tab delimited text" or "tab-separated value" (tsv) format
Your editor may give you a warning, but we assure you, it's fine!
What will be the name of the table in your workspace?
It's worth noting that Terra will completely ignore the name you give the file. It's the root entity (in the first column header) that determines the table name in the workspace.
3. Upload to the workspace Data page
3.1. Click on the "+" sign in the blue circle at the top of the left TABLES column(in the orange rectangle in the screenshot below)
3.2. Drag or click to select your tsv file
3.3. Click the "upload" button
Once you've uploaded a load file to your workspace, you should see your data right away in its own table in the Data page. Click on the link (table name) to expand the table. A sample table with one sample (AH_spit) and one column of metadata (links to FASTQ files) would look like this:
Uploading interdependent tables in a workspace
|
|
---|---|
If data tables reference entities in another table, the dependent table needs to be uploaded first. The order is as follows ("A > B" means entity type A must be uploaded before entity type B):
- participants > samples Then, to associate data tables, make sure to check the boxTo link information in different tables, you must select “Create participant, sample, pair associations” at the prompt in the pop-up window that appears when you try to upload the files. |
Uploading interdependent tables example: Mutect2
Any tsv that references another tsv must be loaded after the primary reference. so the workflow is able to access data from the correct table. A good example is if you are running the workflow Mutect 2. The input table is a pairs table, but the fields in the pairs table references data stored in the sample table that's associated with participant IDs from a participant table.
Order to upload (Mutect 2)
- Participant tsv
- Sample tsv
- Pairs tsv
If you uploaded the tables out of order or without selecting that checkbox, the workflow will fail because it reads from the pair table but has no knowledge of the sample table from which the pair table is supposed to read.
Comments
0 comments
Please sign in to leave a comment.