Making, modifying, and deleting data tables

Allie Hajian
  • Updated

Need to modify, delete or generate data in a workspace table? Read on to learn how!

Setting the stage - What kind of data do you have?

Before you set up or modify your workspace tables, consider what data you have and how you'll use that data for analysis. This will help determine what data tables you want and what they should look like.

  • What's the most fundamental piece of data you will be analyzing?
    This will be the name of your table in Terra and the "root entity type" of the table that includes input data files for a workflow. You may categorize your data as specimens, samples, lanes of reads, and so on. To learn more about predefined data table types (the default genomic data model) in Terra, see this article.   

  • Sets of entities versus entities
    Will you be analyzing single entities? Groups of single entities? Tumor-normal pairs of samples? Does your workflow take arrays of single files as input? Answers to these questions can affect whether you will use predefined tables like set table or pairs tables, which have specific formatting requirements.

How to modify table entries or delete data rows (in the Terra UI)

This video shows how to change a handful of cells for data already in a workspace table:

You can do this directly in the Terra interface, starting in the Data tab. For example, if you wanted to change the sample or participant IDs, or links to genomic data - or delete particular samples - in the table below:

Modifying-data-tables_Screen_shot.png

To edit data in individual table cells

If your workspace already includes a data table with at least the number of rows you need, you can edit individual cells only by clicking on the pencil icon in the cell you want to change:
S11a_Edit_sample-table_Screen_Shot.png

To delete data (rows) in a workspace table

1. Check the box to the left of the row(s) you want to delete, or (to delete the entire table) select all rows from the dropdownData-tables-deleting-rows-or-tables_Screen_shot.png
2. Click the three vertical dots at the top and select "Delete Data"

To copy data in a table to another workspace

1. Starting in the workspace you want to copy from, select the rows you want to copy (or all rows, if you want the entire table) Data-tables-deleting-rows-or-tables_Screen_shot.png
2. Click the three vertical dots at the top and select "Export to Workspace" to select the workspace to copy to

How to add additional data (rows) or add/delete metadata (columns) in a table

If you have an existing table, but need to

1. Add rows of data - like additional specimens to a specimen table.
2. Add or delete columns of metadata - such as adding an additional column with links to a BAM file.Modifying-tables_Add_rows-columns_Screen_shot.png
You will make these modifications in a spreadsheet editor outside of Terra. Expand the sections below for more step-by-step instructions and screenshots. 

1. Download the existing table

To download all rows
1.1. Select the Download all Rows button
at the top of the tablemodifying-tables_Download-all-rows_Screen_shot.png
To download particular rows
1.1. Select the rows to downloadModifying-tables_Download-as-tsv_Screen_shot.png

1.2. Click the three vertical dots to the right of the "Copy Page to Clipboard" button and select "Download as TSV"

2. Delete the entire table in the workspace

2.1. Select all rows by clicking "All" in the dropdown menu at the top left of the table
Delete-data-table-columns_Select-all_Screen_shot.png

2.2. Click the three vertical dots to the right of the "Copy Page to Clipboard" and select "Delete Data"
Delete-data-table-columns_Delete-data_Screen_shot.png

2.3. Confirm by clicking the blue "Delete" button
If there are many rows in your table, you will need to scroll down to the end of the form to see the blue button. 
Delete-data-table-columns_Confirm-delete_Screen_shot.png

3. Edit and save the tsv file using your favorite spreadsheet editor

3.1. Open in your favorite spreadsheet editor and edit like any file. Note that cells can only include alphanumeric characters, "-" and "_" and no spaces. All formatting will be lost when you save the file in tsv format. 

 Note that if you are adding columns of metadata, column headers will be the attribute name of the new column in the updated data table. 

3.2. Save the file as "tab-separated values" or "tab-delimited text" 
Your editor may give you a warning, but we assure you, it's fine! Also, Terra will completely ignore the name you give the file. It's the name ("root entity") in the first column header that determines the table name in the workspace. 
Data-QuickStart_Part2_Save-as-Tab-delimited-text.png

G0_tip-icon.png


A note about "tsv" versus "txt" file extensions

  Depending on what spreadsheet editor you use, when you save in the proper format your spreadsheet may have either a ".tsv" or a ".txt" extension. Terra will accept either one. 

4. Upload the new TSV file to your workspace

Click the blue "+" icon at the top right of the TABLE column in the Data tab of the workspace and follow the directions.
Data-QuickStart-Part2_Upload-tsv.png

iGO_tip-icon.png


A note about overwriting table rows

  When your TSv load file has the same entity (name) as a table already in the workspace, you may get an error message when you try to upload about overwriting data (see screenshot).

Exercise2-tsv-warning_Screen_Shot.png

Note that Terra will only overwrite data rows with the same ID (in the first column). If the TSV (load) file includes different IDs, these rows will be added to the existing table.


How to make a data table from scratch or a template (in a spreadsheet editor)

Workspace tables include rows and columns for organizing data, much like spreadsheet built into the data page. So it's no surprise that you can use a spreadsheet editor to create and upload a file - a "load file" - to generate a new table in your workspace.

  • Each row is a unique entity (participant, sample, etc.)
  • Each column is a distinct attribute corresponding to that entity - (ie. the participant's sex, age, or height; the sample BAM, FASTA, etc.). 

General formatting requirements
A load file has to be in "tab-separated values" or "tab-delimited text" format (Terra recognizes both). 

1. Make a table.tsv "load file" in a spreadsheet editor

Click the table types below for templates, examples, and required formatting
Each collapsed section includes a description of the table type, formatting requirements and a screenshot of what the table will look like when uploaded to your workspace. 

Participant table template and formatting requirements

Download a template participant.tsv file here

What's in a participant table?
A participant table organizes participants in a study. The minimum participant table is only one column - the participant ID. This ID can be used in additional tables to associate samples, for example, with the right individuals.

Example: participant.tsv in spreadsheet

entity:participant_id
your-participant1-id
your-participant2-id
your-participant3-id
- Parts in red (i.e. "entity:participant_id") must be entered exactly as shown
- You'll use your own participant IDs (i.e. "your-particpant1-id")

Example: Participant table in Terra

template-tsv_participant-table_Screen_shot.png

Sample table template and formatting requirements

Download a template sample.tsv load file here

What's in a sample table?
A sample, or other entity, table, keeps track of data - historically input data for a workflow. The minimum sample table includes an ID column and column for a data file (could be FASTQ, BAM, CRAM, etc. - whatever form your data are).

Example: sample.tsv in spreadsheet

entity:sample_id BAM
participant1-blood gs://your-bucket-name/blood_sample_P1.bam
participant1-spit gs://your-bucket-name/spit_sample_P1.bam
participant2-blood gs://your-bucket-name/blood_sample_P2.bam
participant2-spit gs://your-bucket-name/spit_sample_P2.bam
- Parts in red (i.e. "entity:sample_id") must be entered exactly as shown!
- You'll use your own sample IDs (i.e. "your-participant1-blood") and the complete paths of the data files

Example: Sample table in Terra

template-sample-in-Terra_Screen_shot.png

Entity table (i.e. specimen table, unicorn table) template and formatting requirements

Download a template entity.tsv load file here

The Terra model is flexible and you can use whatever name ("entity") you wish for your table, as long as your spreadsheet follows the format entity:your-name_id.

Example: unicorn.tsv in spreadsheet

entity:unicorn_id VCF
Golden gs://magic-bucket/golden.vcf
Lightening gs://magic-bucket/lightening.vcf
- Parts in red (i.e. "entity:" and "_id") must be entered exactly as shown with no spaces
- You'll use your own entity IDs (i.e. "Golden" and "Lightening" in the example above), column headers (attribute type - i.e. "VCF" from above), and data  (such as complete file paths - "gs://" - to data in cloud storage). 

Example: Unicorn table in Terra
Modifying-data-tables_Unicorn-table_Screen_shot.png

Sets of data - membership.tsv - template and formatting requirements

Download a template membership.tsv here

What's in an entity_set table?
A set table defines entities or pairs grouped into sets for analysis. An entity_set table always refers to entities in an entity table (i.e. a set of pairs from the pair table, or a set of samples from the sample table). 

When will you use set membership tables? 

Analyzing subsets of single entities in a larger dataset

You may have many samples in a data table, but want to do some testing with just three samples. You want to use the same three samples every time, and they're sprinkled throughout the sample table. To avoid having to manually select the exact samples you want every time you run a workflow, you can define subsets of the data in set table.

Data-Quickstart_Part3_Sets-of-single-entities-as-input.png

To learn how to generate a set automatically by running a workflow on a subset of data, see the Data Tables QuickStart Part 3, or watch this video:

Using workflows that run on arrays

Set tables can also be used as inputs for workflows that accept arrays as inputs

Data-Quickstart_Part4_Sets-as-input-arrays.png

G0_tip-icon.png


Examples of workflows that take arrays as input

  The Optimus pipeline takes in multiple lanes of a sample, but outputs one file that corresponds to the sample, not the individual read lanes.

Another example, familiar to cancer researchers, is the CNV_Somatic_Panel_workflow, which generates a single Panel of Normals (PoN) from a list of normal samples. The PoN is used in variant calling to filter out systemic errors that occur when reads are processed.

For hands-on practice using tables to run array data, see the Data Tables QuickStart Part 3 and Part 4.

Example: sample_set.tsv in spreadsheet
The first column is the unique unique ID for each set and the second column is the entity_id (in a different - i.e. entity - table). There is a row for every member of a set.

membership:sample_set_id sample
spit participant1-spit
spit participant2-spit
blood participant1-blood
blood participant2-blood

- The parts in red ("membership:" and "set_id") must be typed exactly as shown with no spaces
- Customize with your own values 
- You must have a corresponding entity table (i.e. "sample" inthe example above) in the workspace that contains links to the input data files from the samples in the set. 

Example: sample_set table in Terra
To find the samples in each set, click on the link in the "samples" column (see highlighted box for the two samples in the "blood" set):
template-sample_set-table-in-Terra_Screen_shot.png

Pair table

Download a template pair.tsv here

What's in a pair table?
Pairs tables are a specific type of data table used in cancer research, where somatic workflows input samples corresponding to both tumor and normal tissue. Note that the pairs table references the particpant_id and the sample_ids of the case and the control sample (from the sample table). 

Example: pair.tsv in spreadsheet

entity:pair_id case_sample control_sample participant
HCC1143-2020 SM-74P4M SM-74NEG HCC1143

- The header entries in red (i.e. "entity:pair_id", "case_sample", "control_sample" and "participant" - shown above) must be typed exactly as shown
- Customize with your own pair, sample, and participant IDs 
- Remember the "sample" and "participant" IDs are from the "sample" and "participant" tables!

Example: Pair table in Terra

Modifying-tables_Pairs_Pairs-table-in-Terra.png

Workspace-level resources - the Workspace Data table - template and formatting

Download a template Workspace Data table here

What's in the Workspace Data table?
The workspace-wide resource data table (Workspace Data) holds variables you might want to use in multiple workflow analyses - like the genomic reference sequence file, or a Docker container.

Workspace resources can be links - to files in a Google bucket or Docker images, for example, or integers or strings for other parameters used in a workflows. Using the Workspace Data table lets you configure all at once and point to the resources from within the UI whenever you need them. Not only will you not need to look up reference file paths again, but if you update the resource files, you only need to update in one place.

Example: workspace-data.tsv in spreadsheet

workspace:ref_fasta docker disk-size data-folder
gs://public-bucket/Homo_sapiens_assembly.38.fasta us.gcr.io/docker-no-22 100 my-folder-name
- The parts in red (i.e. "workspace:") must be typed exactly as shown
- Customize with your own values for the resource file key and full paths to the data files (no quotes!)
- workspace resources can be links (files), strings or integers
Note that Terra will reorganize the files in alphabetical order. 

Example: Workspace Data table in Terra
Here is what you'll see when you upload the spreadsheet above to your workspace.

The first column (circled on the left) is the "key" that identifies what the file is. The other (circled on the right) includes a link to the file in a Google bucket:
Data-QuickStart_Part2_Workspace-data_Screen_shot.png

 

Additional tables in the "standard genomic data model"

2. Save file in "tab delimited text" or "tab-separated value" (tsv) format

Your editor may give you a warning, but we assure you, it's fine! 
Data-QuickStart_Part2_Save-as-Tab-delimited-text.png 

What will be the name of the table in your workspace?
It's worth noting that Terra will completely ignore the name you give the file. It's the "root entity" (in the first column header) that determines the table name in the workspace. 

3. Upload to the Data tab in your workspace

3.1. Click on the "+" sign in the blue circle at the top of the left TABLES column(in the orange rectangle in the screenshot below)
S11b_Add_tsv_file.png

3.2. Drag or click to select your tsv file

3.3. Click the "upload" button 

Once you've uploaded a load file to your workspace, you should see your data right away. Click on the link (table name) to expand the table. A sample table with one sample (AH_spit) and one column of metadata (links to FASTQ files) would look like this:

Modifying-data-tables_Sample-table_Screen_shot.png

Uploading interdependent tables in a workspace 

G0_warning-icon.png


Upload nested data tables in order!

  If data tables reference entities in another table, the dependent table needs to be uploaded first. The order is as follows ("A > B" means entity type A must be uploaded before entity type B):

  - participants > samples
  - samples > pairs
  - participants > participant sets
  - samples > sample sets
  - pairs > pair sets
  - set membership > set entity

Then, to associate data tables, make sure to check the box

To link information in different tables, you must select “Create participant, sample, pair associations” at the prompt in the pop-up window that appears when you try to upload the files.Modifying-tables_Create-asssociations.png

Uploading interdependent tables example: Mutect2

Any tsv that references another tsv must be loaded after the primary reference. so the workflow is able to access data from the correct table. A good example is if you are running the workflow Mutect 2. The input table is a pairs table, but the fields in the pairs table references data stored in the sample table that's associated with participant IDs from a participant table.

Order to upload (Mutect 2)

  1. Participant tsv
  2. Sample tsv
  3. Pairs tsv

Modifying-tables_Pairs_Pairs-table-in-Terra.png

If you uploaded the tables out of order or without selecting that checkbox, the workflow will fail because it reads from the pair table but has no knowledge of the sample table from which the pair table is supposed to read.

Was this article helpful?

0 out of 0 found this helpful

Have more questions? Submit a request

Comments

0 comments

Please sign in to leave a comment.