Adding data to a workspace with a template

Allie Hajian
  • Updated

Learn how to add study data to your workspace in four steps with a template tsv file. This article also includes templates, screenshots and formatting requirements for organizing and analyzing different types of study data. 

To learn more about data tables, entities, and the data model in Terra, see Understanding entity types and the standard genomic data model.


Step 1. Download the template "sample.tsv"

Download a template sample table here.

What is a sample table?

A sample, or other entity, table, keeps track of data - historically data used as input for a workflow. The minimum sample table includes an ID column and column for a data file (could be FASTQ, BAM, CRAM, etc. - whatever form your data are).

Sample table template and formatting

Sample table in spreadsheet 

entity:sample_id BAM
your-participant1-blood-ID gs://your-bucket-name/blood_sample_P1.bam
your-participant1-spit-ID gs://your-bucket-name/spit_sample_P1.bam
your-participant2-blood-ID gs://your-bucket-name/blood_sample_P2.bam
your-participant2-spit-ID gs://your-bucket-name/spit_sample_P2.bam
  • Parts in red ("entity:" and "-id") must be entered exactly as shown!
  • You'll use your own values for the sample IDs and data file paths
  • Note that if you use the data uploader, you only need to include the file names, not the full paths. See How to use the data uploader for step-by-step instructions. 

Sample table in Terra

Step 2. Edit in a spreadsheet editor

Open in your favorite spreadsheet editor to edit. Cells can only include alphanumeric characters, "-" and "_" - no spaces are allowed. If you are adding columns of metadata, column headers will be the attribute name of the new column in the updated data table. 

Step 3. Save as "tab-delimited text" or "tab-separated values"

Your editor may give you a warning, but we assure you, it's fine! Also, Terra will completely ignore the name you give the file. It's the root entity in the first column header that determines the table name in the workspace. 



A note about ".tsv" versus ".txt" file extensions Depending on what spreadsheet editor you use, when you save in the proper format your spreadsheet may have either a ".tsv" or a ".txt" extension. Terra will accept either one.

Step 4. Upload to your workspace 

4.1. Click the Import Data button at the top left of the workspace data page.

4.2. Select Upload TSV.

Overwriting table rows

When your tsv load file has the same entity (name) as a table already in the workspace, you may get an error message when you try to upload about overwriting data (see screenshot).


Terra will only overwrite data rows with the same ID (in the first column). If the tsv (load) file includes different IDs, the data rows will be added to the existing table.

Templates for additional types (including the standard genomic data model)

Click the table below for more information, formatting requirements, and example screenshots. 

Participant table

Download a template participant.tsv file here

What is a participant table?

A participant table organizes participants in a study. The minimum participant table is only one column - the participant ID. This ID can be used in additional tables to associate samples, for example, with the right individuals.

Participant table in spreadsheet 

- Parts in red ("entity:participant_id" must be entered exactly as shown
- You'll use your own participant IDs (note that these must be unique)

Participant table in Terra

Entity table (i.e. specimen table, unicorn table)

Download a template single-entity.tsv here

The Terra model is flexible and you can use whatever name you wish for your table, as long as it follows the format entity:your-name_id.

Unicorn table in spreadsheet 

entity:unicorn_id VCF
Golden gs://magic-bucket/golden.vcf>
Lightening gs://magic-bucket/lightening.vcf
- Parts in red ("entity" and "_id") must be entered exactly as shown
- You'll use your own values for the entity name, the entity IDs, the data column header and full paths to data files.

Unicorn table in Terra

Pair table

Download a template pair TSV here

What is a pair table?
Pair tables are a specific type of data table used in cancer research, where somatic workflows typically input samples corresponding to both tumor and normal tissue. Note that the pair table references the particpant_id (in the participant table) and the sample_id of the case and the control sample (from the sample table). 

Pair table in spreadsheet

entity:pair_id case_sample control_sample participant
HCC1143-2020 SM-74P4M SM-74NEG HCC1143

- The header entries (in red) must be typed exactly as shown
- Customize with your own values for the case sample ID, control sample ID, and participant ID
- The case- and control-sample IDs are from the sample.tsv
- The participant ID is from the participant.tsv table
- Upload in this order: 1) participant.tsv 2) sample.tsv 3) pairs.tsv

Pair table in Terra


Sets of data - sample_set table

Download a template set membership tsv here

What is it?

A set table defines entities grouped into sets for analysis. An entity_set table always refers to entities in an entity table. 

Set membership table in spreadsheet

The first column is the unique ID for each set and the second column is the entity_id (in a different - i.e. entity - table). There is a row for every member of a set.  

membership:sample_set_id sample
spit participant1-spit-sample-id
spit participant2-spit-sample-id
blood participant1-blood-sample-id
blood participant2-blood-sample-id
- The parts in red ("membership:" and "_set_id" must be typed exactly as shown
- Your entity doesn't need to be samples, but the entity in the headers must match
- There must be an entity table (i.e. "sample") in the workspace already
- The entity table contains links to the input data files 
- Customize with your own values for the entity, set IDs and sample IDs (i.e. replace example sets of "spit" tissue and "blood" tissues and "participant1-spit-sample-id")

Set membership table in Terra

To find the samples in each set, click on the link in the "samples" column (see highlighted box for the two samples in the "blood" set):template-sample_set-table-in-Terra_Screen_shot.png

Workspace-level resources - the Workspace Data table

Download a template Workspace Data table here

What is it?
The Workspace Data table holds variables you might want to use in multiple workflow analyses - like the genomic reference sequence file, or a Docker container. Using the Workspace Data table lets you configure all at once and point to the resources from within the UI whenever you need them. Not only will you not need to look up reference file paths again, but if you update the resource files, you only need to update in one place.

Workspace Data tsv in spreadsheet

workspace:ref_fasta ref_fasta_index ref_dict
gs://public-bucket/Homo_sapiens_assembly.38.fasta gs://public-bucket/Homo_sapiens_assembly.38.fai gs://public-bucket/Homo_sapiens_assembly.38.dict
- Parts in red (i.e. "workspace:" must be typed exactly as shown
- Customize the resource file key (header) and full path (second row)
- Note that Terra will reorganize the files in alphabetical order 

Workspace Data table in Terra

The screenshot below is what you'll see when you upload the spreadsheet above to a Workspace Data table. The first column (circled column on the left) identifies what the file is. The other (circled on the right) includes a link to the file in a Google bucket. You can download or upload TSVs by clicking the link at the top (arrow).

How to associate data in different tables

It is often the case that associated data will be in different tables. For example, a person's genomic data might be in a sample table and their phenotypic data in a subject table. You will need to be careful about the order you upload the tsv file to make sure Terra keeps the associations. 

Step 1: Upload interdependent data tables in order!

If data tables reference entities in another table, the dependent table needs to be uploaded first.

If you are using the standard genomic data model, the order is as follows ("A > B" means entity type A must be uploaded before entity type B):

  1. participant.tsv
  2. sample.tsv (or entity.tsv)
  3. pair.tsv
  4. sample_set.tsv (or entity_set.tsv) 

You can also create sets of tumor:normal pair tables using the same formatting. These will be uploaded last.  

Step 2: To associate data tables, make sure to check the box

To link information in different tables, you must select “Create participant, sample, pair associations” at the prompt in the pop-up window that appears when you try to upload the files.Modifying-tables_Create-asssociations.png

Uploading Mutect2

An example of a workflow that requires inputs from participant, sample, pair, and sample_set tables is Mutect2. For a complete description, see the GATK Somatic SNVs and Indels workspace. The workspace includes a notebook tutorial of the workflow that guides users step by step through each tool in the workflow.

The workflow includes tables that references another table, which must be loaded after the first reference.

In the Mutect2 example, the pairs table below references samples (SM-74P4M and SM-74NEG) and particpants (HCC1143).Modifying-tables_Pairs_Pairs-table-in-Terra.png

The tables must be uploaded in the order below:

  1. Participant tsv
  2. Sample tsv
  3. Pairs tsv

If you uploaded the pairs table before the sample tables (or without selecting that checkbox), the workflow will fail because it reads the sample name from the pair table but has no knowledge of the samples table from where the data files are stored.

Was this article helpful?

1 out of 1 found this helpful

Have more questions? Submit a request



Please sign in to leave a comment.