Need to modify, delete or generate a new workspace data table? Read on to learn how!
Setting the Stage - What does your data look like?
How to modify table entries or delete rows (in the Terra UI)
How to add data and metadata to a table (in a spreadsheet editor)
How to make a sample data table from scratch
How to make a set (array) of data files in a table
How to make a table of tumor-normal pairs of data (for somatic analysis)
A warning about order when uploading tsv files
Setting the stage - What does your data look like?
Before you set up or modify your workspace tables, consider what data you have and how you'll use that data for analysis. This will help determine what data tables you want and what they should look like.
- What's the most fundamental piece of data you will be analyzing? This will be the "root entity type" of the table - also the name of your table. You may categorize your data as specimens, samples, lanes of reads, and so on. To learn more about data table types in Terra, see this article.
- Sets of entities versus entities Will you be analyzing single entities? Groups of single entities? Tumor-normal pairs of samples? Does your workflow take arrays of single files as input? Answers to these questions impact the formatting requirements for the data table.
How to modify or delete data in a table (directly in Terra)
If you only need to change a handful of cells for data already in a workspace table, you can do so directly in the Terra interface, starting in the Data tab. For example, if you wanted to change the sample IDs, participant, or links to genomic data - or delete particular samples - in the table below:
To edit data in individual table cells
To delete data in a workspace table
2) Click the three vertical dots at the top and select "Delete Data"
To copy data in a table to another workspace
2) Click the three vertical dots at the top and select "Export to Workspace" to select the workspace to copy to
How to add rows and delete or add metadata columns in a data table
What if you need to (1) add data - like additional specimens to a specimen table? Or (2) add or delete metadata (add an additional column with links to a BAM file, for example) in an existing workspace data table? Or don't want to manually type into the interface?
To do any of these, you will need to modify the table in a spreadsheet editor outside of Terra in a special file called a "load file". A load file is a spreadsheet that you save in "tab-separated values" or "tab-delimited text" format. Expand the sections for more detailed steps and screenshots.
1. Download the existing table in tab-separated values (tsv) format
To download all rows
Select the Download all Rows button at the top of the table
To download particular rows
1. Select the rows to download
2. Click the three vertical dots to the right of the Download all Rows button and
select "Download as tsv"
2. Edit and save the tsv file using your favorite spreadsheet editor
- Open in your favorite spreadsheet editor to edit. If you are adding columns of metadata, column headers will be the attribute name of the new column in the updated data table.
- Save the file as "tab-separated values" or "tab-delimited text" Your editor may give you a warning, but we assure you, it's fine! Also, Terra will completely ignore the name you give the file. It's the root entity in the first column header that determines the table name in the workspace. a=A note about "tsv" versus "txt" file extensions Depending on what spreadsheet editor you use, when you save in the proper format, your spreadsheet file may have either a '.tsv or' '.txt' extension. Terra will accept either of these file extensions.
3. Upload the tsv file to your workspace
Click the blue "+" icon at the top right of the table column in the TABLE page of the workspace and follow the directions.
|A note about overwriting table rows|
|When your tsv load file has the same entity (name) as a table already in the workspace, you may get an error message when you try to upload about overwriting data (see screenshot). Note that Terra will only overwrite data rows with the same ID (in the first column). If the tsv (load) file includes different IDs, the data rows will be added to the existing table.|
How to make a sample data table from scratch (in a spreadsheet editor)
Workspace tables are like spreadsheets (columns and rows) built into the data page. So it's no surprise that you can use a spreadsheet editor to create a file - a "load file" - to upload as a new table. A load file is a spreadsheet that you save in "tab-separated values" or "tab-delimited text" format (Terra recognizes both). Each row is a unique entity (participant, sample, etc.) and each column is a distinct attribute corresponding to that entity - (ie. the participant's sex, age, or height; the sample bam, fasta, etc.).
|Table (load file) minimum requirements|
|The file needs to have a header containing column names The first column must be the unique identifier for each row The name of this first column header is preceded by entity: and must end with _id You must have at least one row of data The spreadsheet will look (approximately) like thi: When uploaded to Terra, the table will look like this:|
Start by opening a blank file in your favorite spreadsheet editor.
Fill in the first row (data table headers)
Whatever names you give column headers will be what is in the workspace data table.
|Terra requires a particular format for the first (ID) column|
|entity:your-entity-name_id The parts in red (entity: and _id) must be typed in exactly as shown. The entity: is the kind of data in the table. It's what a workflow will use as input. You can name the entity whatever helps you organize your data. For example, the first column header of a table of samples would read entity:sample_id and the first column of a table of unicorns would read entity:unicorn_id.|
For a minimum data table, the second column is generally the input data you'll use for a workflow. If it's FASTQ files (as above), you could use fastq for the second column header. You can use any header name that helps you keep the data organized.
In your spreadsheet editor it will look like this (you will likely have different values for sample and fastq):
Fill in the data (additional rows of the spreadsheet)
ID (first column)
You can use any name you want for the ID. Note that you cannot have two rows with the same entity_id. You may use the sample_id to connect data in separate tables (see below).
Additional columns include data and metadata for each entity. For example, in the example sample table, you'd add the link to the FASTQ file in a Google bucket in the
|Attributes that reference data files must include the full path name|
When you're done, your spreadsheet should look something like this
Save file in "tab delimited text" or "tab-separated value" (tsv) format
Upload to the workspace Data page
- Click on the "+" sign in the blue circle at the top of the left TABLES column (in the orange rectangle in the screenshot below)
- Select the "upload" option
Once you've uploaded a load file to your workspace, you should see your data right away in the Data page. If you click on the entity link to expand the table, the example above would look like this:
Another example - Adding additional data tables (unicorn data)
When you upload it to the same workspace, you would see a unicorn data table with two unicorns, in addition to the original sample table:
Sets (arrays) of samples: sample_set table
If you had a table of specimens, such as below, Terra allows you to group specific specimens for workflow analysis.
Example 1: Using entity_set tables to analyze subsets of data
If you choose a subset of input data and run a workflow on it, Terra will automatically generate a set of the entities you ran on. To learn how to generate a set automatically by running a workflow, see the Data Tables QuickStart Part 3.
Example 2: Using tables for workflows that run on arrays
- The Optimus pipeline takes in multiple lanes of a sample, but outputs one file that corresponds to the sample, not the individual read lanes.
- Another example, familiar to cancer researchers, is the CNV_Somatic_Panel_workflow, which generates a single Panel of Normals (PoN) from a list of normal samples. The PoN is used in variant calling to filter out systemic errors that occur when reads are processed.
Generating set tables (in a spreadsheet editor)
In a sets table spreadsheet, the first column is the unique set ID and the second column is the entity_id (in a different table). There is a row for every member of a set.
|First column (set ID) header formatting requirements membership:your-entity-name_set_id|
|Second column header formatting requirements "your-entity-name" comes from the table that stores the data pieces in the group - such as the sample table.|
The parts in red (
set_id) must be typed in exactly as shown, and
your-enrity-name must match name of the original data table. It's the table that contains links to the data files that a workflow will ultimately use as input.
Notice in the example below that the same specimen (neurons2k_lane2) is in both mouse-set1 and mouse-set2:
How to create a table of tumor/normal pairs of data (somatic analyses)
Another pre-defined data table is a pairs table - a set of paired tumor and normal samples taken from the same patient. This entity type is tailored for somatic workflows and may be double nested in groups. For example, a cancer study may include multiple pairs of samples from multiple study participants.
How to make tables of tumor-normal sample pairs
Pairs table in Terra
Tables of pairs are also like spreadsheets, and the process to create them from scratch is much the same as for single entity data tables. You'll work in a spreadsheet editor, save as "tab-delimited text" and upload the file as outlined above.
Pairs table tsv file (in a spreadsheet)
Note that the pairs table references the particpant_id and the sample_ids of the case and the control sample (from the sample table).
|First column (set ID) header formatting requirements entity:pair_id|
|Second column header formatting requirements case_sample|
|Third column header formatting requirements control_sample|
|Fourth column header formatting requirements participant|
The parts in red must be typed in exactly as shown, and the
control_sample IDs must match names in the sample data table. It's the table that contains links to the data files that a workflow will ultimately use as input.
Since the pairs table references both the participant_id and the sample_id, you will need to create and upload both participant and sample data tables before the pairs table.
Uploading tables in a workspace
|Upload (nested) tables in a particular order!|
|If data tables reference entities in another table, the dependent table needs to be uploaded first. The order is as follows ("A > B" means entity type A must be uploaded before entity type B): participants > samples samples > pairs participants > participant sets samples > sample sets pairs > pair sets set membership > set entity (e.g. participants before samples before sample set membership before sample set entity)|
|To associate data tables - check this box|
|To link information in different tables, you must select “Create participant, sample, pair associations” at the prompt in the pop-up window that appears when you try to upload the files.|
Uploading example: Mutect2
Order to upload
Any tsv that references another tsv must be loaded after the primary reference.
- Participant tsv
- Sample tsv
- Pairs tsv
For example, when you run Mutect2, if you uploaded the tables out of order or without selecting that checkbox, the workflow will fail because it reads from the pair table but has no knowledge of the sample table from which the pair table is supposed to read.