Need to modify, delete or generate data in a workspace table? Read on to learn how!
Setting the stage - What kind of data do you have?
Before you set up or modify your workspace tables, consider what data you have and how you'll use that data for analysis. This will help determine what data tables you want and what they should look like.
- What's the most fundamental piece of data you will be analyzing?
This will be the name of your table in Terra and the "root entity type" of the table that includes input data files for a workflow. You may categorize your data as specimens, samples, lanes of reads, and so on. To learn more about predefined data table types (the default genomic data model) in Terra, see this article.
- Sets of entities versus entities
Will you be analyzing single entities? Groups of single entities? Tumor-normal pairs of samples? Does your workflow take arrays of single files as input? Answers to these questions can affect whether you will use predefined tables like set table or pairs tables, which have specific formatting requirements.
How to modify table entries or delete data rows (in the Terra UI)
This video shows how to change a handful of cells for data already in a workspace table:
You can do this directly in the Terra interface, starting in the Data tab. For example, if you wanted to change the sample or participant IDs, or links to genomic data - or delete particular samples - in the table below:
To edit data in individual table cells
To delete data (rows) in a workspace table
2. Click the three vertical dots at the top and select "Delete Data"
To copy data in a table to another workspace
2. Click the three vertical dots at the top and select "Export to Workspace" to select the workspace to copy to
How to add additional data (rows) or add/delete metadata (columns) in a table
If you have an existing table, but need to
1. Add rows of data - like additional specimens to a specimen table.
2. Add or delete columns of metadata - such as adding an additional column with links to a BAM file.
You will make these modifications in a spreadsheet editor outside of Terra. Expand the sections below for more step-by-step instructions and screenshots.
1. Download the existing table
1.1. Select the Download all Rows button at the top of the table
To download particular rows
1.1. Select the rows to download
1.2. Click the three vertical dots to the right of the "Copy Page to Clipboard" button and select "Download as TSV"
2. Delete the entire table in the workspace
2.2. Click the three vertical dots to the right of the "Copy Page to Clipboard" and select "Delete Data"
2.3. Confirm by clicking the blue "Delete" button
If there are many rows in your table, you will need to scroll down to the end of the form to see the blue button.
3. Edit and save the tsv file using your favorite spreadsheet editor
3.1. Open in your favorite spreadsheet editor and edit like any file. Note that cells can only include alphanumeric characters, "-" and "_" and no spaces. All formatting will be lost when you save the file in tsv format.
Note that if you are adding columns of metadata, column headers will be the attribute name of the new column in the updated data table.
3.2. Save the file as "tab-separated values" or "tab-delimited text"
Your editor may give you a warning, but we assure you, it's fine! Also, Terra will completely ignore the name you give the file. It's the name ("root entity") in the first column header that determines the table name in the workspace.
|Depending on what spreadsheet editor you use, when you save in the proper format your spreadsheet may have either a ".tsv" or a ".txt" extension. Terra will accept either one.|
4. Upload the new TSV file to your workspace
Click the blue "+" icon at the top right of the TABLE column in the Data tab of the workspace and follow the directions.
|When your TSv load file has the same entity (name) as a table already in the workspace, you may get an error message when you try to upload about overwriting data (see screenshot).
Note that Terra will only overwrite data rows with the same ID (in the first column). If the TSV (load) file includes different IDs, these rows will be added to the existing table.
How to make a data table from scratch or a template (in a spreadsheet editor)
Workspace tables include rows and columns for organizing data, much like spreadsheet built into the data page. So it's no surprise that you can use a spreadsheet editor to create and upload a file - a "load file" - to generate a new table in your workspace.
- Each row is a unique entity (participant, sample, etc.)
- Each column is a distinct attribute corresponding to that entity - (ie. the participant's sex, age, or height; the sample BAM, FASTA, etc.).
General formatting requirements
A load file has to be in "tab-separated values" or "tab-delimited text" format (Terra recognizes both).
1. Make a table.tsv "load file" in a spreadsheet editor
Click the table types below for templates, examples, and required formatting
Each collapsed section includes a description of the table type, formatting requirements and a screenshot of what the table will look like when uploaded to your workspace.
Participant table template and formatting requirements
Download a template participant.tsv file here
What's in a participant table?
A participant table organizes participants in a study. The minimum participant table is only one column - the participant ID. This ID can be used in additional tables to associate samples, for example, with the right individuals.
Example: participant.tsv in spreadsheet
- You'll use your own participant IDs (i.e. "your-particpant1-id")
Example: Participant table in Terra
Sample table template and formatting requirements
Download a template sample.tsv load file here
What's in a sample table?
A sample, or other entity, table, keeps track of data - historically input data for a workflow. The minimum sample table includes an ID column and column for a data file (could be FASTQ, BAM, CRAM, etc. - whatever form your data are).
Example: sample.tsv in spreadsheet
- You'll use your own sample IDs (i.e. "your-participant1-blood") and the complete paths of the data files
Example: Sample table in Terra
Entity table (i.e. specimen table, unicorn table) template and formatting requirements
Download a template entity.tsv load file here
The Terra model is flexible and you can use whatever name ("entity") you wish for your table, as long as your spreadsheet follows the format entity:your-name_id.
Example: unicorn.tsv in spreadsheet
- You'll use your own entity IDs (i.e. "Golden" and "Lightening" in the example above), column headers (attribute type - i.e. "VCF" from above), and data (such as complete file paths - "gs://" - to data in cloud storage).
Example: Unicorn table in Terra
Sets of data - membership.tsv - template and formatting requirements
Download a template membership.tsv here
What's in an entity_set table?
A set table defines entities or pairs grouped into sets for analysis. An entity_set table always refers to entities in an entity table (i.e. a set of pairs from the pair table, or a set of samples from the sample table).
When will you use set membership tables?
Analyzing subsets of single entities in a larger datasetYou may have many samples in a data table, but want to do some testing with just three samples. You want to use the same three samples every time, and they're sprinkled throughout the sample table. To avoid having to manually select the exact samples you want every time you run a workflow, you can define subsets of the data in set table.
To learn how to generate a set automatically by running a workflow on a subset of data, see the Data Tables QuickStart Part 3, or watch this video:
Using workflows that run on arraysSet tables can also be used as inputs for workflows that accept arrays as inputs
|The Optimus pipeline takes in multiple lanes of a sample, but outputs one file that corresponds to the sample, not the individual read lanes.
Another example, familiar to cancer researchers, is the CNV_Somatic_Panel_workflow, which generates a single Panel of Normals (PoN) from a list of normal samples. The PoN is used in variant calling to filter out systemic errors that occur when reads are processed.
Example: sample_set.tsv in spreadsheet
The first column is the unique unique ID for each set and the second column is the entity_id (in a different - i.e. entity - table). There is a row for every member of a set.
- The parts in red ("membership:" and "set_id") must be typed exactly as shown with no spaces
- Customize with your own values
- You must have a corresponding entity table (i.e. "sample" inthe example above) in the workspace that contains links to the input data files from the samples in the set.
Example: sample_set table in Terra
To find the samples in each set, click on the link in the "samples" column (see highlighted box for the two samples in the "blood" set):
Download a template pair.tsv here
What's in a pair table?
Pairs tables are a specific type of data table used in cancer research, where somatic workflows input samples corresponding to both tumor and normal tissue. Note that the pairs table references the particpant_id and the sample_ids of the case and the control sample (from the sample table).
Example: pair.tsv in spreadsheet
- The header entries in red (i.e. "entity:pair_id", "case_sample", "control_sample" and "participant" - shown above) must be typed exactly as shown
- Customize with your own pair, sample, and participant IDs
- Remember the "sample" and "participant" IDs are from the "sample" and "participant" tables!
Example: Pair table in Terra
Workspace-level resources - the Workspace Data table - template and formatting
Download a template Workspace Data table here
What's in the Workspace Data table?
The workspace-wide resource data table (Workspace Data) holds variables you might want to use in multiple workflow analyses - like the genomic reference sequence file, or a Docker container.
Workspace resources can be links - to files in a Google bucket or Docker images, for example, or integers or strings for other parameters used in a workflows. Using the Workspace Data table lets you configure all at once and point to the resources from within the UI whenever you need them. Not only will you not need to look up reference file paths again, but if you update the resource files, you only need to update in one place.
Example: workspace-data.tsv in spreadsheet
- Customize with your own values for the resource file key and full paths to the data files (no quotes!)
Example: Workspace Data table in Terra
Here is what you'll see when you upload the spreadsheet above to your workspace.
The first column (circled on the left) is the "key" that identifies what the file is. The other (circled on the right) includes a link to the file in a Google bucket:
Additional tables in the "standard genomic data model"
2. Save file in "tab delimited text" or "tab-separated value" (tsv) format
Your editor may give you a warning, but we assure you, it's fine!
What will be the name of the table in your workspace?
It's worth noting that Terra will completely ignore the name you give the file. It's the "root entity" (in the first column header) that determines the table name in the workspace.
3. Upload to the Data tab in your workspace
3.1. Click on the "+" sign in the blue circle at the top of the left TABLES column(in the orange rectangle in the screenshot below)
3.2. Drag or click to select your tsv file
3.3. Click the "upload" button
Once you've uploaded a load file to your workspace, you should see your data right away. Click on the link (table name) to expand the table. A sample table with one sample (AH_spit) and one column of metadata (links to FASTQ files) would look like this:
Uploading interdependent tables in a workspace
|If data tables reference entities in another table, the dependent table needs to be uploaded first. The order is as follows ("A > B" means entity type A must be uploaded before entity type B):
- participants > samples
Then, to associate data tables, make sure to check the box
To link information in different tables, you must select “Create participant, sample, pair associations” at the prompt in the pop-up window that appears when you try to upload the files.
Uploading interdependent tables example: Mutect2
Any tsv that references another tsv must be loaded after the primary reference. so the workflow is able to access data from the correct table. A good example is if you are running the workflow Mutect 2. The input table is a pairs table, but the fields in the pairs table references data stored in the sample table that's associated with participant IDs from a participant table.
Order to upload (Mutect 2)
- Participant tsv
- Sample tsv
- Pairs tsv
If you uploaded the tables out of order or without selecting that checkbox, the workflow will fail because it reads from the pair table but has no knowledge of the sample table from which the pair table is supposed to read.