Adding data to a workspace with a template
Add study data in a table to your workspace in four steps using a table template. Templates, screenshots, and formatting requirements are provided for different use cases.
1. Download the sample template (tsv format)
2. Edit in a spreadsheet editor
3. Save as "tab-delimited text" or "tab-separated values"
4. Upload to your Terra workspace
Templates for other table types
- Participant table
- Flexible entity tables (beyond samples)
- Pair tables
- Set tables
- Workspace-wide resources - the Workspace Data table
Make sure to associate interconnected tables!
1. Download the sample template table
Sample table template and formatting
Download a template sample table here
What is it?
A sample table (or other entity table) keeps track of data, historically data used as input for a workflow. The minimum sample table includes an ID column and a column for a data file (FASTQ, BAM, CRAM, etc., whatever format your data are in).
Example: Sample table in spreadsheet
entity:sample_id | BAM |
your-participant1-blood-ID | gs://your-bucket-name/blood_sample_P1.bam |
your-participant1-spit-ID | gs://your-bucket-name/spit_sample_P1.bam |
your-participant2-blood-ID | gs://your-bucket-name/blood_sample_P2.bam |
your-participant2-spit-ID | gs://your-bucket-name/spit_sample_P2.bam |
- The "entity:" prefix and "_id" suffix in the first header must be entered exactly as shown!
- You'll use your own values for the sample IDs and data file paths
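If you would rather generate the load file with a script than build it by hand in a spreadsheet, the sketch below writes the example sample table as a tab-separated file using Python's standard `csv` module. The IDs and bucket paths are the placeholder values from the example; the output name `sample.tsv` is an assumption (Terra ignores the file name anyway).

```python
import csv

# Placeholder rows from the example table; replace with your own IDs and paths.
rows = [
    ("your-participant1-blood-ID", "gs://your-bucket-name/blood_sample_P1.bam"),
    ("your-participant1-spit-ID", "gs://your-bucket-name/spit_sample_P1.bam"),
    ("your-participant2-blood-ID", "gs://your-bucket-name/blood_sample_P2.bam"),
    ("your-participant2-spit-ID", "gs://your-bucket-name/spit_sample_P2.bam"),
]

def write_sample_tsv(path, rows):
    """Write a minimal sample load file: an ID column plus one data-file column."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        # "entity:" and "_id" must appear exactly like this in the first header.
        writer.writerow(["entity:sample_id", "BAM"])
        writer.writerows(rows)

write_sample_tsv("sample.tsv", rows)
```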
Example: Sample table in Terra
2. Edit the sample.tsv file using your favorite spreadsheet editor
Open the file in your spreadsheet editor to edit it. ID values can only include alphanumeric characters, "-" and "_"; no spaces are allowed. If you add columns of metadata, each column header becomes the attribute (column) name in the updated data table.
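One way to catch invalid IDs before uploading is to check each one against the character rule above. This is a minimal sketch that assumes the rule exactly as stated here (ASCII letters, digits, "-" and "_" only); Terra's own validation may differ in detail, and `invalid_ids` is a hypothetical helper name.

```python
import re

# Allowed characters per the rule above: ASCII letters, digits, "-" and "_".
ID_PATTERN = re.compile(r"[A-Za-z0-9_-]+")

def invalid_ids(ids):
    """Return the IDs that are empty or contain characters outside the allowed set."""
    return [i for i in ids if not ID_PATTERN.fullmatch(i)]

print(invalid_ids(["sample-01", "sample_02", "sample 03", "sample#4"]))
# ['sample 03', 'sample#4']
```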
3. Save file in "tab-delimited text" or "tab-separated values" format
Your editor may give you a warning, but we assure you, it's fine! Also, Terra will completely ignore the name you give the file. It's the root entity in the first column header that determines the table name in the workspace.
Depending on which spreadsheet editor you use, when you save in the proper format your file may have either a ".tsv" or a ".txt" extension. Terra will accept either one.
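Because the table name comes from the root entity in the first column header rather than from the file name, you can check which table a load file will create before uploading it. The helpers below are a hypothetical sketch: they read the first header cell of a tab-delimited file (".tsv" or ".txt") and strip the "entity:" prefix and "_id" suffix.

```python
import csv

def table_name_from_header(first_header):
    """Return the entity (table) name encoded in a header like 'entity:sample_id'."""
    prefix, suffix = "entity:", "_id"
    if not (first_header.startswith(prefix) and first_header.endswith(suffix)):
        raise ValueError(f"unexpected first header: {first_header!r}")
    name = first_header[len(prefix):-len(suffix)]
    if not name:
        raise ValueError("empty entity name")
    return name

def table_name_from_file(path):
    """Read the first header cell of a tab-delimited file (.tsv or .txt)."""
    with open(path, newline="") as handle:
        header = next(csv.reader(handle, delimiter="\t"))
    return table_name_from_header(header[0])

print(table_name_from_header("entity:sample_id"))   # sample
print(table_name_from_header("entity:unicorn_id"))  # unicorn
```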
4. Upload the tsv file to your workspace
Click the blue "+" icon at the top right of the table column in the TABLE page of the workspace and follow the directions to drag or select your template file.
If your tsv load file has the same entity name as a table already in the workspace, you may get a warning about overwriting data when you try to upload it (see screenshot).
Note that Terra will only overwrite data rows with the same ID (in the first column). If the tsv (load) file includes different IDs, those data rows will be added to the existing table.
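The rule above behaves like an "upsert" keyed on the first-column ID: a row whose ID already exists is overwritten, and a row with a new ID is added. The sketch below models that rule with plain dictionaries purely to make it concrete; it is not Terra's implementation, and it assumes an uploaded row replaces the whole existing row (how columns missing from the upload are treated is not covered here).

```python
def apply_upload(existing, uploaded):
    """Model the upload rule: same ID -> row overwritten, new ID -> row added.

    Both arguments map an entity ID (first column) to a dict of column values.
    Assumption: an uploaded row replaces the whole existing row with that ID.
    """
    merged = dict(existing)  # leave the caller's table untouched
    for entity_id, row in uploaded.items():
        merged[entity_id] = dict(row)
    return merged

existing = {
    "s1": {"BAM": "gs://your-bucket-name/s1_old.bam"},
    "s2": {"BAM": "gs://your-bucket-name/s2.bam"},
}
uploaded = {
    "s1": {"BAM": "gs://your-bucket-name/s1_new.bam"},  # same ID: overwritten
    "s3": {"BAM": "gs://your-bucket-name/s3.bam"},      # new ID: added
}
table = apply_upload(existing, uploaded)
print(sorted(table))       # ['s1', 's2', 's3']
print(table["s1"]["BAM"])  # gs://your-bucket-name/s1_new.bam
```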
Templates for additional table types
Each section below gives more information, formatting requirements, and example screenshots for one table type.
Participant table
Download a template participant.tsv file here
What is it?
A participant table organizes participants in a study. The minimum participant table is only one column - the participant ID. This ID can be used in additional tables to associate samples, for example, with the right individuals.
Example: Participant table in spreadsheet
entity:participant_id |
your-participant1-id |
your-participant2-id |
your-participant3-id |
- You'll use your own participant IDs (note that these must be unique)
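Since participant IDs must be unique, it can help to scan the ID column for repeats before uploading. A minimal sketch using `collections.Counter` (the helper name `duplicate_ids` is hypothetical):

```python
from collections import Counter

def duplicate_ids(ids):
    """Return IDs that appear more than once, in first-seen order."""
    counts = Counter(ids)  # Counter keeps first-insertion order of keys
    return [entity_id for entity_id, n in counts.items() if n > 1]

participants = ["your-participant1-id", "your-participant2-id", "your-participant1-id"]
print(duplicate_ids(participants))  # ['your-participant1-id']
```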
Example: Participant table in Terra
Entity table (e.g. specimen table, unicorn table)
Download a template single-entity.tsv here
The Terra model is flexible and you can use whatever name you wish for your table, as long as it follows the format entity:your-name_id.
Example: "Unicorn" table in spreadsheet
entity:unicorn_id | VCF |
Golden | gs://magic-bucket/golden.vcf |
Lightening | gs://magic-bucket/lightening.vcf |
- You'll use your own values for the entity name, the entity IDs, the data column header and full paths to data files.
Example: "Unicorn" table in Terra
Pair table
Download a template pair tsv here
What is it?
Pair tables are a specific type of data table used in cancer research, where somatic workflows typically take as input samples from both tumor and normal tissue. Note that the pair table references the participant_id (from the participant table) and the sample_id of the case and control samples (from the sample table).
Example: Pair tsv in spreadsheet
entity:pair_id | case_sample | control_sample | participant |
HCC1143-2020 | SM-74P4M | SM-74NEG | HCC1143 |
- The header entries must be typed exactly as shown
- Customize with your own values for the case sample ID, control sample ID, and participant ID
- The case- and control-sample IDs are from the sample.tsv
- The participant ID is from the participant.tsv table
- Upload in this order: 1) participant.tsv 2) sample.tsv 3) pairs.tsv
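Because the pair table points at IDs in the sample and participant tables, checking that every referenced ID exists before uploading can save you a failed upload or workflow. This sketch assumes the three tables are already loaded in Python (for example, pair rows as dicts keyed by the header names, such as `csv.DictReader` would produce); `missing_references` is a hypothetical helper.

```python
def missing_references(pairs, sample_ids, participant_ids):
    """List (pair_id, column, value) for references not found in the other tables."""
    problems = []
    for pair in pairs:
        pair_id = pair["entity:pair_id"]
        for column in ("case_sample", "control_sample"):
            if pair[column] not in sample_ids:
                problems.append((pair_id, column, pair[column]))
        if pair["participant"] not in participant_ids:
            problems.append((pair_id, "participant", pair["participant"]))
    return problems

pairs = [{"entity:pair_id": "HCC1143-2020", "case_sample": "SM-74P4M",
          "control_sample": "SM-74NEG", "participant": "HCC1143"}]
print(missing_references(pairs, {"SM-74P4M", "SM-74NEG"}, {"HCC1143"}))  # []
print(missing_references(pairs, {"SM-74P4M"}, {"HCC1143"}))
# [('HCC1143-2020', 'control_sample', 'SM-74NEG')]
```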
Example: Pair table in Terra
Sets of data - sample_set table
Download a template set membership tsv here
What is it?
A set table defines entities grouped into sets for analysis. An entity_set table always refers to entities in an entity table.
Example: Set membership table in spreadsheet
The first column is the unique ID for each set, and the second column is the ID of a member entity (from a separate entity table). There is one row for every member of a set.
membership:sample_set_id | sample |
spit | participant1-spit-sample-id |
spit | participant2-spit-sample-id |
blood | participant1-blood-sample-id |
blood | participant2-blood-sample-id |
- Your entity doesn't need to be samples, but the entity name must match in both headers (e.g. "sample_set_id" and "sample")
- There must already be an entity table (e.g. "sample") in the workspace
- The entity table contains links to the input data files
- Customize with your own values for the entity, set IDs, and sample IDs (i.e. replace the example sets "spit" and "blood" and sample IDs such as "participant1-spit-sample-id")
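Since the membership format needs one row per member, a set with two samples takes two rows. The sketch below expands a mapping of set ID to member sample IDs into that layout and writes it with `csv`; the helper name and the output file name `sample_set_membership.tsv` are assumptions for illustration.

```python
import csv

def membership_rows(sets):
    """Expand {set_id: [member_id, ...]} into one (set_id, member_id) row per member."""
    return [(set_id, member) for set_id, members in sets.items() for member in members]

sets = {
    "spit": ["participant1-spit-sample-id", "participant2-spit-sample-id"],
    "blood": ["participant1-blood-sample-id", "participant2-blood-sample-id"],
}

with open("sample_set_membership.tsv", "w", newline="") as handle:
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    # The entity name must match in both headers: sample_set_id / sample.
    writer.writerow(["membership:sample_set_id", "sample"])
    writer.writerows(membership_rows(sets))
```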
Example: Set membership table in Terra
To find the samples in each set, click on the link in the "samples" column (see highlighted box for the two samples in the "blood" set):
Workspace-level resources - the Workspace Data table
Download a template Workspace Data table here
What is it?
The workspace-wide resource data table (Workspace Data) holds variables you might want to use in multiple workflow analyses, like the genomic reference sequence file or a Docker container. The Workspace Data table lets you define these resources once and point to them from within the UI whenever you need them. You won't need to look up reference file paths again, and if you update the resource files, you only need to update them in one place.
Example: Workspace Data tsv in spreadsheet
workspace:ref_fasta | ref_fasta_index | ref_dict |
gs://public-bucket/Homo_sapiens_assembly.38.fasta | gs://public-bucket/Homo_sapiens_assembly.38.fai | gs://public-bucket/Homo_sapiens_assembly.38.dict |
- Customize the resource file key (header) and full path (second row)
- Note that Terra will list the resources in alphabetical order
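Unlike entity tables, the Workspace Data file is laid out horizontally: the resource keys run across the header row (the first one prefixed with "workspace:") and the paths sit in the single row below. This hypothetical sketch builds that two-row layout from a dict using the example paths, keeping the key order from the example (Terra lists the keys alphabetically anyway); the output file name is an assumption.

```python
import csv

resources = {
    "ref_fasta": "gs://public-bucket/Homo_sapiens_assembly.38.fasta",
    "ref_fasta_index": "gs://public-bucket/Homo_sapiens_assembly.38.fai",
    "ref_dict": "gs://public-bucket/Homo_sapiens_assembly.38.dict",
}

def workspace_rows(resources):
    """Return the header row and value row for a Workspace Data load file."""
    if not resources:
        raise ValueError("need at least one resource")
    keys = list(resources)  # dicts keep insertion order
    header = [f"workspace:{keys[0]}"] + keys[1:]
    values = [resources[k] for k in keys]
    return header, values

header, values = workspace_rows(resources)
with open("workspace_data.tsv", "w", newline="") as handle:
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerow(values)
```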
Example: Workspace Data table in Terra
The screenshot below is what you'll see when you upload the spreadsheet above to a Workspace Data table. The first column (circled column on the left) identifies what the file is. The other (circled on the right) includes a link to the file in a Google bucket:
If a data table references entities in another table, the referenced table needs to be uploaded first. The order is as follows ("A > B" means entity type A must be uploaded before entity type B):
participant.tsv > sample.tsv > pair.tsv (and an entity table such as sample.tsv > any set membership tsv that refers to it)
To link information in different tables, you must also select “Create participant, sample, pair associations” at the prompt in the pop-up window that appears when you upload the files.
Uploading example: Mutect2
Any tsv that references another tsv must be loaded after the table it references. For example, the pair table in the example above references samples (SM-74P4M and SM-74NEG) and a participant (HCC1143).
The tables must be uploaded in the order below:
- Participant tsv
- Sample tsv
- Pairs tsv
When you run Mutect2, if you uploaded the pair table before the sample table (or without selecting that checkbox), the workflow will fail: it reads the sample names from the pair table but has no knowledge of the sample table, where the data file paths are stored.
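The upload-order rule is a dependency ordering: each table must come after every table it references. As a sketch, Python's standard `graphlib` module (Python 3.9+) can compute a valid order from the dependencies described in this article; the `sample_set` entry reflects the requirement that an entity table exist before its set table. Several orders can be valid, so the exact output may vary.

```python
from graphlib import TopologicalSorter

# Each table maps to the tables it references (which must be uploaded first).
dependencies = {
    "participant": set(),
    "sample": {"participant"},
    "pair": {"sample", "participant"},
    "sample_set": {"sample"},
}

order = list(TopologicalSorter(dependencies).static_order())
print(order)  # participant first, then sample, then pair and sample_set
```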