Data Tables Quickstart Part 2 - Making a data table from scratch

Allie Hajian

Now that you've used a data table for workflow inputs and outputs, hopefully you can see how tables help you manage your project data-in-the-cloud. You're excited to use tables, but how do you add one to your workspace? We'll explore that in this section.

Overview: How to add tables to a Terra workspace

In part 1, you worked with a table that was already in the workspace. But what if your workspace doesn't include any data tables? Even if you upload your own data to the workspace bucket, you will want to add a table to help organize the data and link it to workflows for analysis.

Option 1: Add a table by importing from another workspace, the Data Library, or another external platform (such as Gen3)

Import from another workspace

You can copy data in a table from another workspace to your own workspace by selecting the rows of data you want, clicking on the three vertical dots at the top right, and choosing "Export to workspace":
Data-QuickStart-Part2_Export-table-to-another-workspace.png


Import data from Gen3, the Data Library, or other external sources

For external data resources directly connected to Terra, you'll be able to browse, select the data subset you want, and export it to your workspace. Note that when you export data from the Data Library or an external source such as the Gen3 platform, the data will usually show up as tables of pre-defined entities.

Option 2: Add a table by making and uploading a "load file"

Maybe there's no workspace with data in a table to copy, or you want to include a table for data you've just uploaded to your workspace bucket. You can create a table from scratch by generating a "load file" in a spreadsheet editor (outside of Terra) and uploading it by clicking on the blue + icon at the top of the Data page:
Data-QuickStart-upload-tsv_Screen_Shot.png

Terra requires a particular file format for load files - "tab-separated value" (TSV) or "tab-delimited text" (they're the same thing).

Learn how to create a TSV file from a template (you can find template TSV files here)

Read on for instructions on how to create a TSV/load file to add a new workspace table.

2.1. Create a data table load file (TSV) from scratch

Workspace tables are like spreadsheets (columns and rows) built into the Data page, so it's no surprise that you can use a spreadsheet editor to create a TSV/load file to upload as a new table. Each row corresponds to a unique entity, and each column is a distinct attribute (e.g., sex, age, height, bam, fasta).

Minimum data table requirements

  A workspace table must have at least two columns (an ID column and one attribute
column) and two rows (the header and at least one entity). 

We'll use the same workflow from Part 1, so the two columns in your most basic table
will be the ID column and the input column (a FASTQ file).
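
As a quick sanity check before uploading, the minimum requirements above can be tested with a few lines of Python. This is a minimal sketch and not part of Terra: the function name meets_minimum_requirements is hypothetical, the example row uses placeholder values, and the check only counts rows and columns (the required header format is covered later in this section).

```python
import csv
import io


def meets_minimum_requirements(tsv_text: str) -> bool:
    """Return True if the text has a header row plus at least one entity row,
    and every row has at least two columns (ID plus one attribute)."""
    # Skip completely empty lines so a trailing newline does not count as a row.
    rows = [row for row in csv.reader(io.StringIO(tsv_text), delimiter="\t") if row]
    if len(rows) < 2:  # need the header and at least one entity
        return False
    return all(len(row) >= 2 for row in rows)


# Placeholder content: one header row and one entity row, two columns each.
example = "entity:sample_id\tfastq\nsample1\tgs://bucket_name/file_name\n"
print(meets_minimum_requirements(example))
```

A file with only a header row, or with only an ID column, fails this check.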

Entity types: "specimens" versus "samples"

  In part 1, we used a "specimen" table. However, you aren't limited to analyzing
specimens. So in this part, we will call the entity we're studying "samples" instead. We'll
use that in the first column (i.e. ID) header. 


Start by opening a blank file in your favorite spreadsheet editor

Fill in the first (header) row

The load file header row defines the workspace table column headers (no surprise there!). In your spreadsheet editor it will look (approximately) like this:  

Data-QuickStart_Part2_Spreadsheet-first-row.png


ID (first column)
 


Terra requires a particular format for the ID column header

  entity:your-entity-name_id

The parts in red (entity: and _id) must be typed in exactly as shown. You can name the entity whatever helps you organize your data. For example, the first column header of a table of samples would read entity:sample_id and the first column of a table of unicorns would read entity:unicorn_id.
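
If you generate load files programmatically, a small check like the one below can catch a malformed ID column header before upload. This is only a sketch: the function name entity_name_from_header is hypothetical, and the set of characters allowed in an entity name (letters, digits, hyphens, underscores) is an assumption, not Terra's documented rule.

```python
import re

# Assumption: entity names use letters, digits, hyphens, and underscores.
# Terra's actual rules for allowed characters may differ.
ID_HEADER_PATTERN = re.compile(r"^entity:([A-Za-z0-9_-]+)_id$")


def entity_name_from_header(header: str):
    """Return the entity name if the header looks like entity:<name>_id, else None."""
    match = ID_HEADER_PATTERN.match(header)
    return match.group(1) if match else None


print(entity_name_from_header("entity:sample_id"))   # sample
print(entity_name_from_header("entity:unicorn_id"))  # unicorn
print(entity_name_from_header("sample_id"))          # None (missing "entity:")
```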

Input file (second column) 
We know that the input data for our workflow are FASTQ files, so we will use the variable fastq for the second column header. 

Fill in the data (second row of the spreadsheet)

ID (first column)
You can fill in any name you want for the sample ID.

Input file (second column)
This is the space where you will include the link to the input data file in the cloud. You can use this downsampled FASTQ file (copy and paste the full path and file name):

gs://terra-featured-workspaces/QuickStart/quickstart_reads_1.fastq

Attributes that reference data files directly must include the full path:

  gs://bucket_name/file_name
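
One way to check that a value follows the gs://bucket_name/file_name pattern is sketched below using Python's standard urllib.parse module. The function name looks_like_full_gcs_path is hypothetical, and the check only looks at the shape of the string; it does not confirm that the file exists or that you have access to it.

```python
from urllib.parse import urlparse


def looks_like_full_gcs_path(value: str) -> bool:
    """Return True if value has the form gs://bucket_name/file_name."""
    parsed = urlparse(value)
    has_bucket = bool(parsed.netloc)
    has_file = len(parsed.path) > 1  # more than just the leading "/"
    return parsed.scheme == "gs" and has_bucket and has_file


print(looks_like_full_gcs_path("gs://bucket_name/file_name"))  # True
print(looks_like_full_gcs_path("file_name"))                   # False
```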


Check your data upload file format!

  When you're done filling in all four cells, your spreadsheet should look like this:
Data-QuickStart_Part2_Spreadsheet-complete.png

Save file in "tab delimited text" format

Your editor may give you a warning, but we assure you, it's fine! Also, Terra will completely ignore the name you give the file. It's the root entity in the first column header that determines the table name in the workspace. 
Data-QuickStart_Part2_Save-as-Tab-delimited-text.png 
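
If you would rather skip the spreadsheet editor, the same four-cell load file can be written with Python's built-in csv module. This is a minimal sketch under stated assumptions: the sample ID sample1, the placeholder path gs://bucket_name/file_name, and the output file name sample_table.tsv are all hypothetical, so substitute your own values.

```python
import csv

# Hypothetical values: replace the ID and the gs:// path with your own.
rows = [
    ["entity:sample_id", "fastq"],              # header row
    ["sample1", "gs://bucket_name/file_name"],  # one entity row
]

# Terra ignores this file name; the table name comes from the first
# column header (here, "sample").
with open("sample_table.tsv", "w", newline="") as handle:
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
```

Writing with a tab delimiter produces the same "tab delimited text" format that the spreadsheet editor's save option does.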

2.2. Create a Workspace Data (workspace resources) table from scratch

Keeping track of workspace-level resources (reference files, Docker, etc.)

  The workspace resource data table holds variables you might want to use in multiple
workflow analyses, like the genomic reference sequence file or a Docker container. Using
the workspace data table lets you configure the data all at once and point to it from
within the UI whenever you need it. Not only will you not need to look up the file path
again, but if you update the file, you only need to update it in one place.

The workspace resource data table in the Data-Tables-QuickStart looks like this:

Data-QuickStart_Part2_Workspace-data_Screen_shot.png

The first column (circled column on the left) identifies what the file is. The other (circled on the right) includes a link to the file in a Google bucket.

To copy a workspace resources data table from another workspace, download the existing table by clicking on the "Download TSV" link (top right), then upload it to your own workspace by clicking on the blue "+" icon by the TABLES column.

Create a workspace resource table from scratch

To create a workspace resources table, you can create a TSV file using a spreadsheet editor just like for a regular data table. The spreadsheet looks like this:

Data-QuickStart_Part2_Workspace-data-table-format_Screen_shot.png

Formatting a workspace resources data table

  The first row is the "Key" row 

The first column header must have the format below. The part in red (workspace:) must be typed in exactly as shown.

workspace:file-name

The second row includes links to the resource files in an accessible Google bucket. 

You can include additional information such as workspace tags. You can see all the
workspace tags in the right column of the Dashboard:
Data-QuickStart_Part2-Tags-in-dashboard_Screen_shot.png

 

Save your workspace resources table in "tab delimited text" (also called "tab separated values") format, as described above. You can use the blue "+" icon to upload it to your workspace.
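
The two-row layout described above (a row of keys, then a row of values) can also be written from a script. The sketch below uses Python's csv module. The keys reference_file and reference_index, the placeholder gs:// paths, and the output file name are all hypothetical. Following the format above, only the first column header carries the workspace: prefix.

```python
import csv

# Hypothetical keys and placeholder paths, for illustration only.
resources = {
    "reference_file": "gs://bucket_name/reference_file_name",
    "reference_index": "gs://bucket_name/index_file_name",
}

keys = list(resources)
# First row: the keys. Only the first header gets the "workspace:" prefix.
header = ["workspace:" + keys[0]] + keys[1:]
# Second row: the links to the resource files, in the same order as the keys.
values = [resources[key] for key in keys]

with open("workspace_resources.tsv", "w", newline="") as handle:
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerows([header, values])
```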

2.3. Upload your TSV file to the workspace to create a new table

Click the blue "+" icon at the top right of the TABLES column on the Data page of the workspace and follow the directions to upload both the samples and workspace resources tables.
Data-QuickStart-Part2_Upload-tsv.png

2.4. Run the workflow on data from the new table

Select the input file
Notice there's now an additional table in your workspace (expand by clicking the name to check yours!):
Data-QuickStart_Part2_New-table.png
The four fields in the new table should look very familiar!

Run workflow
To run the workflow with the new data as input, select the data box in the new table and run 1_Single-input-workflow, just like in Part 1. 

Check the Inputs attributes in the configuration form carefully!

  They need to match the headings in your table exactly, or the workflow will fail. Note that
when a workflow fails because it cannot find the input files, it does so immediately (it's
always a great first place to look, when your submission fails before it even starts!).  

Configuration hints

Data-QuickStart_Part2_Configure-new-table-inputs.png

  1. The root entity type should be "sample"
  2. The r1_fasta attribute field should reference the column header you used for the FASTQ input file (for example, "this.fastq" if you named the column fastq)
  3. The sample_id attribute should be "this.sample_id" 
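
To see why the attribute names must match the table headings exactly, it can help to picture "this.sample_id" as a lookup of the sample_id column in the selected row. The sketch below is a conceptual model only, not Terra's implementation. The function name resolve_input and the row values are hypothetical, and it assumes the lookup is case-sensitive.

```python
def resolve_input(expression: str, row: dict) -> str:
    """Look up a "this.<column>" expression in one table row."""
    prefix = "this."
    if not expression.startswith(prefix):
        raise ValueError(f"expected an expression starting with {prefix!r}")
    column = expression[len(prefix):]
    if column not in row:
        # A name that does not match a column header exactly cannot be resolved.
        raise KeyError(f"no column named {column!r}; available: {sorted(row)}")
    return row[column]


# One row of a hypothetical "sample" table, keyed by column header.
row = {"sample_id": "sample1", "fastq": "gs://bucket_name/file_name"}
print(resolve_input("this.sample_id", row))  # sample1
print(resolve_input("this.fastq", row))      # gs://bucket_name/file_name
```

In this model, an expression such as "this.Fastq" raises an error because no column has exactly that name, which mirrors a submission failing when an input attribute doesn't match a table heading.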

Then save and launch as before.

You can monitor your submission in the Job History page.

When your workflow is complete, expand the sample data table to see where the generated data are!

Thought questions about using load files

Imagine you upload a TSV file with an entity that already exists in the workspace. For example, consider the load file below.

Data-QuickStart_Part2_Adding-specimen-Additional-question.png

How many tables will there be? What will they be called?

Answer
Uploading a new TSV file with an entity that already exists will not generate any new tables, but will add rows to an existing table.

For example, the file above will generate one additional row in the specimen table - corresponding to the new specimen - assuming you give it a unique ID. Note that if the file includes an ID already in the existing table, Terra will overwrite the existing row when you load the new TSV.

Notice that Terra also adds a new column to the table if you used a different name for the FASTQ attribute:

Data-QuickStart-Part2_Add-additional-specimen.png

A note about overwriting table rows

When your TSV file has the same entity (name) as a table already in the workspace, you may see a warning about overwriting data when you try to upload it (see screenshot).

Exercise2-tsv-warning_Screen_Shot.png

Note that Terra will only overwrite data rows with the same ID. You can ignore this warning if the TSV file only contains new entities (i.e. different sample IDs); those rows will be added to the existing table.
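
The behavior described in this section (rows with an existing ID are overwritten, rows with new IDs are added, and new attribute names become new columns) can be modeled with ordinary Python dictionaries. This is an illustrative sketch, not Terra's code. The function names, IDs, and paths are hypothetical, and the sketch replaces a matching row as a whole, which is a simplifying assumption.

```python
def upload_rows(table: dict, new_rows: dict) -> dict:
    """Merge new_rows into table, keyed by entity ID.

    Rows whose ID already exists are overwritten; rows with new IDs are added.
    """
    merged = dict(table)
    merged.update(new_rows)
    return merged


def column_names(table: dict) -> list:
    """Every attribute name that appears in any row, in first-seen order."""
    seen = []
    for attributes in table.values():
        for name in attributes:
            if name not in seen:
                seen.append(name)
    return seen


existing = {"specimen1": {"fastq": "gs://bucket_name/file_1"}}
load_file = {
    "specimen1": {"fastq": "gs://bucket_name/file_1_new"},  # same ID: overwritten
    "specimen2": {"reads": "gs://bucket_name/file_2"},      # new ID, new column name
}

result = upload_rows(existing, load_file)
print(len(result))           # 2 rows in one table, not two tables
print(column_names(result))  # ['fastq', 'reads']
```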

2.5. Additional practice with tables

Try making your own data tables! Don't worry, they're easy to delete by selecting all the rows in the table and clicking on the three vertical dots:
Data-QuickStart_Delete_tables.png

Congrats!
You've completed Part 2 of the Data Quickstart

 

Next up: Sets of single entities
