Now that you've used a data table for workflow inputs and outputs, hopefully you can see how tables help you manage your project data in the cloud. You're excited to use tables, but how do you add one to your workspace? We'll explore that in this section.
Overview: How to add tables to a Terra workspace
In Part 1, you worked with an existing table. Suppose your workspace doesn't include any data tables? Even if you upload your own data to the workspace bucket, you will want to add a table to help organize the data and link to workflows for analysis. There are several different ways to add a data table to your workspace. See the options below. Then read on for step-by-step instructions to generate a table load file from scratch.
Import from another workspace
You can copy data in a table from another workspace to your own workspace by selecting the rows of data you want, clicking on the three vertical dots at the top right, and choosing "Export to workspace".
Import from Gen3, the Data Library, or other external servers
For external data resources directly connected to Terra, you'll be able to browse, select the data subset you want, and export it to your workspace. Note that when you export data from the Data Library or an external source such as the Gen3 platform, the data will usually show up as multiple tables of predefined entities.
Add a table by making and uploading a "load file"
Maybe there's no workspace with data in a table to copy, or you want to include a table for data you've just uploaded to your workspace bucket. You can create a table from scratch by generating a "load file" in a spreadsheet editor (outside of Terra) and uploading it by clicking on the blue + icon at the top of the Data page.
Learn how to create a TSV file from a template (you can find template TSV files here)
Read on for instructions on how to create a TSV/load file to add a new workspace table.
2.1. Create a data table load file (TSV) from scratch
Workspace tables are like spreadsheets (columns and rows) built into the Data page. So it's no surprise that you can use a spreadsheet editor to create a TSV load file to upload as a new table. Each row corresponds to a unique entity, and each column is a distinct attribute (e.g., sex, age, height, bam, fasta).
Minimum data table requirements
A workspace table must have at least two columns (an ID column and one attribute column) and two rows (the header and at least one entity).
We'll use the same workflow from Part 1, so the two columns in your most basic table will be the ID column and the input column (a FASTQ file).
Start by opening a blank file in your favorite spreadsheet editor.
Step 1: Fill in the header
Each column in a table is a different kind of data or metadata. The load file header row specifies the workspace table column headers.
1.1. Fill in the ID (first column)
In Part 1, we used a "specimen" table. However, we aren't limited to analyzing specimens, and in Terra, tables can be called anything that makes the most sense for your project. So in this part, we will call the entity we're studying a "sample" instead. Use that in the first column header.
Terra requires a particular format for the ID column header
The fixed parts (entity: and _id) must be typed in exactly as shown. You can name the entity whatever helps you organize your data, however. For example, the first column header of a table of samples would read entity:sample_id and the first column header of a table of unicorns would read entity:unicorn_id.
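The header rule above can be sketched in a few lines of Python. This is only an illustration of the entity:<name>_id pattern described in this section; the function names `make_id_header` and `entity_name_from_header` are hypothetical helpers, not part of Terra.

```python
# Illustrative helpers for the ID column header pattern "entity:<name>_id".
# The function names are hypothetical; only the pattern comes from this guide.
ID_PREFIX = "entity:"
ID_SUFFIX = "_id"


def make_id_header(entity_name: str) -> str:
    """Build the first-column header for a table of the given entity type."""
    if not entity_name:
        raise ValueError("entity name must not be empty")
    return f"{ID_PREFIX}{entity_name}{ID_SUFFIX}"


def entity_name_from_header(header: str) -> str:
    """Recover the entity (table) name from an ID column header."""
    if not (header.startswith(ID_PREFIX) and header.endswith(ID_SUFFIX)):
        raise ValueError(f"expected 'entity:<name>_id', got {header!r}")
    name = header[len(ID_PREFIX):-len(ID_SUFFIX)]
    if not name:
        raise ValueError("entity name is empty")
    return name


print(make_id_header("sample"))                      # entity:sample_id
print(entity_name_from_header("entity:unicorn_id"))  # unicorn
```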
1.2. Fill in the input file type (second column)
We know that the workflow is looking for a FASTQ file, so we will use the variable fastq for the second column header.
In your spreadsheet editor it will look (approximately) like this:
Step 2: Fill in the sample data (second row of the spreadsheet)
The rest of the table is the "data" corresponding to the headers. There is one row for each individual entity (sample, in this case) in your table. The simplest table includes one entity, but Terra tables can include an almost unlimited number of rows, each one its own entity.
2.1. Fill in the sample ID (first column).
You can use any name you want for the sample ID. In a real analysis, this would be the unique ID of the sample.
2.2. Fill in the full path to the input data file (second column).
This is the space where you will include the link to the input data file in the cloud. For the quickstart, you can use this downsampled FASTQ file (copy and paste the full path and file name) in a public bucket.
Check your data upload file format!
When you're done filling in all four cells, your spreadsheet should look like this:
Step 3: Save file in "tab delimited text" format
Your editor may give you a warning, but we assure you, it's fine!
Note that Terra will completely ignore the name you give the file. It's the root entity in the first column header (entity:your-table-name_id) that determines the table name in the workspace.
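If you would rather script the load file than build it in a spreadsheet editor, the three steps above might look roughly like this in Python. Treat it as a sketch: the function `build_load_file`, the sample ID `sample_01`, and the bucket path are all hypothetical placeholders, so substitute the real path of your FASTQ file.

```python
import csv
import io


def build_load_file(entity_name, attribute_names, rows):
    """Return tab-delimited text for a load file (header row + entity rows)."""
    header = [f"entity:{entity_name}_id", *attribute_names]
    # Minimum requirements from this guide: two columns and two rows.
    if len(header) < 2:
        raise ValueError("need an ID column plus at least one attribute column")
    if not rows:
        raise ValueError("need at least one entity row below the header")

    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        if len(row) != len(header):
            raise ValueError(f"row {row!r} does not have {len(header)} cells")
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical placeholder: replace with the full path to your own file.
FASTQ_PATH = "gs://your-bucket/path/to/sample.fastq"

tsv_text = build_load_file("sample", ["fastq"], [("sample_01", FASTQ_PATH)])
print(tsv_text)

# To save it for upload (the file name itself is ignored by Terra):
# with open("sample.tsv", "w", newline="") as handle:
#     handle.write(tsv_text)
```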
2.2. Create a workspace-level resources table from scratch
The workspace resource data table (aka Workspace Data table) holds variables you might want to use in multiple workflow analyses, like the genomic reference sequence file or a Docker container. Using the workspace data table lets you configure the data once and point to it whenever you need it. Not only will you not need to look up the file path again, but if you update the file, you only need to update it in one place.
The workspace resource data table in the Data-Tables-QuickStart looks like this:
The first column (circled column on the left) identifies what the file is. The other (circled on the right) includes a link to the file in a Google bucket.
To copy a workspace resources data table from another workspace, download the existing table to your local machine by clicking on the "Download TSV" link (top right). Then upload it to the other workspace by clicking on the blue "+" icon by the TABLES column.
To create a workspace resources table, you can create a TSV file using a spreadsheet editor just like for a regular data table (above). The spreadsheet looks like this:
Workspace resources data table formatting requirements
The first row is the "Key" row. In it you will put the name of the reference (such as ref_fasta_index for the reference FASTA index file). Note that you can only use lower-case letters, dashes, and underscores (no spaces!!).
The first column header must have the format below. Parts in red must be typed in exactly.
The second row includes a link to each resource file in an accessible Google bucket.
You can include additional information such as workspace tags. You can see all the workspace tags in the right column of the Dashboard:
Save your workspace resources table as "tab delimited text" (or "tab separated values"), as described above. You can use the blue "+" icon to upload it to your workspace.
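Before uploading, it can help to check the key names against the rule above (lower-case letters, dashes, and underscores only). The following is a minimal sketch of such a check; the helper name `invalid_keys` and every key other than ref_fasta_index are hypothetical examples, and the pattern encodes only the rule as stated in this guide.

```python
import re

# Rule as stated in this guide: lower-case letters, dashes, underscores.
KEY_PATTERN = re.compile(r"[a-z_-]+")


def invalid_keys(keys):
    """Return the key names that break the naming rule, in input order."""
    return [key for key in keys if not KEY_PATTERN.fullmatch(key)]


# "ref_fasta_index" comes from this guide; the other keys are made up.
keys = ["ref_fasta_index", "docker-image", "Ref Fasta"]
print(invalid_keys(keys))  # ['Ref Fasta']
```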
2.3. Upload your TSV file to the workspace to create a new table
Click the blue "+" icon at the top right of the TABLES column on the Data page of the workspace and follow the directions to upload both the samples and workspace resources tables.
2.4. Run the workflow on data from the new table
Step 1: Select the input file
Notice there's now an additional table in your workspace (expand by clicking the name to check yours!):
The four fields in the new table should look very familiar!
Step 2: Set up and run workflow
To run the workflow with the new data as input, select the data box in the new table and run 1_Single-input-workflow, just like in Part 1. You will need to set up the inputs and outputs in the workflow configuration form.
Make sure your workflow inputs match your table!
The attributes in the workflow setup (configuration) form need to match the headings in your table exactly, or the workflow will fail. Note that when a workflow fails because it cannot find the input files, it does so immediately, so this is always a great thing to check when your submission fails before it even starts!
- The root entity type should be "sample"
- The attribute field that corresponds to the variable R1_fastq should be the column header you used for the fastq (input) file
- The sample_id attribute should be "this.sample_id"
Then save and launch as before.
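The "inputs must match the table" check above can be rehearsed offline. The sketch below compares the "this.<attribute>" expressions from a configuration against a load file header row. It is a toy check, not how Terra validates submissions; the helper names and the expression "this.fastq" are assumptions based on the column header chosen earlier in this part.

```python
def table_attributes(header):
    """Attribute names a workflow can reference for a load file header row."""
    first, *rest = header
    id_attribute = first.split(":", 1)[-1]  # "entity:sample_id" -> "sample_id"
    return {id_attribute, *rest}


def missing_attributes(header, workflow_inputs):
    """Map each workflow variable to a referenced attribute the table lacks."""
    available = table_attributes(header)
    missing = {}
    for variable, expression in workflow_inputs.items():
        if expression.startswith("this."):
            attribute = expression[len("this."):]
            if attribute not in available:  # exact, case-sensitive match
                missing[variable] = attribute
    return missing


header = ["entity:sample_id", "fastq"]
inputs = {"R1_fastq": "this.fastq", "sample_id": "this.sample_id"}

print(missing_attributes(header, inputs))                      # {}
print(missing_attributes(header, {"R1_fastq": "this.Fastq"}))  # {'R1_fastq': 'Fastq'}
```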
You can monitor your submission in the Job History page.
When your workflow is complete, expand the sample data table to see where the generated data are!
Thought questions about using load files
Imagine you upload a TSV file with an entity that already exists in the workspace. For example, the load file below.
Uploading a new TSV file with an entity that already exists will not generate any new tables, but will add rows to an existing table.
For example, the file above will generate one additional row in the specimen table - corresponding to the new specimen - assuming you give it a unique ID. Note that if the file includes an ID already in the existing table, Terra will overwrite the existing row when you load the new TSV.
Notice that Terra also adds a new column to the table if you used a different name for the FASTQ attribute:
A note about overwriting table rows
When your TSV file has the same entity (name) as a table already in the workspace, you may get a warning about overwriting data when you try to upload (see screenshot).
Note that Terra will only overwrite data rows with the same ID. You can ignore this warning if the TSV file only contains new entities (i.e., different sample IDs); those rows will be added to the existing table.
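To reason about what an upload will do before you click the "+" icon, you can model the rules above: rows with new IDs are added, rows with existing IDs are overwritten, and unfamiliar attribute names become new columns. The sketch below is a toy preview under those stated rules, not Terra's implementation; the function `preview_upload`, the specimen IDs, and the column names are hypothetical.

```python
def preview_upload(existing_ids, existing_columns, new_header, new_ids):
    """Classify the effect of uploading a load file onto an existing table."""
    return {
        # IDs not yet in the table become additional rows.
        "added": [i for i in new_ids if i not in existing_ids],
        # IDs already in the table have their rows overwritten.
        "overwritten": [i for i in new_ids if i in existing_ids],
        # Attribute headers not yet in the table become new columns.
        "new_columns": [c for c in new_header[1:] if c not in existing_columns],
    }


# Hypothetical existing "specimen" table and incoming load file.
result = preview_upload(
    existing_ids={"specimen_01"},
    existing_columns={"fastq"},
    new_header=["entity:specimen_id", "reads"],
    new_ids=["specimen_01", "specimen_02"],
)
print(result)
# {'added': ['specimen_02'], 'overwritten': ['specimen_01'], 'new_columns': ['reads']}
```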
2.5. Additional practice with tables
Try making your own data tables! Don't worry, they're easy to delete by selecting all the rows in the table and clicking on the three vertical dots.
What happens to data files if you delete a table?
Note that if the tables include metadata (i.e. reference the URI of data files in cloud storage), deleting a data table (whether original or copied from another workspace) will not delete the primary data files.
Congratulations! You've completed Part 2 of the Data Tables Quickstart!