Workflow setup: Writing outputs to the data table

Allie Cliffe

This article explains how to write output file metadata to the input data table.

Workflow setup: Outputs overview

Generated data are stored in the workspace bucket by default. When you use the data table, you can also have the workflow write links to the output files right in the data table, which helps associate generated data with input data. In the configuration form, you'll specify the column name under which each output is added to the data table. If the column doesn't already exist, Terra will create it when the generated data is written to the workspace bucket.

How to write output file metadata to the input data table

4.1. Go to the Outputs tab.

4.2. For each output variable, click into the attribute field.

You'll see a dropdown that lists all columns that exist in the root entity data table. 

4.3. Choose an existing column or type in a new name to add a new column of data to the table. 

Configure-workflow_Write-outputs-to-data-table_Screen_shot.png 
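
The choice made in steps 4.1 to 4.3 amounts to pairing each workflow output variable with a column of the root entity table. The sketch below models that pairing in plain Python to show which columns would be reused and which would be created. It is not Terra's API: the output variable names, the column names, and the helper `split_columns` are hypothetical and only illustrate the idea.

```python
# Hypothetical illustration only: this models the Outputs tab, not Terra's API.

# Columns assumed to exist already in the root entity data table.
existing_columns = {"collaborator_sample_id", "aligner_output_cram"}

# One entry per workflow output variable: the attribute typed into its field.
# "this." refers to the row (root entity) the workflow runs on.
output_attributes = {
    "aligner.output_cram": "this.aligner_output_cram",
    "aligner.output_crai": "this.aligner_output_crai",
}


def split_columns(attributes, existing):
    """Return (reused, created) column names for the given output attributes."""
    reused, created = [], []
    for attribute in attributes.values():
        # Drop the leading "this." to get the bare column name.
        column = attribute.split(".", 1)[1]
        if column in existing:
            reused.append(column)
        else:
            created.append(column)
    return sorted(reused), sorted(created)


reused, created = split_columns(output_attributes, existing_columns)
print("existing columns reused:", reused)
print("new columns to create:", created)
```

Picking a name from the dropdown corresponds to the "reused" case, and typing a new name corresponds to the "created" case.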

Why write outputs to the data table?
Writing to the data table associates generated output with the input data file (the output files are written alongside the input files in the table) and helps organize your outputs in a way that is meaningful to you. It also makes it easy to use the data for downstream analysis.

Outputs in Google bucket (file folder is random string)
The default folders for generated data are named by the submission ID, a string of numbers and letters that is not human-readable. This ensures that you will not overwrite generated data (because the directory names are unique), but it makes finding the files challenging.
Managing-data-with-tables_Generated-data-in-bucket_Screen_shot.png

Outputs in the data table (clear associations)
Here's the same output file in the data table. Running the workflow generated the aligner_output_crai and aligner_output_cram columns. Note that the unique collaborator_sample_id references the entire row, associating the generated data with the primary data.
Managing-data-with-tables_Generated-data_Screen_shot.png

Where are generated output files stored?

Regardless of whether you use the data table or file paths for input, analysis outputs are stored in the workspace Google bucket by default.

When you might not want to write outputs to the table
Although we recommend using data tables, there are situations where you may not want to: if you cannot fit your data into the data table in a way that makes sense for your analysis, or if you want to test a new method in Terra quickly, with as little setup as possible.

Be careful not to overwrite in the data table
If you use the same output name for multiple runs, Terra will overwrite the links in the data table with the most recent output data link. Note that data from a previous run will still exist in the workspace bucket, but it will be harder to find.

To be able to compare results from different configurations, you'll want to give your outputs a name that indicates which is which.
configure-workflows_Multiple-test-outputs_Screen_shot.png
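
To make the overwrite behavior concrete, here is a minimal sketch that models one data table row as a Python dictionary. It is an illustration under stated assumptions, not Terra code: the bucket name, file paths, column names, and the helper `write_output` are all hypothetical.

```python
# Hypothetical illustration: one data table row modeled as a dictionary.
row = {"collaborator_sample_id": "sample_01"}


def write_output(row, column, link):
    """Write an output link to a column, replacing any earlier link there."""
    row[column] = link


# Two runs that write to the same column: only the latest link is kept.
write_output(row, "aligner_output_cram", "gs://example-bucket/run-1/out.cram")
write_output(row, "aligner_output_cram", "gs://example-bucket/run-2/out.cram")

# Two runs that write to distinct columns: both links stay in the table.
write_output(row, "cram_config_a", "gs://example-bucket/run-3/out.cram")
write_output(row, "cram_config_b", "gs://example-bucket/run-4/out.cram")

for column, value in row.items():
    print(column, "->", value)
```

In this model the file from the first run is still in the bucket, but the row no longer points to it, which is why distinct output names are useful when comparing configurations.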

How to verify workflow output files

If your output attributes have the format "this.your_filename", the workflow will write output metadata to the "your_filename" column of the data table. You'll see the additional metadata for these output files in the data table after a successful run. 
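
The "this.your_filename" convention can be checked mechanically once the table contents are available locally. The sketch below assumes the rows have already been loaded as Python dictionaries (how they are exported is outside its scope). The helpers `column_for` and `missing_outputs`, the sample IDs, and the file link are hypothetical.

```python
def column_for(attribute):
    """Return the data table column named by a 'this.<column>' attribute."""
    prefix, _, column = attribute.partition(".")
    if prefix != "this" or not column:
        raise ValueError(f"not a 'this.<column>' attribute: {attribute!r}")
    return column


def missing_outputs(rows, attributes, id_column="collaborator_sample_id"):
    """List (row ID, column) pairs whose output link is absent or empty."""
    missing = []
    for row in rows:
        for attribute in attributes:
            column = column_for(attribute)
            if not row.get(column):
                missing.append((row[id_column], column))
    return missing


# Hypothetical table contents after a run: one row has its link, one does not.
rows = [
    {"collaborator_sample_id": "sample_01",
     "your_filename": "gs://example-bucket/run/sample_01.out"},
    {"collaborator_sample_id": "sample_02"},
]

print(missing_outputs(rows, ["this.your_filename"]))
```

An empty result would mean every row has a link in each expected output column.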

For example, after completing Exercise 1 in the Workflows-QuickStart tutorial, you'll see the sample table now contains three new columns. Each column corresponds to a different output file type: the BAM index, the BAM, and a validation report. The cells include links to files in the workspace Google bucket for each sample. This data is now available for downstream use by other workflows in your workspace (see Exercise 3 of the Workflows Quickstart).

Whether or not you write to the data table, you can find the output files in your workspace Google bucket by clicking on the "Files" icon in the left column of the Data tab.
Data-Google-bucket-Files_Screen_Shot.png

Note about output file folder names in the workspace bucket
Each time you Launch a workflow, Terra will assign a unique submission ID to that submission. This submission ID is also the name of the output folder in the workspace Google bucket. Outputs from multiple submissions of the same workflow in the same workspace will not be overwritten since they are in different submission ID folders.
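
The following sketch shows why per-submission folders prevent collisions. It rests on two assumptions that the article does not confirm: submission IDs are modeled here as random UUIDs, and the folder is placed directly under the bucket root. The bucket name and the helper `output_folder` are hypothetical.

```python
import uuid


def output_folder(bucket, submission_id):
    """Build a folder prefix named after the submission ID (assumed layout)."""
    return f"gs://{bucket}/{submission_id}/"


bucket = "example-workspace-bucket"  # hypothetical bucket name

# Two launches of the same workflow get different IDs, so different folders.
first_run = output_folder(bucket, uuid.uuid4())
second_run = output_folder(bucket, uuid.uuid4())

print(first_run)
print(second_run)
```

Because each prefix differs, files with the same name from two submissions would end up in separate folders rather than replacing each other.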
