This article explains how to write output file metadata to the input data table.
- To learn how to automate some of this setup step by using a JSON file (especially useful if you anticipate using similar configurations many times), see Getting workflows up and running faster with a JSON file.
- To learn how to configure additional cost-saving options in Terra, see Workflow setup: Runtime options.
Workflow setup: Outputs overview
Generated data are stored in the workspace bucket by default. You can choose whether the workflow also writes links to the output files to the data table. Writing to the data table helps associate generated data with input data.
In the configuration form, you specify the column name under which each output is added to the data table. If the column doesn't already exist, Terra creates it when the generated data is written to the workspace bucket.
How to write output file metadata to the input data table
1. Go to the Outputs tab.
2. For each output variable, click into the attribute field.
You'll see a drop-down menu that lists all columns in the root entity data table.
3. Choose an existing column or type in a new name to add a new column of data to the table.
Workflow setup form
Why write outputs to the data table?
Writing to the data table associates generated output with the input data file (the output files are written alongside the input files in the table) and helps organize your outputs in a way that is meaningful to you. It also makes it easy to use the data for downstream analysis.
Outputs in Google bucket (file folder is random string)
The default folders for generated data are named by the submission ID, a string of numbers and letters that is not human-readable. Because the directory names are unique, this ensures that you will not overwrite generated data, but it makes finding the files challenging.
Note the directory structure of the generated files: Files / c01b2b13-c2f5-4ea0-bc3b-319c963385ed / CramToBamFlow / 5bf8b92e-6ffa-4627-8265-6c5021d76677 / call-CramToBamTask
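To make the layout above easier to read, here is a minimal Python sketch that splits such a path into its parts. The function name `parse_output_path` and the field names are hypothetical, chosen for illustration. Treating the second long identifier as a per-workflow ID is an assumption based on the example path; the article itself only states that the first folder is the submission ID.

```python
from typing import NamedTuple


class OutputLocation(NamedTuple):
    submission_id: str  # top-level folder, named by the submission ID
    workflow_name: str  # e.g. CramToBamFlow
    workflow_id: str    # assumption: identifier of one workflow run in the submission
    call_name: str      # task name with the "call-" prefix removed


def parse_output_path(path: str) -> OutputLocation:
    """Split '<submission>/<workflow>/<workflow id>/call-<task>' into its parts."""
    parts = [p.strip() for p in path.split("/") if p.strip()]
    if len(parts) < 4 or not parts[3].startswith("call-"):
        raise ValueError(f"unexpected output path layout: {path!r}")
    return OutputLocation(parts[0], parts[1], parts[2], parts[3][len("call-"):])


# The example path from this article, without the leading "Files" breadcrumb.
example = (
    "c01b2b13-c2f5-4ea0-bc3b-319c963385ed / CramToBamFlow / "
    "5bf8b92e-6ffa-4627-8265-6c5021d76677 / call-CramToBamTask"
)
location = parse_output_path(example)
print(location.workflow_name, location.call_name)
```

A helper like this can be handy when you need to trace a file in the bucket back to the submission that produced it.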
Outputs in the data table (clear associations)
Here's the same output file in the data table. Running the workflow generated the aligner_output_crai and aligner_output_cram columns. Note: The unique collaborator_sample_id references the entire row, associating the generated data with the primary data.
Where are generated output files stored?
Whether you use the data table or file paths for input, analysis outputs are stored in the workspace Google bucket by default.
When you might not want to write outputs to the table
Although we recommend using data tables, you may not want to if you cannot fit your data into the data table in a way that makes sense for your analysis, or if you want to test a new method in Terra quickly, with as little setup as possible.
Be careful not to overwrite metadata in the data table
If you use the same output name for multiple runs, Terra will overwrite the links in the data table with the most recent output data link. Note: Data from a previous run will still exist in the workspace bucket.
To compare results from different configurations, give your outputs a name that indicates which is which.
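The overwrite behavior can be pictured with a toy model in Python. This is only an illustration, not Terra's API: each table row is modeled as a plain dictionary, and the helper `write_output_link`, the `gs://hypothetical-bucket/...` links, and the `_run1`/`_run2` column suffixes are all made up for the example.

```python
def write_output_link(row: dict, column: str, link: str) -> None:
    """Toy stand-in for writing an output link to a column; replaces any old value."""
    row[column] = link


# Case 1: both runs write to the same column, so only the latest link survives.
reused = {"collaborator_sample_id": "sample_1"}
write_output_link(reused, "aligner_output_cram", "gs://hypothetical-bucket/run-1/sample_1.cram")
write_output_link(reused, "aligner_output_cram", "gs://hypothetical-bucket/run-2/sample_1.cram")

# Case 2: each run writes to its own column, so both links stay in the table.
distinct = {"collaborator_sample_id": "sample_1"}
write_output_link(distinct, "aligner_output_cram_run1", "gs://hypothetical-bucket/run-1/sample_1.cram")
write_output_link(distinct, "aligner_output_cram_run2", "gs://hypothetical-bucket/run-2/sample_1.cram")

print(sorted(reused))
print(sorted(distinct))
```

In the first case the link to the run-1 file is gone from the row, even though the file itself would still be in the bucket. In the second case both links can be compared side by side.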
How to verify workflow output files
If your output attributes have the format "this.your_filename", the workflow writes output metadata to the "your_filename" column of the data table. You see the additional metadata for these output files in the data table after a successful run.
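As a small illustration of the naming rule above, the following Python sketch derives the column name from an output attribute. The function `output_column` is hypothetical and only models the "this.your_filename" format described here; it does not cover any other attribute format.

```python
def output_column(attribute: str) -> str:
    """Return the data table column that a 'this.<column>' attribute refers to."""
    prefix = "this."
    if not attribute.startswith(prefix) or len(attribute) == len(prefix):
        raise ValueError(f"expected an attribute like 'this.<column>', got {attribute!r}")
    # Everything after the "this." prefix is the column name.
    return attribute[len(prefix):]


column = output_column("this.your_filename")
print(column)
```

Checking attributes this way before launching can help catch a mistyped column name that would otherwise create an unintended new column.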
For example, after completing Exercise 1 in the Workflows-QuickStart tutorial, you see the sample table now contains three new columns. Each column corresponds to a different output filetype: the BAM index, the BAM, and a validation report. The cells include links to files in the workspace Google bucket for each sample. This data is now available for downstream use by other workflows in your workspace (see Exercise 3 of the Workflows Quickstart).
Whether or not you write to the data table, you can find the output files in your workspace Google bucket by clicking on the "Files" icon in the left column of the Data tab.
Note about output file folder names in the workspace bucket
Each time you launch a workflow, Terra assigns a unique submission ID to that submission. This submission ID is the name of the output folder in the workspace Google bucket. Outputs from multiple submissions of the same workflow in the same workspace will not be overwritten since they are in different submission ID folders.
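The following Python sketch shows why per-submission folders prevent collisions. It is a model under stated assumptions: `uuid.uuid4()` stands in for however Terra actually generates submission IDs, and the function `output_folder` is hypothetical.

```python
import uuid


def output_folder(workflow_name: str) -> str:
    """Build a '<submission id>/<workflow name>' folder for one hypothetical launch."""
    # Assumption: a random UUID stands in for the submission ID Terra assigns.
    submission_id = str(uuid.uuid4())
    return f"{submission_id}/{workflow_name}"


# Two launches of the same workflow land in different folders,
# so identically named output files never share a path.
first = output_folder("CramToBamFlow")
second = output_folder("CramToBamFlow")
print(first)
print(second)
```

Because the folder name changes with every launch, the same output filename can be written by each submission without replacing an earlier file.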