Learn how to set up (configure) more complex workflow inputs in Terra, including inputs from nested tables (i.e., arrays and pairs). Note: This article is for analyses using inputs from a workspace data table, not hard-coded file paths.
Overview: Configuring nested data as input
When specifying your workflow's inputs, the this. syntax tells Terra to look in the root entity table for links to input files.
For example, the workflow below takes single files as input, found in the r1_fastq column of the sample table.
In the workflow configuration form, the root entity (selected from the drop-down menu) is sample, and the first attribute is this.r1_fastq.
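To make the lookup concrete, here is a minimal Python sketch that models a data table as a dictionary of rows and resolves a this.<column> expression against one row of the root entity table. The sample IDs, the gs://example-bucket paths, and the helper name resolve_this are made up for illustration. This is not how Terra implements the lookup internally.

```python
from typing import Any, Dict

# Hypothetical "sample" table: row ID -> {column name: value}.
SAMPLE_TABLE: Dict[str, Dict[str, Any]] = {
    "sample_01": {"r1_fastq": "gs://example-bucket/sample_01_R1.fastq.gz"},
    "sample_02": {"r1_fastq": "gs://example-bucket/sample_02_R1.fastq.gz"},
}


def resolve_this(expression: str, row: Dict[str, Any]) -> Any:
    """Resolve a simple 'this.<column>' expression against a single row."""
    prefix = "this."
    if not expression.startswith(prefix):
        raise ValueError(f"expected an expression starting with {prefix!r}")
    column = expression[len(prefix):]
    return row[column]


# With "sample" as the root entity, each selected row is its own run,
# and each run receives that row's r1_fastq value as a single file.
for sample_id, row in SAMPLE_TABLE.items():
    print(sample_id, resolve_this("this.r1_fastq", row))
```

Each call returns exactly one value, which is why a single-entity root table suits a workflow input declared as a single File.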
What if you have a more complex table structure?
For example, if your workflow's input is an array of data files, your root entity could be a set table, such as the sample_set table in the example below.
However, set tables typically include the IDs of the members of a set, not the links to the files that a workflow most likely needs to operate on. Therefore, your workflow's input attribute field needs to reference both the column in the sample table that stores the file names (r1_fastq) and the column in the sample_set table that stores the samples to include in the workflow (samples).
The formatting in this case is nested: this.<entity-name>s.attribute.
The nested format includes an extra 's'!
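The nested lookup can be sketched in Python as a two-step resolution: read the member IDs from the set row, then fetch the requested attribute from each member's row in the single-entity table. The table contents, the helper name resolve_nested, and the simplification that dropping the trailing "s" from the middle component gives the member table's name are assumptions for illustration, not Terra's implementation.

```python
from typing import Any, Dict, List

# Hypothetical tables. Each "sample_set" row lists its member sample IDs in
# a "samples" column; the file paths live in the "sample" table.
TABLES: Dict[str, Dict[str, Dict[str, Any]]] = {
    "sample": {
        "sample_01": {"r1_fastq": "gs://example-bucket/sample_01_R1.fastq.gz"},
        "sample_02": {"r1_fastq": "gs://example-bucket/sample_02_R1.fastq.gz"},
        "sample_03": {"r1_fastq": "gs://example-bucket/sample_03_R1.fastq.gz"},
    },
    "sample_set": {
        "set_A": {"samples": ["sample_01", "sample_03"]},
    },
}


def resolve_nested(expression: str, root_table: str, root_id: str) -> List[Any]:
    """Resolve 'this.<entity>s.<attribute>' against one set-table row.

    The middle component (e.g. 'samples') is the set table's column of member
    IDs; this sketch assumes dropping its trailing 's' names the member table.
    """
    parts = expression.split(".")
    if len(parts) != 3 or parts[0] != "this" or not parts[1].endswith("s"):
        raise ValueError("expected 'this.<entity>s.<attribute>'")
    member_column, attribute = parts[1], parts[2]
    member_table = TABLES[member_column[:-1]]  # "samples" -> "sample"
    member_ids = TABLES[root_table][root_id][member_column]
    # One value per member, in the order the set lists them: an array input.
    return [member_table[member_id][attribute] for member_id in member_ids]


print(resolve_nested("this.samples.r1_fastq", "sample_set", "set_A"))
```

Because the result is a list with one entry per member sample, this form fits a workflow input declared as an array of files.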
- When a workflow takes an array of files as input, the root entity type might be a _set table (for example, sample_set), but the data files are in the single entity table (i.e., the sample table). In this case, you use the format this.samples.attribute-name. The extra s is always appended to the entity (table) in the nested format.
  Note: You can also store an array of data files in a single entity table. To learn more about this option, see How to add an array of data files to a table.
- When working with tumor-normal pairs, Terra uses a pair table that contains columns for the control_sample_id and case_sample_id. The data files for both of these are referenced in the sample table.
  To specify these files as inputs for your workflow, your attributes will be named something like this.case_sample.case_sample_bam and this.control_sample.case_sample_bam, where case_sample and control_sample are columns in the pair table and case_sample_bam is a column in the sample table. For more on setting up tumor-normal pair tables in Terra, see Adding pair tables for tumor-normal analysis.
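A pair lookup differs from a set lookup in that each reference column holds a single sample ID, so each expression resolves to one file rather than a list. The Python sketch below models that under stated assumptions: the table contents and the helper name resolve_reference are hypothetical, the column names follow the example attributes in the text, and both reference columns are assumed to point into the sample table.

```python
from typing import Any, Dict

# Hypothetical tables. Each "pair" row points to one case and one control
# sample by ID; the BAM paths live in the "sample" table. Column names follow
# the example attributes (case_sample, control_sample, case_sample_bam).
TABLES: Dict[str, Dict[str, Dict[str, Any]]] = {
    "sample": {
        "tumor_01": {"case_sample_bam": "gs://example-bucket/tumor_01.bam"},
        "normal_01": {"case_sample_bam": "gs://example-bucket/normal_01.bam"},
    },
    "pair": {
        "pair_01": {"case_sample": "tumor_01", "control_sample": "normal_01"},
    },
}


def resolve_reference(expression: str, root_table: str, root_id: str) -> Any:
    """Resolve 'this.<reference column>.<attribute>' through one sample reference.

    Unlike a set, each reference column holds exactly one sample ID, so the
    result is a single value rather than a list.
    """
    parts = expression.split(".")
    if len(parts) != 3 or parts[0] != "this":
        raise ValueError("expected 'this.<reference column>.<attribute>'")
    reference_column, attribute = parts[1], parts[2]
    sample_id = TABLES[root_table][root_id][reference_column]
    return TABLES["sample"][sample_id][attribute]  # assumed target table


case_bam = resolve_reference("this.case_sample.case_sample_bam", "pair", "pair_01")
control_bam = resolve_reference("this.control_sample.case_sample_bam", "pair", "pair_01")
print(case_bam, control_bam)
```

The same attribute column is read for both samples; only the reference column in the pair row decides which sample row is used.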
Be careful when selecting from the drop-down menu! If you click into the attribute field for the input variable, the drop-down menu of suggested attribute names only includes columns in the root entity table. So, if your workflow's root entity is the sample_set table but you need to specify an input from the sample table, you'll need to type in the full attribute name (for example, this.samples.r1_fastq) rather than selecting it from the drop-down menu.
Expected input types (advanced topic)
Editing the WDL script can change the expected input configuration. You can see the expected input types by clicking on the workflow in the Workflows tab and looking at the Inputs section.
Note that you can't edit a workflow script within Terra. To learn more about creating and editing workflows, see Create, edit, and share a new workflow.
- The example below is from the workflow that generates a "Panel of Normals" (PoN). When generating a PoN, this WDL script expects the following input types, among others:
  - A set of BAM files representing the list of normal samples. Since the purpose of this workflow is to create a PoN from a set of files, this input is handled as an Array.
  - A reference file. Because a single reference file is supplied (and can be reused across a variety of tasks), this input is handled as a File.
  - The name of a database used to inform PoN generation (in this case, the gnomAD database, which informs the tool of the allelic fractions within this germline resource). Since this task does not need to localize the entire gnomAD database, it is sufficient to designate this input as a String matching the name of the database. The name of the PoN file is also just a String.
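The link between a declared input type and the shape of value the configuration must supply can be sketched as a simple check. This Python sketch is not Terra's validator. The input names and values are hypothetical (loosely modeled on the PoN example), and treating both File and String values as plain strings is a simplification for illustration.

```python
from typing import Any, Dict

# Hypothetical input names and WDL types, loosely modeled on the PoN example:
#   Array[File] normal_bams
#   File        ref_fasta
#   String      gnomad_name
#   String      pon_name
EXPECTED_TYPES: Dict[str, str] = {
    "normal_bams": "Array[File]",
    "ref_fasta": "File",
    "gnomad_name": "String",
    "pon_name": "String",
}


def matches_type(value: Any, wdl_type: str) -> bool:
    """Rough shape check: an Array needs a list; a File or String needs one string."""
    if wdl_type.startswith("Array[") and wdl_type.endswith("]"):
        inner = wdl_type[len("Array["):-1]
        return isinstance(value, list) and all(matches_type(v, inner) for v in value)
    if wdl_type in ("File", "String"):
        return isinstance(value, str)
    raise ValueError(f"type not handled in this sketch: {wdl_type}")


def check_inputs(resolved: Dict[str, Any]) -> Dict[str, bool]:
    """Report, per expected input, whether a resolved value of the right shape exists."""
    return {
        name: name in resolved and matches_type(resolved[name], wdl_type)
        for name, wdl_type in EXPECTED_TYPES.items()
    }


# An Array input is typically filled from a set via a nested attribute, while a
# File input configured from a single-entity column yields one path.
resolved_inputs = {
    "normal_bams": ["gs://example-bucket/n1.bam", "gs://example-bucket/n2.bam"],
    "ref_fasta": "gs://example-bucket/ref.fasta",
    "gnomad_name": "gnomad",
    "pon_name": "my_pon",
}
print(check_inputs(resolved_inputs))
```

Supplying a single path where the script declares an Array (or a list where it declares a File) fails this check, which mirrors why changing a declaration in the WDL changes how the input must be configured.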