Learn how to set up (configure) more complex workflow inputs in Terra, including inputs from nested tables (i.e., arrays and pairs). Note: This article is for analyses using inputs from the workspace data table.
Overview: Configuring nested data as input
When specifying your workflow's inputs, the this. syntax tells Terra to look in the root entity table for links to input files.
For example, the workflow below takes single files as input, found in the r1_fastq column of the sample table.
In the workflow configuration form, the root entity (found in the drop-down menu) is sample and the first attribute is this.r1_fastq.
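To make the lookup concrete, here is a minimal Python sketch of how a this.r1_fastq expression could be read against a sample table. It is a toy model rather than Terra's implementation: the sample IDs, the gs:// paths, and the resolve_this helper are all assumptions made up for illustration.

```python
# Toy model of a "sample" data table: one dict per row, keyed by sample ID.
# The IDs and gs:// paths are hypothetical.
sample_table = {
    "sample_01": {"r1_fastq": "gs://example-bucket/sample_01_R1.fastq.gz"},
    "sample_02": {"r1_fastq": "gs://example-bucket/sample_02_R1.fastq.gz"},
}


def resolve_this(row, expression):
    """Resolve a flat 'this.<column>' expression against one table row."""
    prefix, _, column = expression.partition(".")
    if prefix != "this" or not column or "." in column:
        raise ValueError(f"expected 'this.<column>', got {expression!r}")
    return row[column]


# With sample as the root entity, each selected row supplies its own input file.
inputs_per_run = {
    sample_id: resolve_this(row, "this.r1_fastq")
    for sample_id, row in sample_table.items()
}
```

In this model, "this" stands for the current row of the root entity table, so the same expression yields a different file for each sample.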
What if you have a more complex relationship?
For example, if your input is an array of data files, your root entity could be a sample_set table. That table includes only the sample IDs, not the links to the data files.
The input attribute field needs to reference the column in the sample table that stores the file names (r1_fastq) as well as the column in the sample_set table that stores which samples to include in the workflow (samples).
The formatting in this case is nested: this.entity.attribute.
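The nested lookup can be pictured as two hops: from the sample_set row to its member sample IDs, then from each sample row to the file column. The Python sketch below models that idea under stated assumptions. The table contents, the cohort_a set name, and the resolve_set_attribute helper are hypothetical, and this is not how Terra itself is implemented.

```python
# Hypothetical single-entity table: each sample row holds the link to its file.
sample_table = {
    "sample_01": {"r1_fastq": "gs://example-bucket/sample_01_R1.fastq.gz"},
    "sample_02": {"r1_fastq": "gs://example-bucket/sample_02_R1.fastq.gz"},
    "sample_03": {"r1_fastq": "gs://example-bucket/sample_03_R1.fastq.gz"},
}

# Hypothetical set table: each row stores only the IDs of its member samples.
sample_set_table = {
    "cohort_a": {"samples": ["sample_01", "sample_03"]},
}


def resolve_set_attribute(set_row, attribute):
    """Model of this.samples.<attribute>: follow each ID, collect the attribute."""
    return [sample_table[sample_id][attribute] for sample_id in set_row["samples"]]


# this.samples.r1_fastq, evaluated for the cohort_a row of sample_set.
r1_fastqs = resolve_set_attribute(sample_set_table["cohort_a"], "r1_fastq")
```

Here the result is a list with one file per member sample, which is the shape an array-of-files input expects.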
- When a workflow takes an array of files as input, the root entity type might be a _set table (for example, sample_set), but the data files are in the single entity table (i.e., the sample table). In this case, you use the format this.samples.attribute-name.
Note: You can use an array of data files in a single entity table. To learn more about this option, see How to add an array of data files to a table.
- When working with tumor-normal pairs, Terra uses a pair table that contains columns for the control_sample_id and case_sample_id. The data files for both of these are referenced in the sample table. To specify these files as inputs for your workflow, your attributes will be named something like this.case_sample.case_sample_bam and this.control_sample.case_sample_bam, where case_sample and control_sample are columns in the pair table and case_sample_bam is a column in the sample table.
For more on setting up tumor-normal pair tables in Terra, see Adding pair tables for tumor-normal analysis.
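For pairs, each reference column in the pair table points to a single row of the sample table rather than to a list. The Python sketch below walks a dotted expression one step at a time to show this. It is an illustrative model only: the IDs, paths, REFERENCES mapping, and resolve function are assumptions, and the column names simply follow the ones used in this article.

```python
# Hypothetical sample table; the BAM column name follows the article's example.
sample_table = {
    "tumor_01": {"case_sample_bam": "gs://example-bucket/tumor_01.bam"},
    "normal_01": {"case_sample_bam": "gs://example-bucket/normal_01.bam"},
}

# Hypothetical pair table: each reference column holds one sample ID.
pair_table = {
    "pair_01": {"case_sample": "tumor_01", "control_sample": "normal_01"},
}

# Which columns are references, and which table they point into (assumed).
REFERENCES = {"case_sample": sample_table, "control_sample": sample_table}


def resolve(expression, root_row):
    """Walk a dotted 'this.<reference>.<attribute>' path from the root row."""
    head, *path = expression.split(".")
    if head != "this" or not path:
        raise ValueError(f"expected an expression starting with 'this.', got {expression!r}")
    value = root_row
    for name in path:
        value = value[name]
        if name in REFERENCES:
            # Swap the stored ID for the row it refers to.
            value = REFERENCES[name][value]
    return value


pair_row = pair_table["pair_01"]
tumor_bam = resolve("this.case_sample.case_sample_bam", pair_row)
normal_bam = resolve("this.control_sample.case_sample_bam", pair_row)
```

The two expressions differ only in the middle segment, which selects the sample row that supplies the file.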
Be careful when selecting from the drop-down menu! If you click into the attribute field for the input variable, the drop-down menu of suggested attribute names only includes columns in the root entity table. So, if you need to specify an input from the sample_set table but your workflow's root entity is the sample table, you'll need to type in the full attribute name rather than selecting it from the drop-down menu.
Expected input types (advanced topic)
To edit a workflow script, you need to work outside Terra. To learn more about creating and editing workflows, see Create, edit, and share a new workflow.
Editing the WDL script can change the expected input configuration. You can see this by clicking on the workflow in the Workflows tab and looking at the Inputs section.
The example below is from the workflow that generates a "Panel of Normals" (PoN). When generating a PoN, this WDL script expects input types including the following:
- A set of BAM files representing the list of normal samples. Since the purpose of this workflow is to create a PoN from a set of files, this input is handled as an Array.
- A reference file. Since a single reference file can be useful in a variety of tasks, this input is handled as a File.
- The name of a database used for informing the PoN generation (in this case, the gnomAD database is used to inform the tool of the allelic fractions within this germline resource). Since this task does not need to localize the entire gnomAD database, it is sufficient to designate an input as String matching the name of the database. The name of the PoN file is also just a String.
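As a rough illustration of how these three kinds of input differ, the Python sketch below checks a set of example values against the type each input expects. Everything in it is an assumption for illustration: the input names, the values, the Array[File] label for the BAM list, and the rule that a File value looks like a URL-style path. It does not reproduce the actual PoN workflow or Terra's own validation.

```python
# Hypothetical input names mapped to the type each one is assumed to expect.
EXPECTED_TYPES = {
    "normal_bams": "Array[File]",
    "ref_fasta": "File",
    "gnomad": "String",
    "pon_name": "String",
}

# Hypothetical values, as they might look after attribute expressions resolve.
pon_inputs = {
    "normal_bams": [
        "gs://example-bucket/normal_01.bam",
        "gs://example-bucket/normal_02.bam",
    ],
    "ref_fasta": "gs://example-bucket/reference.fasta",
    "gnomad": "example_gnomad_resource",
    "pon_name": "my_pon",
}


def looks_like(expected_type, value):
    """Simplified check that a value has the shape the expected type implies."""
    if expected_type.startswith("Array[") and expected_type.endswith("]"):
        inner = expected_type[len("Array["):-1]
        return isinstance(value, list) and all(looks_like(inner, v) for v in value)
    if expected_type == "File":
        # Simplification: treat any string containing "://" as a file link.
        return isinstance(value, str) and "://" in value
    if expected_type == "String":
        return isinstance(value, str)
    raise ValueError(f"unsupported type in this sketch: {expected_type}")


mismatches = [
    name
    for name, expected in EXPECTED_TYPES.items()
    if not looks_like(expected, pon_inputs[name])
]
```

In this sketch a single path supplied where an array is expected would be flagged as a mismatch, which mirrors why a set-level attribute is needed for the BAM input.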