Configuring workflow inputs - sets and pairs tables

Allie Hajian

Learn how to set up (configure) more complex workflow inputs in Terra, including inputs from nested tables (i.e., arrays and pairs). Note: This article is for analyses using inputs from the workspace data table.

Overview: Configuring nested data as input

When you specify a workflow's inputs, the this. prefix tells Terra to look in the root entity table for links to input files.

For example, the workflow below takes single files as input, found in the r1_fastq column of the sample table. 
Screenshot showing an example data table. An orange box highlights the table's name in the left-hand sidebar. Another orange box highlights a column in the table that holds the file names for individual FASTQ files.

In the workflow configuration form, the root entity (selected from the drop-down menu) is sample, and the first attribute is this.r1_fastq.
Screenshot showing the configuration menu for an example workflow. An orange box highlights the drop-down menu used to select the root entity for the workflow. Another orange box highlights the first input attribute for the workflow.
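
As a rough illustration of what this.r1_fastq means, the lookup can be sketched in Python. This is not Terra's implementation: the dictionary standing in for the sample table, the row IDs, the gs:// paths, and the resolve_simple helper are all hypothetical.

```python
# Hypothetical in-memory stand-in for a Terra "sample" table.
# Row IDs and gs:// paths are invented for illustration.
sample_table = {
    "sample_01": {"r1_fastq": "gs://example-bucket/sample_01_R1.fastq"},
    "sample_02": {"r1_fastq": "gs://example-bucket/sample_02_R1.fastq"},
}


def resolve_simple(expression, row):
    """Resolve a `this.<column>` expression against one row of the root table."""
    prefix, column = expression.split(".", 1)
    if prefix != "this":
        raise ValueError("expression must start with 'this.'")
    return row[column]


# Resolve the same expression once for each row of the root entity table.
inputs_per_row = {
    sample_id: resolve_simple("this.r1_fastq", row)
    for sample_id, row in sample_table.items()
}
print(inputs_per_row)
```

Here "this" stands for the current row of the root entity table, and the name after the dot is a column in that table.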

What if you have a more complex relationship?

For example, if your input is an array of data files, your root entity could be a sample_set table. However, that table includes only the sample IDs, not the links to the data files that the workflow ultimately needs as input.
Screenshot showing an example sample_set table. An orange box highlights the table's name in the left-hand sidebar. Another orange box highlights a column in the table that holds the sample IDs for each sample in a given set. A screenshot showing the sample IDs that comprise this example set is superimposed on top of the table.

The input attribute field needs to reference the column in the sample table that stores the file names (r1_fastq) as well as the column in the sample_set table that stores which samples to include in the workflow (samples). 

The formatting in this case is nested: this.<entity-name>s.attribute

The nested format includes an extra 's'!

Screenshot showing the configuration menu for an example workflow using a set table. An orange rectangle highlights the variable column in the inputs table. Another orange rectangle highlights the attribute names used to point the workflow to these variables within the workflow's root entity.
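
The nested lookup can be sketched the same way. The tables, IDs, and paths below are hypothetical, and resolve_nested is an illustrative helper, not a Terra function. The sketch assumes the samples column of the set table holds a list of sample IDs.

```python
# Hypothetical stand-ins for the "sample" and "sample_set" tables.
sample_table = {
    "sample_01": {"r1_fastq": "gs://example-bucket/sample_01_R1.fastq"},
    "sample_02": {"r1_fastq": "gs://example-bucket/sample_02_R1.fastq"},
}
sample_set_table = {
    "set_A": {"samples": ["sample_01", "sample_02"]},
}


def resolve_nested(expression, root_row, linked_table):
    """Resolve `this.<reference-column>.<attribute>` into a list of values.

    The middle part names a column in the root (set) table that lists row IDs
    of the linked table; the last part names a column in the linked table.
    """
    parts = expression.split(".")
    if len(parts) != 3 or parts[0] != "this":
        raise ValueError("expected the form this.<reference-column>.<attribute>")
    _, ref_column, attribute = parts
    linked_ids = root_row[ref_column]
    return [linked_table[row_id][attribute] for row_id in linked_ids]


fastqs = resolve_nested(
    "this.samples.r1_fastq", sample_set_table["set_A"], sample_table
)
print(fastqs)
```

The result is one array of file links per set, which is the shape a workflow expecting an array input needs.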

  • When a workflow takes an array of files as input, the root entity type might be a _set table (for example, sample_set), but the data files are in the single entity table (i.e., the sample table).

    In this case, you use the format this.samples.attribute-name. In the nested format, the extra s is always appended to the entity (table) name.

    Note: You can use an array of data files in a single entity table. To learn more about this option, see How to add an array of data files to a table.

  • When working with tumor-normal pairs, Terra uses a pair table that contains columns for the control_sample_id and case_sample_id. The data files for both of these are referenced in the sample table. 

    To specify these files as inputs for your workflow, name your attributes something like this.case_sample.case_sample_bam and this.control_sample.case_sample_bam, where case_sample and control_sample are columns in the pair table and case_sample_bam is a column in the sample table.

    For more on setting up tumor-normal pair tables in Terra, see Adding pair tables for tumor-normal analysis.
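
For pairs, each reference column points to a single sample rather than a list. A minimal sketch, using the column names from the attribute examples above with invented row IDs and paths (resolve_pair is a hypothetical helper, not part of Terra):

```python
# Hypothetical stand-ins for the "sample" and "pair" tables.
sample_table = {
    "tumor_01": {"case_sample_bam": "gs://example-bucket/tumor_01.bam"},
    "normal_01": {"case_sample_bam": "gs://example-bucket/normal_01.bam"},
}
pair_table = {
    "pair_01": {"case_sample": "tumor_01", "control_sample": "normal_01"},
}


def resolve_pair(expression, pair_row, samples):
    """Resolve `this.<pair-column>.<sample-column>` to a single value."""
    parts = expression.split(".")
    if len(parts) != 3 or parts[0] != "this":
        raise ValueError("expected the form this.<pair-column>.<sample-column>")
    _, ref_column, attribute = parts
    sample_id = pair_row[ref_column]  # one sample ID, not a list
    return samples[sample_id][attribute]


row = pair_table["pair_01"]
case_bam = resolve_pair("this.case_sample.case_sample_bam", row, sample_table)
control_bam = resolve_pair("this.control_sample.case_sample_bam", row, sample_table)
print(case_bam, control_bam)
```

Both expressions read the same column of the sample table; only the pair-table column that selects the sample differs.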

Be careful when selecting from the drop-down menu! If you click into the attribute field for the input variable, the drop-down menu of suggested attribute names only includes columns in the root entity table. So, if you need to specify an input from the sample table but your workflow's root entity is the sample_set table, you'll need to type in the full attribute name (for example, this.samples.r1_fastq) rather than selecting it from the drop-down menu.

Expected input types (advanced topic)

To edit a workflow script, you need to work outside Terra. To learn more about creating and editing workflows, see Create, edit, and share a new workflow.

Editing the WDL script can change the expected input configuration. You can see this by clicking on the workflow in the Workflows tab and looking at the Inputs section.

  • The example below is from the workflow that generates a "Panel of Normals" (PoN). When generating a PoN, this WDL script expects input types such as the following:

    A set of BAM files representing the list of normal samples. Since the purpose of this workflow is to create a PoN from a set of files, this input is handled as an Array.

    A reference file. Since a single reference file can be useful in a variety of tasks, this input is handled as a File.

    The name of a database used for informing the PoN generation (in this case, the gnomAD database is used to inform the tool of the allelic fractions within this germline resource). Since this task does not need to localize the entire gnomAD database, it is sufficient to designate an input as String matching the name of the database. The name of the PoN file is also just a String.

    Diagram showing colored lines connecting a screenshot of an example workflow's WDL script to a screenshot of the workflow's configuration menu. This illustrates that each input's type in the workflow configuration menu has to match the type defined for that input in the WDL script.
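
The idea that each configured value has to match the type declared in the WDL can be sketched as a simple check. This is a simplified illustration, not Terra's validation logic: the input names (normal_bams, ref_fasta, gnomad, pon_name), the values, and the matches helper are all hypothetical, and files are represented as path strings.

```python
# Hypothetical input names mapped to the WDL types the script declares.
expected_types = {
    "normal_bams": "Array[File]",
    "ref_fasta": "File",
    "gnomad": "String",
    "pon_name": "String",
}


def matches(wdl_type, value):
    """Return True if a Python value has the shape implied by a WDL type name.

    Only File, String, and Array[...] are handled in this sketch.
    """
    if wdl_type.startswith("Array[") and wdl_type.endswith("]"):
        inner = wdl_type[len("Array["):-1]
        return isinstance(value, list) and all(matches(inner, v) for v in value)
    if wdl_type in ("File", "String"):
        return isinstance(value, str)
    raise ValueError(f"type not handled in this sketch: {wdl_type}")


# Invented values, as they might look after the attributes are resolved.
configured = {
    "normal_bams": [
        "gs://example-bucket/normal_01.bam",
        "gs://example-bucket/normal_02.bam",
    ],
    "ref_fasta": "gs://example-bucket/reference.fasta",
    "gnomad": "gnomad",
    "pon_name": "example_pon",
}

mismatches = [
    name for name, wdl_type in expected_types.items()
    if not matches(wdl_type, configured[name])
]
print(mismatches)
```

Supplying a single file where the script declares Array[File] would show up as a mismatch in this check.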
