Configuring workflow inputs - sets and pairs tables

Allie Hajian

Learn how to set up (configure) more complex workflow inputs in Terra, including inputs from nested tables (i.e. arrays and pairs). Note that this article is for analyses using inputs from the workspace data table.

Advanced formatting examples

The this. prefix tells Terra to look first in the table you set as your root entity. This is straightforward when links to the data files are in a column of the root entity data table.

For example, the workflow below takes single files as input, found in the r1_fastq column of the specimens table. 

In the workflow configuration form, the root entity is specimens and the attribute is this.r1_fastq (found in the dropdown menu).

What if you have a more complex relationship?

If your input is an array of specimens, for example, the specimen_set table is the root entity, but that table only includes the specimen IDs.

The attribute field needs to reference the right column in the specimens table (r1_fastq) as well as the column in the specimen_set table (specimens).
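As a rough illustration of how a two-part attribute such as this.specimens.r1_fastq is followed from the set table into the specimens table, here is a minimal Python sketch. It is a toy model and not how Terra is implemented: the dictionaries stand in for data tables, and the row IDs (set_A, specimen_1, specimen_2), the gs:// paths, and the function name resolve_set_attribute are all invented for this example.

```python
# Toy stand-ins for two data tables. Table and column names follow the
# article; row IDs and gs:// paths are made up for illustration.
specimens = {
    "specimen_1": {"r1_fastq": "gs://example-bucket/specimen_1_R1.fastq.gz"},
    "specimen_2": {"r1_fastq": "gs://example-bucket/specimen_2_R1.fastq.gz"},
}
specimen_set = {
    "set_A": {"specimens": ["specimen_1", "specimen_2"]},
}


def resolve_set_attribute(set_row_id, member_column, file_column):
    """Mimic this.<member_column>.<file_column> for one row of the set table.

    Hypothetical helper: first read the member IDs from the set row,
    then look up the file column for each member in the specimens table.
    """
    member_ids = specimen_set[set_row_id][member_column]
    return [specimens[member_id][file_column] for member_id in member_ids]


# Equivalent in spirit to the attribute this.specimens.r1_fastq
r1_files = resolve_set_attribute("set_A", "specimens", "r1_fastq")
print(r1_files)
```

The first part of the attribute (specimens) picks the column in the set table that lists the members, and the second part (r1_fastq) picks the column in the member table that holds the files. The result is one file per member, which is why the workflow input receives an array.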

Read on for examples of the formatting you will use when working with more complex inputs such as arrays of data files, sets, or tumor/normal pairs. 

Be careful when selecting from the dropdown! If you click into the attribute field for the input variable, the dropdown menu only includes columns in the root entity table. If you are using inputs from the specimen_set table, for example, but the data files are in the specimen table, you will first need to select this.specimens from the dropdown and then add the column in the specimens table where the data files are.

Only this.specimens is an option in the dropdown. Specifying the location of the data files requires additional formatting.

  • When a workflow takes an array of files as input, the root entity type is a _set table (for example, specimen_set), but the data files are actually in the single entity table (i.e. the specimen table).

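    In this case, you will use the format this.specimens.r1_fastq, where specimens is the column in the specimen_set table and r1_fastq is the column in the specimens table that holds the data files.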

  • When a workflow takes a tumor/normal pair as input, the root entity type is the pair table. The pair table contains columns for the control_sample_id and case_sample_id. The data files are referenced in the sample table. Your WDL task requires both the case_sample_bam and the control_sample_bam for input.

    You'd use this.case_sample.case_sample_bam and this.control_sample.control_sample_bam, where case_sample and control_sample are columns in the pair table.
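The pair lookup can be sketched the same way. The Python below is a toy model under stated assumptions, not Terra's implementation: the row IDs (pair_1, tumor_1), the gs:// path, and the function name resolve_pair_attribute are invented for illustration. It shows the case side only; the control side would follow the same pattern through the control_sample column.

```python
# Toy stand-ins for the sample and pair tables. Column names follow the
# article; row IDs and the gs:// path are made up for illustration.
sample = {
    "tumor_1": {"case_sample_bam": "gs://example-bucket/tumor_1.bam"},
}
pair = {
    "pair_1": {"case_sample": "tumor_1"},
}


def resolve_pair_attribute(expression, pair_row_id):
    """Resolve this.<reference_column>.<file_column> for one pair row.

    Hypothetical helper: the middle part names the pair-table column that
    points at a sample, the last part names the sample-table column.
    """
    parts = expression.split(".")
    if len(parts) != 3 or parts[0] != "this":
        raise ValueError("expected this.<reference_column>.<file_column>")
    _, reference_column, file_column = parts
    sample_id = pair[pair_row_id][reference_column]
    return sample[sample_id][file_column]


case_bam = resolve_pair_attribute("this.case_sample.case_sample_bam", "pair_1")
print(case_bam)
```

Unlike the set case, each reference column in a pair row points at a single sample, so the expression resolves to one file and not an array.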

Editing expected input entity types (advanced topic)

To edit a workflow script, you will need to work outside Terra. To learn more about creating and editing workflows, see Create, edit, and share a new workflow.

Editing the WDL script can change the expected input configuration. You can see this by clicking on the workflow in the Workflows tab and looking at the Inputs section.

  • The example below is from the workflow that generates a "Panel of Normals" (PoN). When generating a PoN, this WDL script expects some of the following input types.

    A set of BAM files representing the list of normal samples. Since the purpose of this workflow is to create a PoN from a set of files, this input is handled as an Array.

    A reference file. Since a single reference file can be useful in a variety of tasks, this input is handled as a File.

    The name of a database used for informing the PoN generation (in this case, the gnomAD database is used to inform the tool of the allelic fractions within this germline resource). Since this task does not need to localize the entire gnomAD database, it is sufficient to designate an input as String matching the name of the database. The name of the PoN file is also just a String.


