Configuring workflow inputs - sets and pairs tables

Allie Hajian

Learn how to set up (configure) more complex workflow inputs in Terra, including inputs from nested tables (i.e., arrays and pairs). Note: This article is for analyses using inputs from the workspace data table.

Overview: Configuring nested data as input

The this. prefix in an input attribute tells Terra to look in the root entity table for links to input files.

For example, the workflow below takes single files as input, found in the r1_fastq column of the sample table. 
Mouse-FASTQ-lanes-in-sample-table_Screen_shot.png

In the workflow configuration form, the root entity is sample and the attribute is this.r1_fastq (found in the drop-down menu). 
Configure-single-sample-input_Screen_shot.png

What if you have a more complex relationship?

For example, if your input is an array of data files, your root entity could be the sample_set table. That table includes only the sample IDs, not the links to the data files.
Mouse-FASTQ-lanes-in-sample-set-table_Screen_shot.png

The Input attribute field needs to reference the right column in the sample table (r1_fastq) as well as the column in the sample_set table (samples). 

The formatting in this case is nested: this.entity.attribute (here, this.samples.r1_fastq).

Configure-sample-set-as-input_Screen_shot.png
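To make the nested lookup concrete, here is a minimal Python sketch of how an expression such as this.samples.r1_fastq could be followed from a set table into the single entity table. It is an illustration only, not Terra's implementation: the TABLES and REFERENCES dictionaries, the resolve function, the entity IDs, and the gs:// paths are all hypothetical.

```python
# Hypothetical in-memory stand-ins for two workspace data tables.
# Table contents, IDs, and gs:// paths are illustrative only.
TABLES = {
    "sample": {
        "mouse_1": {"r1_fastq": "gs://example-bucket/mouse_1_R1.fastq"},
        "mouse_2": {"r1_fastq": "gs://example-bucket/mouse_2_R1.fastq"},
    },
    "sample_set": {
        "mouse_set": {"samples": ["mouse_1", "mouse_2"]},
    },
}

# Which table each reference column points to (assumed for this sketch).
REFERENCES = {("sample_set", "samples"): "sample"}


def resolve(expression, root_table, root_id):
    """Resolve 'this.attr' or 'this.ref.attr' for one root entity."""
    parts = expression.split(".")
    if parts[0] != "this" or len(parts) not in (2, 3):
        raise ValueError(f"unsupported expression: {expression}")
    row = TABLES[root_table][root_id]
    if len(parts) == 2:
        # Flat case: the attribute is a column of the root table itself.
        return row[parts[1]]
    # Nested case: follow the reference column into the target table
    # and collect the attribute from every referenced entity.
    ref_column, attribute = parts[1], parts[2]
    target_table = REFERENCES[(root_table, ref_column)]
    return [TABLES[target_table][entity_id][attribute] for entity_id in row[ref_column]]
```

With sample as the root entity, resolve("this.r1_fastq", "sample", "mouse_1") returns a single path. With sample_set as the root entity, resolve("this.samples.r1_fastq", "sample_set", "mouse_set") returns one path per sample in the set, which is the array the workflow expects.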

  • When a workflow takes an array of files as input, the root entity type might be a _set table (for example, sample_set), but the data files are in the single entity table (i.e., the sample table).

    In this case, you use the format this.samples.attribute-name

    Note: You can use an array of data files in a single entity table. To learn more about this option, see How to add an array of data files to a table.

  • The pair table contains columns for the control_sample_id and case_sample_id. The data files are referenced in the sample table. Your WDL task requires both the case_sample_bam and the control_sample_bam for input.

    You'd use this.case_sample.case_sample_bam and this.control_sample.control_sample_bam, where case_sample and control_sample are columns in the pair table.
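The pair case differs from the set case in that each reference column points to exactly one sample, so the expression yields a single file rather than an array. The sketch below shows that one-to-one lookup for this.case_sample.case_sample_bam. It is a hedged illustration, not Terra's implementation: the table contents, the IDs, the gs:// path, and the resolve_pair_input helper are hypothetical.

```python
# Hypothetical pair and sample tables; IDs and the gs:// path are
# illustrative only. The control sample's row is omitted for brevity.
SAMPLE_TABLE = {
    "tumor_1": {"case_sample_bam": "gs://example-bucket/tumor_1.bam"},
}
PAIR_TABLE = {
    "pair_1": {"case_sample": "tumor_1", "control_sample": "normal_1"},
}


def resolve_pair_input(pair_id, ref_column, attribute):
    """Follow one pair-table reference column into the sample table."""
    sample_id = PAIR_TABLE[pair_id][ref_column]
    return SAMPLE_TABLE[sample_id][attribute]


# Equivalent of this.case_sample.case_sample_bam with pair_1 as the root entity.
case_bam = resolve_pair_input("pair_1", "case_sample", "case_sample_bam")
```

The control-side input follows the same pattern through the control_sample column.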

Be careful when selecting from the drop-down menu! If you click into the attribute field for the input variable, the drop-down menu only includes columns in the root entity table. For example, if you use inputs from the sample_set table but the data files are in the sample table, you need to select this.samples from the drop-down menu and then add the name of the sample table column that holds the links to the data files (for example, this.samples.r1_fastq).

Only this.samples is an option in the drop-down menu. Specifying the location of the data files requires additional formatting.

Editing expected input entity types (advanced topic)

To edit a workflow script, you need to work outside Terra. To learn more about creating and editing workflows, see Create, edit, and share a new workflow.

Editing the WDL script can change the expected input configuration. You can see this by clicking on the workflow in the Workflows tab and looking at the Inputs section.

  • The example below is from the workflow that generates a "Panel of Normals" (PoN). When generating a PoN, this WDL script expects some of the following input types:

    A set of BAM files representing the list of normal samples. Since the purpose of this workflow is to create a PoN from a set of files, this input is handled as an Array.

    A reference file. Since a single reference file can be useful in a variety of tasks, this input is handled as a File.

    The name of a database used for informing the PoN generation (in this case, the gnomAD database is used to inform the tool of the allelic fractions within this germline resource). Since this task does not need to localize the entire gnomAD database, it is sufficient to designate an input as String matching the name of the database. The name of the PoN file is also just a String.

    EntityTypes.png
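As a rough illustration of why the input type matters when you fill in the configuration, the Python sketch below mirrors the three kinds of input described above and checks that each value has the matching shape. It is not the PoN workflow's actual interface: the PonInputs class, its field names, and the check_inputs helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PonInputs:
    """Hypothetical mirror of the input types described above."""
    normal_bams: list              # Array of files: one path per normal sample
    reference: str                 # File: a single path
    germline_resource_name: str    # String: the database name only
    pon_name: str                  # String: the name of the PoN file


def check_inputs(inputs):
    """Return a list of shape problems; an empty list means none were found."""
    problems = []
    if not isinstance(inputs.normal_bams, list) or not inputs.normal_bams:
        problems.append("normal_bams must be a non-empty list of paths")
    for name in ("reference", "germline_resource_name", "pon_name"):
        if not isinstance(getattr(inputs, name), str):
            problems.append(f"{name} must be a single string value")
    return problems
```

In this sketch, passing a single path where the array is expected, or a list where a single file is expected, is reported as a problem. This is the same kind of mismatch that can occur when an edited WDL changes the type of an input.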
