Some analyses combine data from several input files to generate a single output. In this last exercise, you'll learn how to recognize and work with this sort of workflow.
Workflows that accept arrays (sets) as inputs
- The Optimus pipeline takes in multiple lanes of a sample but outputs one file that corresponds to the sample, not to the individual read lanes.
- Another example, familiar to cancer researchers, is the CNV_Somatic_Panel_workflow, which generates a single Panel of Normals (PoN) from a list of normal samples. The PoN is used during variant calling to filter out systematic errors that occur when reads are processed.
Identifying a workflow that takes a set (array) as input
You can tell whether a workflow takes a set of inputs (rather than a single file) from the variable types listed on the workflow configuration card:
- Input is a single entity: variable type = "File"
- Input is a set of entities: variable type = "Array[File]"
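The check you do by eye on the configuration card can be sketched as a small function over the type string. This assumes the card shows WDL-style type strings such as `File`, `File?`, and `Array[File]`; the function name `takes_set` is hypothetical and not part of any Terra API.

```python
def takes_set(type_str: str) -> bool:
    """Return True if a WDL-style input type is an array, i.e. expects a set of entities."""
    # Array types, including the non-empty form "Array[File]+" and the
    # optional form "Array[File]?", all begin with "Array[".
    # Scalar types such as "File" or "File?" do not.
    return type_str.strip().startswith("Array[")


for t in ["File", "File?", "Array[File]", "Array[File]+"]:
    kind = "set of entities" if takes_set(t) else "single entity"
    print(f"{t!r}: {kind}")
```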
4.1. Run a workflow with a set (array of entities) as input
Once you've identified that a workflow takes a set as input, how do you set it up and run it? The process is slightly different from running a single-input workflow on a set of single entities!
Set up the workflow configuration form
- Select the root entity type "specimen_set" from the dropdown
- Fill in the attribute with the right formatting
Remember the links to the data files are in the "specimen" table, not in the "specimen_set" table! You'll specify this with the format:
- Click the blue "Select Data" button
Notice the extra "s" in the entity attribute. Even though the data are in the "specimen" table, the formatting to read from or write to that table adds an "s" to the end of the entity name, so remember to include it.
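To make the role of the extra "s" concrete, here is a simplified Python model. It assumes the specimen_set row stores its member IDs under the entity name plus "s", and that each member row in the specimen table holds the file link. The table contents, the attribute name `bam_file`, and the helper `resolve_member_attribute` are all hypothetical; the sketch only illustrates how one set row fans out into a list of files, which is what an `Array[File]` input receives.

```python
from typing import Dict, List

# Hypothetical, simplified tables: each maps an entity ID to its row of attributes.
specimen: Dict[str, Dict[str, str]] = {
    "specimen_1": {"bam_file": "specimen_1.bam"},
    "specimen_2": {"bam_file": "specimen_2.bam"},
    "specimen_3": {"bam_file": "specimen_3.bam"},
}

# Assumption: the set row lists its members under the entity name plus "s".
specimen_set: Dict[str, Dict[str, List[str]]] = {
    "normals": {"specimens": ["specimen_1", "specimen_2", "specimen_3"]},
}


def resolve_member_attribute(
    set_table: Dict[str, Dict[str, List[str]]],
    member_table: Dict[str, Dict[str, str]],
    set_id: str,
    entity_name: str,
    attribute: str,
) -> List[str]:
    """Collect one attribute from every member of a set, in membership order."""
    members = set_table[set_id][entity_name + "s"]  # note the extra "s"
    return [member_table[member_id][attribute] for member_id in members]


files = resolve_member_attribute(specimen_set, specimen, "normals", "specimen", "bam_file")
print(files)  # ['specimen_1.bam', 'specimen_2.bam', 'specimen_3.bam']
```

The file links never leave the specimen table; the set row only supplies the list of member IDs to look up.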
Select the set of input data files
Finally, confirm the launch to run the workflow.
This submission runs a single analysis on the whole set. This is different from the previous parts, where the workflow was run separately on each entity in the set.
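A hypothetical sketch of that difference: with a single-`File` input, selecting a set fans out into one run per member, while with an `Array[File]` input the whole list goes into one run. The function `plan_runs` and the file names are illustrative only and do not reflect how Terra schedules jobs internally.

```python
from typing import List


def plan_runs(member_files: List[str], input_is_array: bool) -> List[List[str]]:
    """Return the inputs of each workflow run launched for one selected set.

    Hypothetical model: an Array[File] input receives the whole set in one run;
    a single File input gets one run per member entity.
    """
    if input_is_array:
        return [list(member_files)]
    return [[f] for f in member_files]


files = ["specimen_1.bam", "specimen_2.bam", "specimen_3.bam"]
print(len(plan_runs(files, input_is_array=True)))   # 1 run for the whole set
print(len(plan_runs(files, input_is_array=False)))  # 3 runs, one per specimen
```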
4.2. Examine the output
Once your workflow completes successfully, you'll want to take a peek at the output!
In which table is the output data file?
The root entity for this workflow was specimen_set, so you will find the output in the specimen_set table.
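As a rough model of why the output lands there, assume outputs are written back as a new column on the row of the root entity selected at launch. The table layout, the output column `pon_file`, and the helper `write_output` are hypothetical; the point is only that the set row gains the output while the member specimen rows stay unchanged.

```python
from typing import Any, Dict

# Hypothetical tables keyed by entity type, then by entity ID.
tables: Dict[str, Dict[str, Dict[str, Any]]] = {
    "specimen": {
        "specimen_1": {"bam_file": "specimen_1.bam"},
        "specimen_2": {"bam_file": "specimen_2.bam"},
    },
    "specimen_set": {
        "normals": {"specimens": ["specimen_1", "specimen_2"]},
    },
}


def write_output(
    tables: Dict[str, Dict[str, Dict[str, Any]]],
    root_entity_type: str,
    entity_id: str,
    column: str,
    value: Any,
) -> None:
    """Record a workflow output as a new column on the root entity's row."""
    tables[root_entity_type][entity_id][column] = value


write_output(tables, "specimen_set", "normals", "pon_file", "normals.pon")
print(tables["specimen_set"]["normals"])
print(tables["specimen"]["specimen_1"])  # member rows are unchanged
```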