Some analyses use several input files to generate a single output. In this last part of the Workflows Quickstart tutorial, you'll learn how to recognize and work with this sort of workflow.
Workflows that accept arrays (sets) as inputs
- The Optimus pipeline takes in multiple lanes of a sample, but outputs one file that corresponds to the sample, not the read lanes.
- Another example, familiar to cancer researchers, is the CNV_Somatic_Panel_workflow, which generates a single Panel of Normals (PoN) from a list of normal samples. The PoN is used when doing variant calling to filter out systemic errors that occur when reads are processed.
Identifying a workflow that takes a set (array) as input
You will know a workflow takes a set of inputs (rather than a single file) by the input file type in the workflow configuration card.
Input is a single entity: variable type = "File"
Input is a set of entities: variable type = "Array[File]"
4.1. Run a workflow with a set (array of entities) as input
Once you've identified that a workflow takes a set as input, how do you set up and run it? The process is slightly different from running a single-input workflow on a set of single entities!
Set up the workflow configuration form
1. Open the "2-Sets-as-Input-Workflow" from the Workflows page.
2. Select the root entity type "specimen_set" from the dropdown.
3. Confirm the input attribute formatting in the Inputs.
Remember that the links to the data files are in the "specimen" table, not in the "specimen_set" table! You need to tell the WDL to a) first go to the specimens column of the specimen_set table to get the IDs of the specimens in the set, and b) then go to the r1_fastq column of each specimen in the set to get the data.
You'll specify this with the format:
this.specimens.r1_fastq (already filled in)
4. Click the blue "Select Data" button.
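To make the two-step lookup behind this.specimens.r1_fastq more concrete, here is a minimal Python sketch. It is not how Terra itself is implemented; it only mimics the lookup with plain dictionaries. The table contents, the entity IDs, the bucket paths, and the helper name resolve_set_attribute are all hypothetical and chosen for illustration.

```python
# Hypothetical stand-ins for two workspace data tables.
# All entity IDs and file paths below are made up for illustration.
specimen_table = {
    "specimen_A": {"r1_fastq": "gs://example-bucket/specimen_A_R1.fastq.gz"},
    "specimen_B": {"r1_fastq": "gs://example-bucket/specimen_B_R1.fastq.gz"},
    "specimen_C": {"r1_fastq": "gs://example-bucket/specimen_C_R1.fastq.gz"},
}
specimen_set_table = {
    "my_set": {"specimens": ["specimen_A", "specimen_B"]},
}


def resolve_set_attribute(set_id, members_column, attribute):
    """Mimic this.<members_column>.<attribute> for one row of the set table."""
    # a) Read the member IDs from the set table's members column.
    member_ids = specimen_set_table[set_id][members_column]
    # b) Look up the attribute on each member in the specimen table.
    return [specimen_table[member_id][attribute] for member_id in member_ids]


# "this" is the selected row of the root entity table (here, "my_set").
r1_fastqs = resolve_set_attribute("my_set", "specimens", "r1_fastq")
print(r1_fastqs)
```

The result is a list with one file path per specimen in the set, which is the shape an "Array[File]" input expects. Specimens that are not in the selected set (specimen_C in this sketch) are not included.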
Even though the data are in the "specimen" table, the formatting to read/write to the
Remember to add an "s" at the end of the entity name. That's right, it's specimens, not
Select the set of input data files
1. Click the "Choose existing sets" radio button.
2. Select one of your sets from the available options.
Finally, confirm the launch to run the workflow.
This is different than in previous parts 1, 2, and 3, where the workflows were designed to analyze one
4.2. Examine the output
Once your workflow completes successfully, you'll want to take a peek at the output!
Question: Where (in what table) is the output data file?
The root entity for this workflow was specimen_set, and thus the specimen_set table is where you will find the data.
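As a rough illustration of why the output appears in the specimen_set table, the sketch below models the data tables as plain Python dictionaries. It assumes the output is written back as a new column on the root entity's row; the column name output_file, the entity IDs, the file path, and the helper record_output are hypothetical.

```python
# Hypothetical data tables before the workflow runs (IDs are made up).
specimen_set_table = {
    "my_set": {"specimens": ["specimen_A", "specimen_B"]},
}
specimen_table = {
    "specimen_A": {},
    "specimen_B": {},
}


def record_output(root_table, root_id, column, value):
    """Store a workflow output as a new column on the root entity's row."""
    root_table[root_id][column] = value


# The root entity type was specimen_set, so the single output file is
# attached to the set's row rather than to any individual specimen.
record_output(
    specimen_set_table,
    "my_set",
    "output_file",  # hypothetical output column name
    "gs://example-bucket/my_set_output.txt",  # hypothetical path
)
print(specimen_set_table["my_set"])
```

After the call, only the row in the set table has the new column; the rows in the specimen table are unchanged.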