Welcome to the Workflows Quickstart Tutorial, Part 3. Learn how to run a downstream analysis on the output data generated in Part 2 of the Workflows Quickstart. Running back-to-back workflows turns them into pipelines, which can be automated to help scale your analysis.
Using workflow outputs as inputs downstream
Learning objectives: time and cost to complete
In Part 3, you'll learn how data tables can help scale and streamline (automate) your analysis, turning workflows into pipelines.
Data tables help scale your analysis
Using a sample set as input makes it quicker to set up the workflow to run on a subset of samples in the sample table, with less possibility of human error. You'll learn to set up a workflow to run on samples from a sample_set table in the configuration form.
Using tables can streamline your analysis
Using the data table makes it easy to run back-to-back workflows - using the outputs of a previous workflow as input for a downstream workflow. Since this can be scripted, it helps automate running back-to-back workflows. You'll learn how to configure back-to-back workflows in the configuration form.
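To make the back-to-back idea concrete, here is a minimal sketch in plain Python. It is not Terra's API: the table is an in-memory dictionary, the two functions are placeholders that only rename files, and the column names `cram` and `bam` and the bucket path are assumptions made for illustration (`uBAM` is the example output name used later in this tutorial).

```python
# Hypothetical in-memory stand-in for a "sample" data table:
# one entry per sample, one key per column.
sample_table = {
    "sample_1": {"cram": "gs://example-bucket/sample_1.cram"},
    "sample_2": {"cram": "gs://example-bucket/sample_2.cram"},
}


def cram_to_bam(path):
    """Placeholder for the Part 2 workflow; it only swaps the file extension."""
    return path.rsplit(".", 1)[0] + ".bam"


def bam_to_unmapped_bam(path):
    """Placeholder for the Part 3 workflow; it only swaps the file extension."""
    return path.rsplit(".", 1)[0] + ".unmapped.bam"


def run_on_table(table, workflow, input_column, output_column):
    """Run `workflow` on every row, reading one column and writing another."""
    for row in table.values():
        row[output_column] = workflow(row[input_column])


# The first run writes a "bam" column; the second run reads that same column.
run_on_table(sample_table, cram_to_bam, "cram", "bam")
run_on_table(sample_table, bam_to_unmapped_bam, "bam", "uBAM")
```

Because the second call reads the column the first call wrote, the table itself is what links the two steps. That is the same role the sample table plays between Part 2 and Part 3.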
How much will it cost? How long will it take?
The exercise should take no more than fifteen minutes (unless you are in the queue a long time) and cost a few pennies.
In this part, you'll build on the first two Workflows Quickstart exercises by using the outputs from running CRAM_to_BAM as inputs to the BAM_to_unmapped_BAM workflow. You will run on the two samples as a group, using the set Terra created in Part 2.
About the workflow
The workflow in Part 3 takes the BAM file output from Part 2 and converts it to an unmapped BAM file (uBAM).
Setting up the workflow: Overview
Your goal is to run the Part 3 workflow on the output data for the samples in the set you created in Part 2.
Start by selecting the Part3_BAM_to_unmappedBAM workflow. You'll be directed to the configuration form (see screenshot below). The parts you will need to complete are numbered. See if you can complete them on your own. Open the sections below for hints.
Step 1. Choose the root entity type
HINT: The root entity type is the table that references the most fundamental unit of input data your workflow will run on. If you have more than one table in your workspace, you will need to choose (all data tables will appear in the dropdown). For guidance, see Selecting the root entity type.
If your workflow will run on a single entity, the root entity type is the single entity table. For the Workflows Quickstart, the root entity type is "sample" because the workflow will run on a single sample. Remember, the outputs from the previous workflow were written to the sample table, where you will find them alongside the primary input data.
Step 2. Select Data
For this part, you'll run on the output of the two samples in the set from the Quickstart Part 2. You can do this manually (by selecting the "Choose specific samples to process" radio button) as in Part 2. Or you can select the "Choose existing sets of samples" radio button to take advantage of Terra's built-in functionality for running on subsets of any size with less manual setup. See if you can set it up on your own!
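Choosing an existing set amounts to looking up the set's members and running on each of them. The sketch below models that lookup in plain Python under stated assumptions: the set name `quickstart_set`, the `samples` member list, and the sample IDs are all hypothetical, and the code does not call Terra.

```python
# Hypothetical sample table with three samples; only two belong to the set.
sample_table = {
    "sample_1": {"bam": "gs://example-bucket/sample_1.bam"},
    "sample_2": {"bam": "gs://example-bucket/sample_2.bam"},
    "sample_3": {"bam": "gs://example-bucket/sample_3.bam"},
}

# Hypothetical sample_set table: each set lists the IDs of its member samples.
sample_set_table = {
    "quickstart_set": {"samples": ["sample_1", "sample_2"]},
}


def expand_set(set_name, set_table, entity_table):
    """Return the rows of the samples that belong to the named set."""
    member_ids = set_table[set_name]["samples"]
    missing = [sid for sid in member_ids if sid not in entity_table]
    if missing:
        raise KeyError(f"Set {set_name!r} references unknown samples: {missing}")
    return {sid: entity_table[sid] for sid in member_ids}


selected = expand_set("quickstart_set", sample_set_table, sample_table)
for sample_id in selected:
    print("would run on", sample_id)
```

Selecting the set once replaces ticking each sample by hand, which is where the reduced risk of human error comes from.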
Step 3. Configure inputs
Notice you can choose to show only the required variables in the Inputs tab to simplify things.
3.1. Click in the input_bam attribute field.
3.2. Select the output name you set up in Part 2 from the options in the dropdown.
Notice that the dropdown includes all columns in the "sample" data table, including those from Part 1! Be careful to select the name you used as output in Part 2.
3.3. Save your Inputs attributes by clicking on the blue Save button!
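The input attribute chosen in Step 3 can be pictured as a column lookup on each sample's row. The following is a rough sketch of that idea, not Terra's implementation. The column names `part1_bam` and `part2_bam` and the file paths are hypothetical stand-ins for whatever output names you chose in Parts 1 and 2.

```python
# Hypothetical sample table. Part 1 ran on the first sample only, so only
# sample_1 has a value in the Part 1 output column.
sample_table = {
    "sample_1": {
        "cram": "gs://example-bucket/sample_1.cram",
        "part1_bam": "gs://example-bucket/part1/sample_1.bam",
        "part2_bam": "gs://example-bucket/part2/sample_1.bam",
    },
    "sample_2": {
        "cram": "gs://example-bucket/sample_2.cram",
        "part2_bam": "gs://example-bucket/part2/sample_2.bam",
    },
}


def dropdown_options(table):
    """List every column in the table as a 'this.<column>' option."""
    columns = set()
    for row in table.values():
        columns.update(row)
    return ["this." + name for name in sorted(columns)]


def resolve(expression, row):
    """Look up the value a 'this.<column>' expression refers to in one row."""
    prefix = "this."
    if not expression.startswith(prefix):
        raise ValueError(f"Expected an expression starting with {prefix!r}")
    return row[expression[len(prefix):]]


print(dropdown_options(sample_table))
print(resolve("this.part2_bam", sample_table["sample_2"]))
```

The options list contains the Part 1 column as well as the Part 2 column, which is why picking the right name matters: only the Part 2 column has a value for every sample in the set.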
Step 4. Configure outputs and run the workflow
As in Part 2, you want to set up the workflow to write output metadata (links to files) to the data table. You'll do this in the Outputs tab of the configuration form.
4.1. Go to the Outputs tab of the configuration form, where you'll fill in the attribute for the output_bams variable.
4.2. To write to the data table, start by typing "this." and then add a name for this attribute. The workflow will add a column with that name to the sample table to hold the generated data.
4.3. Save the Outputs.
4.4. Click the blue Run Analysis button to submit your workflows.
You will see the following popup
Once your job is running, you can sit back and wait for the results!
What to expect - successful submissions
Congratulations! If you configured the inputs correctly and the workflow succeeded, you should see the green “succeeded” checkmark in the Job History.
If you go to the Data tab and expand the "sample" table, you will see the outputs under a new column (whatever you named it - uBAM in the example below).
If you click on the "3 items" link, you'll notice that there are three output files for each sample, because the workflow processes the data in separate shards.
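One way to picture the new column is as a cell that holds a list of files rather than a single file. The sketch below is illustrative only: the shard file names and the helper functions are invented for this example, and only the `this.` prefix and the `uBAM` column name come from the tutorial.

```python
# Hypothetical sample table before the Part 3 outputs are written.
sample_table = {
    "sample_1": {"bam": "gs://example-bucket/sample_1.bam"},
}


def write_output(table, sample_id, expression, files):
    """Store workflow output files in the column named by 'this.<column>'."""
    prefix = "this."
    if not expression.startswith(prefix):
        raise ValueError(f"Output attribute must start with {prefix!r}")
    table[sample_id][expression[len(prefix):]] = list(files)


def cell_label(value):
    """Summarize a cell the way a table view might: lists show an item count."""
    return f"{len(value)} items" if isinstance(value, list) else str(value)


# Invented shard outputs for one sample (the real file names will differ).
shard_files = [
    "gs://example-bucket/sample_1.shard_0.unmapped.bam",
    "gs://example-bucket/sample_1.shard_1.unmapped.bam",
    "gs://example-bucket/sample_1.shard_2.unmapped.bam",
]
write_output(sample_table, "sample_1", "this.uBAM", shard_files)
print(cell_label(sample_table["sample_1"]["uBAM"]))
```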
Workflow didn't succeed? Try these troubleshooting tips
If your Job History looks like the screenshot below, don’t despair! Especially if your submission failed immediately, it’s likely the error is a mistyped input attribute in the workflow configuration form. You can get further information by clicking the Submission (arrow).
This will lead you to a more detailed page (below). If you hover over the link in the Messages column, you'll get information that can help with troubleshooting. In the case below, one of the submissions failed because it didn't find the input file. I was using the output name from Part 1, where I only ran the first sample.
Check attribute names carefully - if you start typing “this.” in the inputs form, you’ll get a dropdown list of available data files. Make sure that the file type (BAM) matches the expected input.
Check the log files by selecting the submission details (in the box outlined in orange in the screenshot above).
For more tips, see Troubleshooting Workflows: Tips and Tricks.
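The two checks above (does every sample have a value in the chosen column, and is it a BAM file?) can also be expressed as a small pre-flight script. This is a sketch under assumptions, not a Terra feature: the table is a plain dictionary, and the column names `part1_bam` and `part2_bam` are hypothetical.

```python
# Hypothetical sample table: the Part 1 output exists for sample_1 only.
sample_table = {
    "sample_1": {
        "cram": "gs://example-bucket/sample_1.cram",
        "part1_bam": "gs://example-bucket/part1/sample_1.bam",
        "part2_bam": "gs://example-bucket/part2/sample_1.bam",
    },
    "sample_2": {
        "cram": "gs://example-bucket/sample_2.cram",
        "part2_bam": "gs://example-bucket/part2/sample_2.bam",
    },
}


def find_input_problems(table, sample_ids, column, expected_suffix=".bam"):
    """Report samples whose input is missing or has the wrong file type."""
    problems = []
    for sample_id in sample_ids:
        value = table.get(sample_id, {}).get(column)
        if value is None:
            problems.append(f"{sample_id}: no value in column '{column}'")
        elif not value.endswith(expected_suffix):
            problems.append(f"{sample_id}: {value} does not end with {expected_suffix}")
    return problems


samples_in_set = ["sample_1", "sample_2"]
for column in ("part1_bam", "part2_bam"):
    issues = find_input_problems(sample_table, samples_in_set, column)
    print(column, "->", issues if issues else "looks fine")
```

Run against the Part 1 column, the check flags the second sample, which mirrors the failed submission described above. Run against the Part 2 column, it reports nothing.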
Congratulations! You've completed Part 3 of the Workflows Quickstart!
Your input data may be more complex than a single entity table. For example, your workflow may take several samples as input to generate a single output file. To learn how to set up workflows with more complex table associations, see Configuring workflow inputs: sets and pairs.