Welcome to the Workflows Quickstart Tutorial, Part 3. Learn how to run a downstream analysis on the output data you generated in Part 2 of the Workflows Quickstart.
Using workflow output data in downstream analysis
Learning objectives
In Part 3, you'll learn how data tables can help streamline and scale your analysis.
How data tables help scale your analysis
Using a sample set as input makes it quicker to configure a workflow to run on a subset of the samples in the sample table, with less chance of human error. You'll learn how to set up a workflow in the configuration form to run on the samples in a sample_set table.
Streamlining your analysis
Because workflow outputs are written to the data table, a workflow can easily use the outputs of a previous analysis as its inputs. Since this can be scripted, it helps automate running back-to-back workflows. You'll learn how to configure back-to-back workflows in the configuration form.
How much will it cost? How long will it take?
The exercise should take no more than fifteen minutes (unless your submission waits in the queue for a long time) and cost a few pennies.
You'll build on the first two exercises by configuring a workflow to use the outputs from Part_2_CRAM_to_BAM as inputs to the Part_3_BAM_to_unmapped_BAM workflow. You will run the workflow on both samples as a group, using the sample set Terra created in Part 2.
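Conceptually, running on a set works like the small model below. It is a hypothetical, in-memory stand-in for Terra's data tables, not Terra's API: the table contents, the placeholder `gs://bucket/` paths, the set name `part2_set`, the column name `bam_output`, and the helpers `resolve` and `expand_set` are all illustrative assumptions. It shows the idea that, with root entity type "sample" and a sample set selected, one workflow is launched per member sample, and each `this.<column>` input is looked up in that sample's own row.

```python
# Hypothetical in-memory stand-in for Terra data tables (illustration only, not Terra's API).
# Set name, column names, and gs:// paths are placeholders.
sample_table = {
    "sample_1": {"cram": "gs://bucket/sample_1.cram", "bam_output": "gs://bucket/sample_1.bam"},
    "sample_2": {"cram": "gs://bucket/sample_2.cram", "bam_output": "gs://bucket/sample_2.bam"},
}
sample_set_table = {"part2_set": ["sample_1", "sample_2"]}


def resolve(expression, row):
    """Look up a 'this.<column>' expression in one sample's row."""
    prefix = "this."
    if not expression.startswith(prefix):
        raise ValueError(f"expected 'this.<column>', got {expression!r}")
    return row[expression[len(prefix):]]  # KeyError if the sample lacks that column


def expand_set(set_id, inputs):
    """Build one (sample_id, resolved_inputs) job per member of the set."""
    jobs = []
    for sample_id in sample_set_table[set_id]:
        row = sample_table[sample_id]
        jobs.append((sample_id, {name: resolve(expr, row) for name, expr in inputs.items()}))
    return jobs


jobs = expand_set("part2_set", {"input_bam": "this.bam_output"})
for sample_id, resolved in jobs:
    print(sample_id, resolved["input_bam"])
# sample_1 gs://bucket/sample_1.bam
# sample_2 gs://bucket/sample_2.bam
```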
About the workflow
The workflow in Part 3 takes the BAM file output from Part 2 and converts it to an unmapped BAM file (uBAM).
Setting up the workflow: Overview
Your goal is to run the Part 3 workflow on the output data in the set you created in Part 2.
Start by selecting the Part3_BAM_to_unmappedBAM workflow. You'll be directed to the configuration form (see screenshot below). The parts you need to complete are numbered. See if you can complete them on your own; the steps below include hints.
Step 1. Choose the root entity type
HINT: The root entity is the table where the input data files are referenced.
The root entity type is "sample" because the outputs from the previous workflow were written to the sample table, where you will find them alongside the primary input data.
Step 2. Select Data
For this step, you'll run the workflow on the outputs of the two samples in the set created in Quickstart Part 2. See if you can set it up on your own!
Step 3. Configure input data
Note that you can choose to show only the required variables in the Inputs tab to simplify the form.
3.1. In the input_bam attribute field, start typing "this." and then select the output name you set up in Part 2 from the dropdown options.
Notice that the dropdown includes all columns in the "sample" data table, including those from Part 1. Be careful to select the name you used for the output in Part 2.
3.2. Save your input attributes by clicking the blue Save button.
Step 4. Configure the outputs and run the workflow
4.1. Go to the Outputs tab of the configuration form, where you'll fill in the attribute for the output_bams variable.
4.2. To write the outputs to the data table, type "this." followed by a name for the new attribute (for example, this.uBAM). The workflow will create a column with that name in the sample table and store the generated data there.
4.3. Save the Outputs.
4.4. Click the blue Run Analysis button to submit your workflows.
You will see the following popup.
Once your job is running, you can sit back and wait for the results!
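The output side can be modeled the same way. In this hypothetical sketch (not Terra's API; the `write_output` helper, the placeholder paths, and the shard file names are assumptions for illustration), an output expression such as `this.uBAM` names a column in the sample table, and each sample's result is written into its own row, creating the column the first time it is used. A later workflow could then read the same column with `this.uBAM` as an input, which is what makes back-to-back workflows easy to chain.

```python
# Hypothetical model of writing workflow outputs back to the sample table
# (illustration only, not Terra's API; paths and shard names are placeholders).
sample_table = {
    "sample_1": {"bam_output": "gs://bucket/sample_1.bam"},
    "sample_2": {"bam_output": "gs://bucket/sample_2.bam"},
}


def write_output(table, sample_id, expression, value):
    """Store one sample's output under the column named after 'this.'."""
    prefix = "this."
    column = expression[len(prefix):] if expression.startswith(prefix) else ""
    if not column:
        raise ValueError(f"output expression must look like 'this.<name>', got {expression!r}")
    table[sample_id][column] = value  # adds the column to this row if it is new


# Pretend each sample produced three output files, one per shard.
for sample_id in sample_table:
    shards = [f"gs://bucket/{sample_id}.shard{i}.unmapped.bam" for i in range(3)]
    write_output(sample_table, sample_id, "this.uBAM", shards)

columns = sorted({col for row in sample_table.values() for col in row})
print(columns)  # ['bam_output', 'uBAM']
print(sample_table["sample_2"]["uBAM"][0])  # gs://bucket/sample_2.shard0.unmapped.bam
```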
What to expect: successful submissions
Congratulations! You configured the inputs correctly and the workflow succeeded. You should see the green “succeeded” icon in the Job History.
If you go to the Data tab and expand the "sample" table, you will see the outputs under a new column, named whatever you chose in step 4.2 (uBAM in the example below).
If you click the "3 items" link, you'll see three output files for each sample, corresponding to how the workflow processes the data (in separate shards).
Workflow didn't succeed? Try these troubleshooting tips
Selecting the failed submission in the Job History takes you to a more detailed page (below). If you hover over the link in the Messages column, you'll get information that can help with troubleshooting. In the case below, one of the submissions failed because it couldn't find its input file: I had selected the output name from Part 1, where I ran only the first sample, so the second sample had no file under that name.
Troubleshooting tips and tricks
- Check attribute names carefully. When you start typing “this.” in the Inputs form, you’ll get a dropdown list of the columns available in the data table. Make sure the file type (BAM) matches the expected input.
- Check the log files by selecting the submission details (in the box outlined in orange in the screenshot above).
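Before clicking Run Analysis, the first tip can be pictured as a check like the one below. This is a hypothetical sketch, not a Terra feature: the column names `part1_bam` and `part2_bam`, the placeholder paths, and the `check_inputs` helper are assumptions. It reproduces the failure described in the troubleshooting example: the Part 1 column is filled in only for the first sample, so the second sample has no input file, while the Part 2 column is complete for both and the `cram` column has the wrong file type.

```python
# Hypothetical pre-flight check (illustration only, not a Terra feature).
# Column names and gs:// paths below are placeholders.
sample_table = {
    "sample_1": {
        "cram": "gs://bucket/sample_1.cram",
        "part1_bam": "gs://bucket/part1/sample_1.bam",  # Part 1 ran on this sample only
        "part2_bam": "gs://bucket/part2/sample_1.bam",
    },
    "sample_2": {
        "cram": "gs://bucket/sample_2.cram",
        "part2_bam": "gs://bucket/part2/sample_2.bam",
    },
}


def check_inputs(table, sample_ids, column, suffix=".bam"):
    """List problems with the chosen input column for every sample in a set."""
    problems = []
    for sample_id in sample_ids:
        value = table.get(sample_id, {}).get(column)
        if not value:
            problems.append(f"{sample_id}: no value in column '{column}'")
        elif not value.endswith(suffix):
            problems.append(f"{sample_id}: {value} is not a {suffix} file")
    return problems


members = ["sample_1", "sample_2"]
for column in ["part1_bam", "part2_bam", "cram"]:
    problems = check_inputs(sample_table, members, column)
    print(column, "OK" if not problems else problems)
```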
For more tips, see Troubleshooting Workflows: Tips and Tricks.
Congratulations! You've completed Part 3 of the Workflows Quickstart!