Workflows Quickstart Part 3 - Run a back-to-back analysis (pipeline)

Allie Hajian

Welcome to the Workflows Quickstart Tutorial, Part 3. Learn how to run a downstream analysis on the output data generated in Part 2 of the Workflows Quickstart. Running back-to-back workflows turns them into pipelines, which can be automated to help scale your analysis.

Using workflow outputs as inputs downstream

Learning objectives: time and cost to complete

In Part 3, you'll learn how data tables can help scale and streamline (automate) your analysis, turning workflows into pipelines.

Data tables help scale your analysis
Using a sample set as input makes it quicker to set up the workflow to run on a subset of samples in the sample table, with less possibility of human error. You'll learn to set up a workflow to run on samples from a sample_set table in the configuration form.

Using tables can streamline your analysis
Using the data table makes it easy to run back-to-back workflows - using the outputs of a previous workflow as input for a downstream workflow. Since this can be scripted, it helps automate running back-to-back workflows. You'll learn how to configure back-to-back workflows in the configuration form.

How much will it cost? How long will it take? 
The exercise should take no more than fifteen minutes (unless you are in the queue a long time) and cost a few pennies.

In this part, you'll build on the first two Workflows Quickstart exercises by using the outputs from running CRAM_to_BAM as inputs to the BAM_to_unmapped_BAM workflow. You will run on the two samples as a group, using the set Terra created in Part 2.

About the workflow

The workflow in Part 3 takes the BAM file output from Part 2 and converts it to an unmapped BAM file (uBAM). 

HINT: Right click to open the tutorial demo in a new tab

Setting up the workflow: Overview

Your goal is to run the Part 3 workflow on the output data for the samples in the set you created in Part 2. 

Before you start: Workflows from Part 2 must be complete
Because you will be using the outputs from the previous workflow as inputs for this exercise, Part 2 needs to have run successfully before you can follow these directions.

Start by selecting the Part3_BAM_to_unmappedBAM workflow. You'll be directed to the configuration form (see screenshot below). The parts you will need to complete are numbered. See if you can complete them on your own. Open the sections below for hints. 

Workflows-QuickStart-Part3_Config-form_Screen_shot.png

Step 1. Choose the root entity type

HINT: The root entity type is the table that references the most fundamental unit of input data your workflow will run on. If you have more than one table in your workspace, you will need to choose (all data tables will appear in the dropdown). For guidance, see Selecting the root entity type.

  • Workflows-QuickStart_Part3-Select-root-entity-type-Answer_Screen_shot.png

    If your workflow will run on a single entity, the root entity type is the single entity table. For the Workflows Quickstart, the root entity type is "sample" because the workflow will run on a single sample. Remember, the outputs from the previous workflow were written to the sample table, where you will find them alongside the primary input data.

Step 2. Select Data

For this part, you'll run on the outputs for the two samples in the set from Quickstart Part 2. You can do this manually (by selecting the "Choose specific samples to process" radio button), as in Part 2. Or you can select the "Choose existing sets of samples" radio button to take advantage of Terra's built-in functionality, which reduces setup work when running on subsets of any size. See if you can set it up on your own!

  • Workflows-Quickstart-Part-3_Choose-existing-set_Screen_shot.png
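Conceptually, choosing an existing set means the workflow runs once for each sample listed in that set. The sketch below models this with plain Python dictionaries. It is not Terra's implementation or API; the table contents, sample IDs, set name, and the expand_set helper are all hypothetical.

```python
# Hypothetical in-memory stand-ins for the workspace's data tables.
sample_table = {
    "sample_A": {"cram": "gs://example-bucket/sample_A.cram"},
    "sample_B": {"cram": "gs://example-bucket/sample_B.cram"},
    "sample_C": {"cram": "gs://example-bucket/sample_C.cram"},
}
sample_set_table = {
    # A set row only lists the IDs of its member samples.
    "part2_set": {"samples": ["sample_A", "sample_B"]},
}


def expand_set(set_name, set_table, entity_table):
    """Return the rows of entity_table that belong to the named set."""
    members = set_table[set_name]["samples"]
    missing = [m for m in members if m not in entity_table]
    if missing:
        raise KeyError(f"set {set_name!r} references unknown samples: {missing}")
    return {m: entity_table[m] for m in members}


# One workflow would be launched per selected sample.
selected = expand_set("part2_set", sample_set_table, sample_table)
print(sorted(selected))
```

Because the set stores only sample IDs, reusing it for a downstream workflow selects exactly the same samples without retyping them.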

Step 3. Configure inputs

Notice you can choose to show only the required variables in the Inputs tab to simplify things.

  • 3.1. Click in the input_bam attribute field.

    3.2. Select the output name you set up in Part 2 from the options in the dropdown.

    Workflows-QuickStart_Part3_Configure-inputs_Screen_shot.png

    Notice that the dropdown includes all columns in the "sample" data table, including those from Part 1! Be careful to select the name you used as output in Part 2.

    3.3. Save your Inputs attributes by clicking on the blue Save button!
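One way to picture what the input attribute does: for each sample, "this." refers to that sample's row in the table, and the name after it picks a column. The following is a minimal sketch of that lookup under those assumptions, not Terra's actual code. The resolve_expression helper, the file paths, and the column name part2_bam (standing in for whatever output name you chose in Part 2) are hypothetical.

```python
def resolve_expression(expression, row):
    """Resolve a 'this.<column>' expression against one table row (a dict)."""
    prefix = "this."
    if not expression.startswith(prefix):
        raise ValueError(f"expected an expression starting with {prefix!r}")
    column = expression[len(prefix):]
    if column not in row:
        raise KeyError(f"row has no column named {column!r}")
    return row[column]


# Hypothetical sample row after Part 2 has written its output to the table.
sample_row = {
    "cram": "gs://example-bucket/sample_A.cram",
    "part2_bam": "gs://example-bucket/outputs/sample_A.bam",
}

# The value the workflow would receive for its input_bam variable.
input_bam = resolve_expression("this.part2_bam", sample_row)
print(input_bam)
```

Selecting the wrong column name still resolves to a value whenever that column exists, which is why it pays to double-check the name in the dropdown.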

Step 4. Configure outputs and run the workflow

As in Part 2, you want to set up the workflow to write output metadata (links to files) to the data table. You'll do this in the Outputs tab of the configuration form.

  • 4.1. Go to the Outputs tab of the configuration form, where you'll fill in the attribute for the output_bams variable.

    4.2. To write to the data table, start by typing "this." and then add a name for this attribute. The workflow will add a column with that name to the sample table and write the output links there.

    Workflows-QuickStart_Part3_Configure-outputs_Screen_shot.png

    4.3. Save the Outputs.

    4.4. Click the blue Run Analysis button to submit your workflows.

    You will see the following popup. Click Launch to start the workflow.

    Workflows-Quickstart-Part-3_Confirm-launch_Screen_shot.png

     

Once your job is running, you can sit back and wait for the results!

What to expect - successful submissions

Congratulations! If you configured the inputs correctly and the workflow succeeded, you should see the green “succeeded” checkmark in the Job History.

Workflows-Quickstart-Part3_Job-History-page-with-successfully-completed-workflow_Screenshot.png

If you go to the Data tab and expand the "sample" table, you will see the outputs under a new column (whatever you named it - uBAM in the example below).

Workflows-Quickstart-Part3_Data-page-after-sucessful-run_Screenshot.png

If you click on the "3 items" link, you'll notice that there are three output files for each sample, corresponding to how the workflow processes the data (by separate shards):

Workflows-QuickStart_Part3_Outputs-closeup_Screen_shot.png
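The write-back can be pictured as storing a list of file links in a new column of each sample's row. The sketch below illustrates this with a plain dictionary. It is an assumption-laden model rather than Terra's implementation: the write_output helper, the sample ID, and the file paths are hypothetical, and uBAM is simply the example column name used above.

```python
def write_output(table, sample_id, expression, value):
    """Store a workflow output in the column named by a 'this.<column>' expression."""
    prefix = "this."
    if not expression.startswith(prefix):
        raise ValueError(f"expected an expression starting with {prefix!r}")
    column = expression[len(prefix):]
    table[sample_id][column] = value  # creates the column if it is new
    return table


# Hypothetical sample table before the Part 3 workflow finishes.
sample_table = {
    "sample_A": {"part2_bam": "gs://example-bucket/outputs/sample_A.bam"},
}

# Hypothetical links to the three output files produced for one sample.
shard_links = [
    f"gs://example-bucket/outputs/sample_A.shard{i}.unmapped.bam"
    for i in range(3)
]

write_output(sample_table, "sample_A", "this.uBAM", shard_links)
print(len(sample_table["sample_A"]["uBAM"]), "items")
```

Only links are stored in the table; the files themselves stay in cloud storage.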

Workflow didn't succeed? Try these troubleshooting tips

If your Job History looks like the screenshot below, don’t despair! Especially if your submission failed immediately, it’s likely the error is a mistyped input attribute in the workflow configuration form. You can get further information by clicking the Submission (arrow).

Workflows-Quickstart-Part3_Failed-submission_Screenshot.png

This will lead you to a more detailed page (below). If you hover over the link in the Messages column, you'll get information that can help with troubleshooting. In the case below, one of the workflows failed because it didn't find the input file: the input attribute was the output name from Part 1, where only the first sample was run.

Workflows-QuickStart_Part3_Failed-workflow-message_Screen_shot.png

Check attribute names carefully - if you start typing “this.” in the inputs form, you’ll get a dropdown list of available data files. Make sure that the file type (BAM) matches the expected input.
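If you keep a copy of the sample table outside Terra, the same check can be expressed as a small pre-flight script. This is only a sketch under that assumption: the find_input_problems helper, the sample IDs, the paths, and the column name part1_bam are hypothetical, and the check looks only at whether a value exists and ends in ".bam".

```python
def find_input_problems(rows, column, expected_suffix=".bam"):
    """Return {sample_id: problem} for rows whose input column is missing or mistyped."""
    problems = {}
    for sample_id, row in rows.items():
        value = row.get(column)
        if not value:
            problems[sample_id] = f"no value in column {column!r}"
        elif not str(value).lower().endswith(expected_suffix):
            problems[sample_id] = f"{value!r} does not end with {expected_suffix!r}"
    return problems


# Hypothetical rows: the Part 1 output column exists only for the first sample.
rows = {
    "sample_A": {"part1_bam": "gs://example-bucket/outputs/sample_A.bam"},
    "sample_B": {},
}

problems = find_input_problems(rows, "part1_bam")
for sample_id, problem in problems.items():
    print(f"{sample_id}: {problem}")
```

Here only the second sample is flagged, which matches the failure described above, where the chosen column had been filled in for just one sample.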

Check the log files by selecting the submission details (in the box outlined orange in the screenshot above).

For more tips, see Troubleshooting Workflows: Tips and Tricks.

Congratulations! You've completed Part 3 of the Workflows Quickstart!

Next steps

Your input data may be more complex than a single entity table, for example if your workflow takes several samples as input to generate a single output file. To learn how to set up workflows with more complex table associations, see Configuring workflow inputs: sets and pairs.
