Get started running workflows

Allie Hajian

A workflow (also called a pipeline) is a series of steps, performed by an external compute engine, that is often used for automated, bulk analysis (such as aligning genomic reads). Workflows run on Terra are written in Workflow Description Language (WDL), a workflow processing language that is easy for humans to read and write.

What you need to run a workflow (pipeline) in a Terra workspace

"Can compute" access to a workspace
You need permission to perform any operation that incurs a Google Cloud Platform (GCP) cost, such as running workflows, in a workspace. You have this permission if someone shares a workspace with you as a "can compute" writer. If you create or copy a workspace using your own billing project, you are the owner by default and can run workflows.

One or more workflows
If you clone a workspace that already contains workflows (see the Showcase workspaces in the Library), those workflows will be in your copy as well. If the Workflows tab of your workspace is empty, you can import workflows from the Terra Library (code and workflows section).

Input data
Input data files can be located in the workspace Google bucket, or linked to the workspace by metadata in the data table.
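
As a rough illustration of the second option, a data table can be loaded from a tab-separated file whose rows point to files in cloud storage. The Python sketch below builds such a file. It assumes Terra's load-file convention of an `entity:sample_id` first column; the sample IDs, the bucket paths, and the `bam` column name are made up for the example.

```python
import csv
import io

# Hypothetical samples and bucket paths; replace with your own.
samples = [
    {"sample_id": "sample_A", "bam": "gs://example-bucket/sample_A.bam"},
    {"sample_id": "sample_B", "bam": "gs://example-bucket/sample_B.bam"},
]


def build_sample_table(rows):
    """Return TSV text for a 'sample' data table, one row per sample."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    # The first column header names the table ("sample") and its ID column.
    writer.writerow(["entity:sample_id", "bam"])
    for row in rows:
        writer.writerow([row["sample_id"], row["bam"]])
    return buffer.getvalue()


tsv_text = build_sample_table(samples)
print(tsv_text)
```

Each row links a sample ID to the location of its file, so a workflow can take its input from the table instead of a hard-coded path.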

Practice pipelining with the Workflows Quickstart

We think the best way to get started running workflows on Terra is to dive in! The Terra QuickStart workspace is a hands-on tutorial that guides you through the process. You'll follow the steps below to gain experience with increasingly realistic analyses. Setting up and running each exercise should take about 10 to 15 minutes.

Tutorial workspace | Step-by-step guide

Copy the Terra Workflows Quickstart workspace to your own billing project and work through the three exercises.

Part 1 - Run a preconfigured workflow on a single sample from the Workflows page

Part 2 - Set up and run a workflow on two samples (you'll create a set in the process)

Part 3 - Run downstream analysis on a set of samples

Part 1: Run a preconfigured workflow

Here you'll run your first workflow, using a preconfigured workflow and input data from the data table. You'll run a short file format conversion (BAM_to_CRAM) workflow on a downsized sample BAM, NA12868, that is stored in a public bucket and referenced in the workspace sample data table. In addition to giving you the satisfaction of running your first workflow, this exercise shows how the data table gets updated with workflow outputs (this workflow is configured to write to the data table).

Follow the step-by-step instructions to select the data from the table and run the analysis. 

Part 2: Configure and run a workflow analysis

After running your first completely preconfigured workflow on sample data, you'll move on to the next step: running the same workflow but setting up the configuration form from scratch. In this exercise, you'll get a sense of how to configure all variables and attributes of a workflow using the built-in Terra interface. You'll run on two samples to see how Terra generates a sample set to help streamline additional analysis.

Follow the step-by-step instructions here.  

Part 3: Configure and run a downstream analysis on generated data

If you're using a data table for analysis inputs and outputs, it is straightforward to run a follow-up analysis on the output of a workflow. In Part 3, you'll set up a downstream analysis on the data generated in Part 2.
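
The idea of chaining analyses through a data table can be pictured with a toy model in Python. This is only a sketch of the concept, not how Terra is implemented. Each table row is a dictionary. A workflow reads its input from one column and writes its output to a new column, and the downstream step then reads that new column. The `this.<column>` notation mirrors the way inputs reference table columns in the workflow configuration form; the function names, column names, and file paths are invented for the example.

```python
# Toy data table: one dictionary per sample row (hypothetical values).
table = {
    "sample_A": {"bam": "gs://example-bucket/sample_A.bam"},
    "sample_B": {"bam": "gs://example-bucket/sample_B.bam"},
}


def resolve(expression, row):
    """Look up a 'this.<column>' expression in a single table row."""
    prefix = "this."
    if not expression.startswith(prefix):
        raise ValueError(f"unsupported expression: {expression}")
    return row[expression[len(prefix):]]


def run_step(table, input_expr, output_column, transform):
    """Apply a stand-in 'workflow' to every row and store its output."""
    for row in table.values():
        row[output_column] = transform(resolve(input_expr, row))


# Step 1: stand-in for a format conversion that writes a 'cram' column.
run_step(table, "this.bam", "cram", lambda path: path.replace(".bam", ".cram"))

# Step 2: downstream step that consumes the 'cram' column from step 1.
run_step(table, "this.cram", "report", lambda path: path + ".report.txt")

print(table["sample_A"]["report"])
```

Because the first step records its output in the table, the second step needs only the column name to find its input for every sample.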

Follow the step-by-step instructions here.

Once you complete all three exercises, you should be well on your way to running your own workflows on your own data. 
