Workflows Quickstart Part 1 - Run a pre-configured workflow

Allie Hajian

Welcome to Part 1 of the Workflows Quickstart tutorial. Learn how to launch and monitor a pre-configured workflow that analyzes a single entity of genomic data in Terra.

The Workflows Quickstart has three parts. Each part is self-contained, with its own learning objectives and its own time and cost estimates. We recommend doing them in order, but you don't need to complete them in one sitting.

What you will learn
This exercise gives you a feel for the mechanics of successfully running a workflow on data in a data table. For input, you'll use downsized sample data (stored in a public Google bucket) that is already referenced in the workspace's "sample" table. The workflow is set up to read its inputs from the table and write its outputs back to it. In Part 1, this setup has been done for you: you will get an overview of the workflow configuration form, choose data (pre-loaded in the workspace table), and launch the workflow.

How much will it cost? How long will it take?
The exercise should take no more than fifteen minutes (unless your job waits in the queue for a long time) and cost a few pennies.

Hint: Right-click to open the tutorial demo in a new tab

Before you start - Clone your own Quickstart workspace

The Workflows-Quickstart featured workspace is “Read only”. For hands-on practice, you need to be able to run workflows and store data in a workspace bucket. Cloning the workspace gives you your own copy, which you own and can modify. If you haven't already done so, make your own copy of this workspace by following the directions below.

Start by clicking the round three-dot menu in the upper right-hand corner and selecting Clone from the dropdown menu. Then complete the form as described below.

Screenshot of the Dashboard of the Terra Workflows Quickstart workspace. The image is annotated to include an orange box and arrow highlighting the three-dot menu in the upper right-hand corner of the workspace.

1. Rename your copy something memorable. It may help to write down the name of your workspace.
2. Choose your billing project. Note that this can use your "getting started" credits from GCP! Don’t worry, you’ll have plenty left over when you’ve completed the Quickstart exercises.
3. Do not select an Authorization Domain; these are only required when working with restricted-access data.
4. Click the “Clone Workspace” button to make your own copy.

Step 1: Open the workflow setup form

Once you're in your own copy of the workspace, you'll be ready to get hands-on to learn about setting up and running workflows!

1.1. Start by going to the Workflows page.

1.2. Select the Part1_CRAM_to_BAM workflow by clicking on the card.

Screenshot of the Workflows tab of the Terra Workflows Quickstart workspace. The image is annotated with an orange box and arrow to highlight the Part1_CRAM-to-BAM workflow card.

This will reveal the workflow configuration form, where you'll set up the workflow to run on your data.

Some details about the Quickstart workflows: The workflows in Parts 1 and 2 of the Quickstart are identical. They convert genomic files from one format (CRAM) to another (BAM) for downstream analysis, and they have different names only to simplify the instructions. This workflow should complete in just a few minutes once it starts running.

Step 2: Select data


2.1. Confirm that the root entity type is "sample". This is the table that contains the input data.

2.2. Click the "Select Data" button. This takes you to the Select Data form.

2.3. Select the "Choose specific rows to process" radio button.

2.4. Select the NA12878 sample.

2.5. Click the blue OK button to finalize your selection.
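Conceptually, choosing a root entity type and specific rows tells Terra which table to read and how many workflows to launch: one workflow per selected row, with inputs filled in from that row's columns. The Python sketch below models that idea with a plain dictionary. It is an illustration only, not Terra's implementation or API; the input column name cram_path, the input name input_cram, the second sample SAMPLE_2, and the gs:// paths are all hypothetical.

```python
# Hypothetical, simplified model of a Terra data table: one dict per row,
# keyed by the row ID. Column names, sample IDs, and paths are made up.
sample_table = {
    "NA12878": {"cram_path": "gs://example-public-bucket/NA12878.cram"},
    "SAMPLE_2": {"cram_path": "gs://example-public-bucket/SAMPLE_2.cram"},
}


def plan_workflows(table, selected_ids):
    """Return one (row_id, inputs) pair per selected row.

    Models "Choose specific rows to process": each selected row of the
    root entity table becomes its own workflow run.
    """
    plan = []
    for row_id in selected_ids:
        row = table[row_id]  # raises KeyError if the row does not exist
        # Models an input attribute written as this.cram_path, which
        # resolves to the value in this row's cram_path column.
        inputs = {"input_cram": row["cram_path"]}
        plan.append((row_id, inputs))
    return plan


# Selecting only NA12878, as in this Quickstart, plans a single workflow.
runs = plan_workflows(sample_table, ["NA12878"])
print(len(runs))  # 1
print(runs[0][1]["input_cram"])
```

Selecting both rows in this toy table would plan two independent workflows, one per sample.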

Additional pre-configured runtime options

Additional runtime and cost-saving options are set to their defaults. The defaults work well in many cases, including this Quickstart.

Step 3: Confirm and launch

3.1. In the workflow configuration form, click the blue RUN ANALYSIS button to submit your workflow.

3.2. Click LAUNCH in the Confirm launch popup.

3.3. You'll be taken to the Job History page, where you can monitor the status of your submission (highlighted below).

For job status updates, refresh the page.
Screenshot of the Job History page annotated to highlight the job submission.

Your submission is complete!

What to expect

When your job completes successfully, you'll see a green checkmark in the Status column of the Job History page. This should take only a couple of minutes once the job starts running. For details about what can keep a job in the Submitted or Queued stage, see "What happens when you launch a workflow."
Screenshot of the Job History page after a job has finished running. The image is annotated with an orange arrow to highlight that the job status has been updated to done.

Once you see the green checkmark, go back to the Data page.

Your data table will include three new columns (analysis_ready_BAI, analysis_ready_BAM, and CRAM_to_BAM_valdation_report).

Where’s the generated data stored? By default, data generated by a workflow is stored in the workspace bucket. To confirm that the files are there, click the “Files” icon at the bottom of the far-left column of the Data tab. You'll need to navigate down several directory levels to reach the data files (NA12878.bam and NA12878.bai).
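Each new cell in the sample table holds a link (a gs:// URI) that points into the workspace bucket, and the directory levels you click through are simply the "/"-separated parts of that path. The sketch below splits such a URI into its bucket and object path using only the Python standard library. The example URI, including the bucket name fc-example-bucket and every directory name in it, is hypothetical; the exact directory layout in your bucket may differ.

```python
from urllib.parse import urlparse


def split_gcs_uri(uri):
    """Split a gs://bucket/path/to/object URI into (bucket, object_path)."""
    parsed = urlparse(uri)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"not a gs:// URI: {uri!r}")
    # netloc holds the bucket name; the leading "/" of the path is not
    # part of the object name, so strip it.
    return parsed.netloc, parsed.path.lstrip("/")


# Hypothetical link as it might appear in the analysis_ready_BAM column.
uri = "gs://fc-example-bucket/submissions/example-submission-id/Part1_CRAM_to_BAM/NA12878.bam"
bucket, obj = split_gcs_uri(uri)
print(bucket)          # fc-example-bucket
print(obj.split("/"))  # the directory levels, ending with the file name
```

The number of items in obj.split("/") minus one is how many directory levels sit between the bucket root and the file.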

Follow-up (thought) questions

  • Answer: The sample table now has additional columns that contain links to the generated data in the workspace bucket. Because the columns are added to the input table, the generated data is automatically associated with the input data.

    Sample table after a completed run
    Screenshot of the Sample table in the Data tab of the Terra Workflows Quickstart workspace. The image is annotated with an orange box to highlight the analysis_ready_BAI, analysis_ready_BAM, and validation_report columns that were added to the table after the completion of the Part1_CRAM-to-BAM workflow.
  • Answer: The columns were added automatically because the workflow's outputs are configured to write metadata (links to the generated files) back to the data table.

    Hint: Select the workflow card in the Workflows tab and compare the “Outputs” attributes to the new columns in the sample table.

    Outputs configuration
  • Answer: The new columns contain metadata links to the generated data. The actual data is stored in the workspace bucket, which you can access by clicking the "Files" link on the Data page. You'll need to navigate down several directory levels to find the actual data files.

    Screenshot of the Files stored in the workspace bucket. The image is annotated with orange boxes highlighting the location of the Files in the Data tab, the BAM file, and the BAM index file.
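The write-back described in these answers can be modeled in a few lines: each workflow output whose attribute is written as this.<column> in the Outputs configuration becomes a column on the row the workflow ran on. The sketch below is a simplified illustration, not Terra's implementation. The output names output_bam and output_bai, the input column cram_path, and all paths are hypothetical; the column names analysis_ready_BAM and analysis_ready_BAI come from this tutorial, and the validation report output would follow the same pattern.

```python
# Hypothetical output configuration: workflow output name -> table attribute.
# An attribute written as "this.<column>" targets a column on the row
# (entity) that the workflow processed.
output_config = {
    "output_bam": "this.analysis_ready_BAM",
    "output_bai": "this.analysis_ready_BAI",
}


def write_outputs_to_row(row, workflow_outputs, config):
    """Return a copy of row with each configured output added as a column."""
    updated = dict(row)  # leave the original row untouched
    prefix = "this."
    for output_name, attribute in config.items():
        if not attribute.startswith(prefix):
            continue  # this simple model only handles this.<column> targets
        column = attribute[len(prefix):]
        updated[column] = workflow_outputs[output_name]
    return updated


# Hypothetical input row and workflow outputs for NA12878.
row = {"cram_path": "gs://example-public-bucket/NA12878.cram"}
outputs = {
    "output_bam": "gs://fc-example-bucket/example/NA12878.bam",
    "output_bai": "gs://fc-example-bucket/example/NA12878.bai",
}
new_row = write_outputs_to_row(row, outputs, output_config)
print(sorted(new_row))  # ['analysis_ready_BAI', 'analysis_ready_BAM', 'cram_path']
```

Because the new columns land on the same row as the input, the generated files stay linked to the sample they came from.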
Congratulations! You've completed Part 1 of the Workflows Quickstart!
