Workflows Quickstart Part 1 - Run pre-configured workflow

Allie Hajian

Welcome to the Workflows Quickstart Tutorial, Part 1. Learn how to launch and monitor a preconfigured workflow to analyze a single entity of genomic data in Terra.

There are three parts to the Workflows Quickstart. Each is independent, with its own learning objectives and its own time and cost estimates. You should do the three in order, but you don’t need to do them in one sitting.

Learning objectives: time and cost to complete

What you will learn

This exercise will give you a feel for the mechanics of running a workflow successfully on data in the data table. For input you’ll use downsized sample data (stored in a public Google bucket) already referenced in the workspace "sample" table. The workflow is set up to take input from and write output to the table. For Part 1 of the Quickstart, the setup has been done for you. You will get an overview of the form, choose data (pre-loaded in the workspace table), and launch the workflow.

How much will it cost? How long will it take?

The exercise should take no more than fifteen minutes (unless your job sits in the queue for a long time) and cost a few pennies.

Hint: Right click to open the tutorial demo in a new tab

Before you start - Clone your own Quickstart workspace

The Workflows-Quickstart featured workspace is “Read only”. For hands-on practice, you'll need to be able to run workflows and store data in your workspace bucket. Making your own copy of the workspace allows you to do that, since you'll be the owner. If you haven't already done so, make your own copy of this workspace by following the directions below.

Start by clicking the circle icon with three dots in the upper right-hand corner and selecting "Clone" from the dropdown menu. Then follow the directions below to complete the form:

    1. Rename your copy something memorable
      It may help to write down the name of your workspace
    2. Choose your billing project
      Note that this can be "getting started" credits from GCP! Don’t worry, you’ll have plenty left over when you’ve completed the Quickstart exercises.
    3. Do not select an Authorization Domain, since these are only required when using restricted-access data
    4. Click the “Clone Workspace” button to make your own copy

Step 1: Open the workflow setup form

Once you're in your own copy of the workspace, you'll be ready to get hands-on to learn about setting up and running workflows!

1.1. Start by going to the Workflows page.

1.2. Select the Part1_CRAM_to_BAM workflow by clicking on the name in the card.

This will reveal the workflow configuration form, where you'll set up the workflow to run on your data.

Some details about the quickstart workflows

The workflows in Parts 1 and 2 of the Quickstart are identical: they convert genomic files from one format (CRAM) to another (BAM) for downstream analysis. They’ve been renamed to simplify the instructions. This workflow should complete in just a few minutes once it starts running.
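If you're curious what a CRAM-to-BAM conversion can look like outside Terra, here is a minimal sketch that builds the samtools commands such a conversion commonly uses. This is an assumption for illustration only: the tutorial does not show the commands inside Part1_CRAM_to_BAM, and the file names and reference FASTA below are hypothetical.

```python
import shlex


def cram_to_bam_commands(cram, reference_fasta, sample):
    """Build shell commands that convert a CRAM file to an indexed BAM.

    Assumes samtools is installed. This is a sketch, not necessarily
    what the Quickstart workflow runs internally.
    """
    bam = f"{sample}.bam"
    # -b writes BAM output, -T names the reference used to decode the CRAM
    convert = ["samtools", "view", "-b", "-T", reference_fasta, "-o", bam, cram]
    # By default samtools index writes <bam>.bai next to the BAM file
    index = ["samtools", "index", bam]
    return [convert, index]


if __name__ == "__main__":
    # Hypothetical file names, printed as copy-pasteable shell commands
    for cmd in cram_to_bam_commands("NA12878.cram", "reference.fasta", "NA12878"):
        print(shlex.join(cmd))
```

The function only builds the commands, so you can inspect them before running anything.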

Step 2: Select data


2.1. Confirm root entity type = "sample". This is the table that contains the input data.

2.2. Click the "Select Data" button. This will take you to the Select Data form (below).

2.3. Select the Choose specific rows to process radio button.

2.4. Select the NA12878 sample.

2.5. Click the blue OK button to finalize your selection.

Additional pre-configured runtime options

Additional runtime and cost-saving options have been left at their defaults. These are fine to use in many cases (including the quickstart).

Step 3: Confirm and launch

3.1. In the workflow configuration form, click the blue RUN ANALYSIS button to submit your workflow.

3.2. Click LAUNCH in the Confirm launch popup.

3.3. You'll be directed to the Job History page where you can monitor your submission status (highlighted below).

For job status updates, refresh the page.
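Refreshing the page is a manual form of polling. If you ever check status from a script instead, the same idea could be sketched as below. Everything here is an assumption for illustration: `fetch_status` is a hypothetical callable you would supply yourself, and the status names are placeholders, not a documented list.

```python
import time


def wait_for_submission(fetch_status, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Call fetch_status() repeatedly until it returns a terminal status.

    fetch_status is a caller-supplied (hypothetical) function that returns
    the current status as a string. The terminal names below are assumed.
    """
    terminal = {"Done", "Aborted"}
    for _ in range(max_polls):
        status = fetch_status()
        if status in terminal:
            return status
        sleep(poll_seconds)  # wait before asking again, like refreshing the page
    raise TimeoutError("submission did not finish within the polling budget")


if __name__ == "__main__":
    # Simulated status sequence standing in for a real status lookup
    fake = iter(["Submitted", "Running", "Done"])
    print(wait_for_submission(lambda: next(fake), poll_seconds=0))
```

Passing `sleep` as a parameter keeps the sketch easy to try without actually waiting.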

Your submission is complete! What to expect

When your job completes successfully, you'll see a green checkmark in the Status column of the Job History page. This should only take a couple of minutes once the job starts running (see "What happens when you launch a workflow" for more details about things that can cause your job to remain in the submitted or queued stage).

Once you see the green check, go back to the Data page.

Your data table will include three new columns (analysis_ready_BAI, analysis_ready_BAM and CRAM_to_BAM_valdation_report).
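One way to picture what just happened: the table row for your sample gained three new attributes that point to the generated files. Here is a rough sketch of that idea using a plain dictionary. The three output column names come from this tutorial; the `cram` input column name and the `gs://` paths are hypothetical placeholders, and this is not how Terra stores tables internally.

```python
def write_outputs_to_row(table, row_id, outputs):
    """Return a copy of the table with workflow outputs added to one row."""
    updated = {rid: dict(row) for rid, row in table.items()}
    updated[row_id].update(outputs)
    return updated


# The "sample" table before the run (input column name is hypothetical)
sample_table = {"NA12878": {"cram": "gs://<public-bucket>/NA12878.cram"}}

# Links the workflow writes back (paths are placeholders, not real locations)
outputs = {
    "analysis_ready_BAM": "gs://<workspace-bucket>/.../NA12878.bam",
    "analysis_ready_BAI": "gs://<workspace-bucket>/.../NA12878.bai",
    "CRAM_to_BAM_valdation_report": "gs://<workspace-bucket>/.../<report>",
}

if __name__ == "__main__":
    after = write_outputs_to_row(sample_table, "NA12878", outputs)
    for column, value in after["NA12878"].items():
        print(column, "->", value)
```

Because the outputs land on the same row as the input, each generated file stays associated with the sample it came from.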

Where’s the (generated) data stored?

Generated data from a workflow is stored by default in the Workspace bucket. You can check that the files are in the Workspace bucket by clicking on the “File” icon (bottom of the far left column) in the Data tab. Note that you will need to go down several file directories to get to the data files (NA12878.bam and NA12878.bai).
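If you have a listing of the objects in your bucket, a small filter can save you from clicking through the nested directories. The sketch below assumes you already have the object paths as a list of strings; the directory names in the example are hypothetical and do not reflect the real bucket layout.

```python
import posixpath


def find_by_basename(object_names, wanted):
    """Return the object paths whose final path component is in `wanted`."""
    wanted = set(wanted)
    return [name for name in object_names if posixpath.basename(name) in wanted]


if __name__ == "__main__":
    # Hypothetical listing; real paths will differ
    listing = [
        "dir-a/dir-b/dir-c/NA12878.bam",
        "dir-a/dir-b/dir-c/NA12878.bai",
        "dir-a/dir-b/other-file.log",
    ]
    for path in find_by_basename(listing, ["NA12878.bam", "NA12878.bai"]):
        print(path)
```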

Follow-up (thought) questions

  • Answer: It now has additional columns that include links to the generated data in the workspace bucket. Because the columns are added to the input table, generated data is associated with input data automatically.  
  • Answer: The workflow generated the columns automatically because it was set up to write the generated metadata to the data table.

    Hint: select the workflow card in the Workflows tab and compare the “Outputs” attributes to the new columns in the sample table

    Outputs configuration

    Sample table after completed run
  • Answer: The new columns include metadata links to the generated data. The actual data is stored in the workspace bucket, which you can access by clicking on the "Files" link from the Data page. Note that you will need to go down several directory levels to find the actual data files. 

Congratulations! You've completed Part 1 of the Workflows Quickstart!

 
