Get up and running with workflows on Terra on Azure in less than half an hour. This is the second in a series of three Quickstarts that walk through a mock study of the correlation between height and grades for a cohort of 7th, 8th, and 9th graders. In the Workflows tutorial, you will import a workflow from GitHub, then set it up and run it in your Quickstart workspace.
Workflows tutorial details
The Workflows Quickstart is intended to help you become familiar with setting up and running workflows in Terra on Azure. To make it a little more real, it walks through a mock study that seeks to answer the question, "Does a student's height affect their grades?"
Prerequisite
Before starting the Workflows Quickstart, you should complete the Data Tables Quickstart, which introduces the mock study data and analysis tools.
After working through the exercises in the Workflows Quickstart, you will know how to
- Launch the workflows service in your workspace
- Run a preconfigured workflow
- Find and import a WDL workflow
- Set up and run a workflow from scratch
Workflows quickstart flow
Estimated time and cost to complete
You should be able to complete the Quickstart tutorial in half an hour. Running the tutorial will cost less than $0.25 (Azure Cloud data storage and VM costs). However, there are additional infrastructure costs (see below).
Additional costs and requirements
You will need to have an Azure-subscription-backed Terra Billing project and your own copy of the Quickstart workspace to complete the tutorial.
Making a workspace will incur additional infrastructure costs (typically ~$5/day). See Overview: Costs and Billing (Azure) for more details.
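As a back-of-the-envelope check, the figures above can be combined into a rough upper-bound estimate. This is a minimal sketch: the function name is hypothetical, and the rates are only the approximate numbers quoted in this article (about $5/day for infrastructure, under $0.25 for the tutorial run), not an official pricing formula.

```python
# Rough estimate only; rates are the approximate figures quoted in this article.
INFRASTRUCTURE_PER_DAY = 5.00  # typical workspace infrastructure cost, USD per day
TUTORIAL_RUN_MAX = 0.25        # upper bound for tutorial storage and VM costs, USD


def estimate_quickstart_cost(days_workspace_exists):
    """Approximate upper bound (USD) for keeping the workspace for the given
    number of days and running the tutorial once."""
    if days_workspace_exists < 0:
        raise ValueError("days_workspace_exists must be non-negative")
    total = days_workspace_exists * INFRASTRUCTURE_PER_DAY + TUTORIAL_RUN_MAX
    return round(total, 2)


print(estimate_quickstart_cost(1))  # workspace kept for one day
print(estimate_quickstart_cost(7))  # workspace kept for one week
```

The estimate grows with the number of days the workspace exists, so the infrastructure cost, not the workflow run itself, dominates the total.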
Step 1: Start the workflows app service in your workspace
Workflows are orchestrated in Terra by the workflows application, Cromwell. In a new workspace, you will need to launch Cromwell before you can run workflows.
Cost caveats when launching the workflows service
Launching Cromwell spins up additional workspace infrastructure that comes with a cost (because it makes your workspace "bigger" in terms of the resources consumed). This is in addition to the working costs associated with running a workflow. Currently, once you spin up the workflows service engine, you cannot delete it.
1.1. Go to the Workflows page of your copy of the Intro to Terra Quickstart workspace and click the blue Launch Workflows App button.
It may take a few minutes to complete.
When workflows are ready
After a few minutes, you will see the workflow navigation menu on the left.
Step 2: Run a preconfigured workflow
This exercise will give you a feel for the mechanics of running a workflow as well as how to monitor a workflow once you submit it.
2.1. Click the blue configure button in the calculate_avg_gpa workflow. This will reveal the workflow configuration form where you'll set up the workflow to run on your data.
2.2. Verify that the student table is selected from the dropdown under Select a data table.
2.3. In the Select Data tab (left side), select the first eight students by clicking the boxes on the left.
2.4. Select Submit to open a popup window where you can name and enter comments about the submission.
Your submission has a pre-populated name that includes the workflow name, input data table, and date and time of submission. You can change this to be meaningful to you.
The popup includes how many workflows will be submitted in this submission.
2.5. To confirm and launch the workflow submission, click the Submit button again.
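Conceptually, a submission bundles one workflow run per selected row. The sketch below illustrates that idea in plain Python. It does not call any Terra API, and the row IDs, the `build_submission` helper, and the exact name format are hypothetical.

```python
from datetime import datetime


def build_submission(workflow_name, table_name, selected_row_ids, submitted_at):
    """Model a submission as a name plus one workflow entry per selected row."""
    # Hypothetical name format: workflow name, table name, and timestamp.
    name = f"{workflow_name}_{table_name}_{submitted_at:%Y-%m-%d_%H-%M-%S}"
    workflows = [{"row_id": row_id} for row_id in selected_row_ids]
    return {"name": name, "workflows": workflows}


# Eight selected students, as in step 2.3 (the IDs are made up).
selected = [f"student_{i}" for i in range(1, 9)]
submission = build_submission(
    "calculate_avg_gpa", "student", selected, datetime(2024, 1, 15, 9, 30, 0)
)
print(submission["name"])
print(len(submission["workflows"]), "workflows in this submission")
```

Selecting more rows in the Select Data tab increases the number of workflows in the submission without changing the configuration.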
What to expect
Once you submit your workflow, you'll see the submission details on the left-hand side of the Workflows page. Details include the workflow name, status, submission date, and duration.
To see the status of a workflow, its start and end time, and sub-workflow and task failures, click on an individual workflow ID to view the workflow details page.
Use the breadcrumb on top of the page (circled in the screenshot below) to navigate back and forth between the submission history (lists of previous submissions), submission details page, and workflow details page.
What to expect (completed workflows)
When the workflows are done, you'll see a green check in the Submission Details.
Thought questions
- After running the workflow, there is an additional column in the student data table. The gpa column (circled in the screenshot below) stores the output data from running the workflow.
- The workflow was configured to write outputs back to the input data table. To see this, go back and look at the Outputs tab of the workflow configuration form (click the 1_CalculateStudentGPA card and the Outputs tab). Notice that the name of the new column is the same as the attribute for the output variable GPA.
- Terra stores generated data files in workspace blob storage and includes links to the files in the input data table by default. This helps associate generated data with inputs, which is especially useful because the file directory structure includes many non-human-friendly names.
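The write-back behavior can be pictured as adding a new column to the table, keyed by the output attribute name. Below is a minimal sketch with made-up rows and values. The `write_outputs_to_table` helper and the `student_id` and `height` column names are hypothetical, and this is not Terra's implementation.

```python
def write_outputs_to_table(rows, outputs_by_row_id, attribute):
    """Return copies of the rows with each workflow output stored under `attribute`.

    Rows without an output (for example, students that were not selected)
    are returned unchanged.
    """
    updated = []
    for row in rows:
        new_row = dict(row)  # copy, so the original rows are not modified
        if row["student_id"] in outputs_by_row_id:
            new_row[attribute] = outputs_by_row_id[row["student_id"]]
        updated.append(new_row)
    return updated


# Made-up table rows and workflow outputs, for illustration only.
students = [
    {"student_id": "s1", "height": 150},
    {"student_id": "s2", "height": 162},
]
outputs = {"s1": 3.4}

for row in write_outputs_to_table(students, outputs, "gpa"):
    print(row)
```

Only the row that has an output gains the gpa entry, which mirrors how the new column is filled in only for the students you ran the workflow on.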
How to access generated data files
You can verify that the generated output files are in your workspace blob storage container by clicking on the Files icon in the right sidebar. This will open the directory of your workspace storage.
Click in the left-hand column to open the subdirectories workspace services > cbas > submission-ID > CalculateStudentGPA > task ID > call-CalculateAverage > tes_task. Click on tes_task on the left to open the contents of that directory in the main page and you will see a list of all the generated files.
Note that you may need to go down several levels in the file directory to find the data files. That's why it's a good idea to write output data back to the input data table.
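For orientation, the directory path described above can be assembled from its parts. This sketch only mirrors the folder names listed in this article. The submission and task IDs are placeholders, both helper functions are hypothetical, and the actual layout in your storage container may differ.

```python
import posixpath


def tes_task_directory(submission_id, task_id):
    """Join the folder names listed in this article into one path string."""
    # Folder names are copied from the navigation path given above.
    return posixpath.join(
        "workspace services",
        "cbas",
        submission_id,
        "CalculateStudentGPA",
        task_id,
        "call-CalculateAverage",
        "tes_task",
    )


def files_in_directory(all_paths, directory):
    """Keep only the paths that sit under `directory`."""
    prefix = directory.rstrip("/") + "/"
    return [path for path in all_paths if path.startswith(prefix)]


directory = tes_task_directory("<submission-ID>", "<task-ID>")
print(directory)
```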
Step 3: Import workflow from GitHub
You don't have to be a coding or WDL expert to run a workflow analysis in Terra. There are hundreds of published workflows available on GitHub and Dockstore that you can use right in your workspace.
What you'll learn in step 3
- How to import a workflow from GitHub
- How to set up inputs and outputs to run an unconfigured workflow
3.1. In the Workflows tab, click Find and add workflows in the left column to expand the menu.
3.2. Select the Import a Workflow option.
3.3. Fill in the blank fields and then click the blue add to workspace button (the button is disabled until every field is filled in). Notice that Terra automatically assigns the workflow name from GitHub, but you can override it.
Form field values
- Workflow: https://github.com/broadinstitute/DSP_User_Ed/blob/master/calculate_average_gpa.wdl
- Workflow name: 2_calculate_average_gpa
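To see how the two form values relate, the sketch below splits a GitHub file URL into its parts and derives a default workflow name from the file name. That the default name equals the file name without the `.wdl` extension is an assumption made for illustration, and the `parse_github_wdl_url` helper is hypothetical rather than part of Terra.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def parse_github_wdl_url(url):
    """Split a github.com file URL into owner, repo, branch, and file path."""
    parsed = urlparse(url)
    if parsed.netloc != "github.com":
        raise ValueError("expected a github.com URL")
    parts = parsed.path.strip("/").split("/")
    # Expected shape: <owner>/<repo>/blob/<branch>/<path to file>.
    # Branch names that contain "/" are not handled by this simple sketch.
    if len(parts) < 5 or parts[2] != "blob":
        raise ValueError("expected a URL of the form .../blob/<branch>/<file>")
    file_path = PurePosixPath(*parts[4:])
    if file_path.suffix != ".wdl":
        raise ValueError("expected a .wdl file")
    return {
        "owner": parts[0],
        "repo": parts[1],
        "branch": parts[3],
        "file": str(file_path),
        "default_name": file_path.stem,  # assumed default: file name without .wdl
    }


info = parse_github_wdl_url(
    "https://github.com/broadinstitute/DSP_User_Ed/blob/master/calculate_average_gpa.wdl"
)
print(info["default_name"])
```

In this tutorial you override the default and enter 2_calculate_average_gpa so the imported workflow is easy to tell apart from the preconfigured one.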
What to expect
In a few seconds, you should get a popup with the message Success! 2_calculate_average_gpa has been added to your workspace.
Step 4: Set up and run the workflow from scratch
In this exercise you'll practice setting up inputs and outputs to run an unconfigured workflow.
The workflow you imported from GitHub is "unconfigured", meaning you will need to set up the inputs and outputs, as well as choose the data to run on. Luckily, in Terra you only need to do this once! You'll see how easy it is to scale a workflow to run on as much data as you need with just a few additional clicks.
Step-by-step instructions to set up and run your workflow from scratch
4.1. Click the blue start configuring now button in the popup. This will reveal the workflow configuration form where you'll set up the workflow to run on your data.
4.2. Under Select a data table, select the student table from the dropdown.
4.3. In the Select Data tab (left side), select all 86 students by clicking the box at the top left.
4.4. Go to the Inputs tab and fill in each variable's attribute.
Attribute hints
- For each subject score, use Fetch from Data Table as the input source and choose from the dropdown.
- For num_score, use Type a Value for the input source and type in 3 in the attribute field.
When you're done, your inputs should look like the screenshot below.
4.5. In the Outputs tab, you should see the attribute gpa (autofilled).
4.6. To keep the outputs from this run separate, replace gpa (the default) with total_gpa.
4.7. When ready, select Submit to open a popup window where you can name and enter comments about the submission.
Your submission has a pre-populated name that includes the workflow name, input data table, and date and time of submission. You can change this to be meaningful to you.
The popup includes how many workflows will be submitted in this submission.
4.8. To confirm and launch the workflow submission, click the Submit button again.
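The input and output setup from the steps above can be pictured as follows: each subject score is looked up in the selected row, while num_score is a fixed typed value, and the result is stored under total_gpa. This is a sketch under assumptions. The column names (`math_score` and so on) and the averaging formula are guesses at what the workflow does and are not taken from the WDL itself.

```python
# Hypothetical input configuration mirroring step 4.4; column names are made up.
INPUT_CONFIG = {
    "math_score": {"source": "Fetch from Data Table", "column": "math_score"},
    "science_score": {"source": "Fetch from Data Table", "column": "science_score"},
    "english_score": {"source": "Fetch from Data Table", "column": "english_score"},
    "num_score": {"source": "Type a Value", "value": 3},
}


def resolve_inputs(row, input_config):
    """Turn the configuration into concrete values for one table row."""
    resolved = {}
    for name, spec in input_config.items():
        if spec["source"] == "Fetch from Data Table":
            resolved[name] = row[spec["column"]]
        elif spec["source"] == "Type a Value":
            resolved[name] = spec["value"]
        else:
            raise ValueError(f"unknown input source: {spec['source']}")
    return resolved


def average_score(resolved):
    """Assumed calculation: sum of the subject scores divided by num_score."""
    scores = [value for name, value in resolved.items() if name != "num_score"]
    return sum(scores) / resolved["num_score"]


# One made-up student row; the output is written under the total_gpa attribute.
row = {"student_id": "s1", "math_score": 3.0, "science_score": 4.0, "english_score": 3.5}
resolved = resolve_inputs(row, INPUT_CONFIG)
new_row = {**row, "total_gpa": average_score(resolved)}
print(new_row["total_gpa"])
```

Because the configuration refers to columns rather than to specific values, the same setup applies unchanged to every selected row, which is why you only configure the workflow once.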
What to expect
If you've set up your workflow correctly, it should run just like the preconfigured workflow in Step 2.
Takeaway and next steps (visualize the data)
After completing the Workflows Quickstart, you should know how to
- Set up and run a workflow to process data in a table
- Import a workflow from GitHub
Quickstart part 3: Plot the results in a notebook
In the last step of the Quickstart mock study, you'll plot the height versus average GPA results for the 8-student cohort and the whole dataset. You'll learn how to set up and run an interactive JupyterLab analysis to visualize data and the importance of including sufficient data in your study.