Terra (GCP) Quickstart 2: Workflows

Allie Cliffe

Get up and running with workflows on Terra in less than half an hour. This is the second in a series of three Quickstarts that walk through a mock study of the correlation between height and grades for a cohort of 7th, 8th, and 9th graders. You will first run a preconfigured workflow, then set up and run the same workflow from a blank configuration card. As a bonus, you can run a third, follow-up workflow to analyze data generated by the first exercises.


You should have already completed the Data Tables tutorial. You will work in your copy of the Quickstart workspace to get hands-on experience setting up and running workflows.

Workflows tutorial learning objectives

The Workflows Quickstart is intended to familiarize you with the process of setting up and running workflows on data stored or referenced in a workspace data table.  

After working through the exercises in the quickstart, you will know how to

  1. Set up and run a workflow to run on single entities in an entity table
  2. Generate and use a Workspace Data table for workspace-level resources
  3. Set up and run a workflow to run on a set of entities
  4. Run a workflow on output data from a previous analysis (bonus)

You will also understand

  1. How and why to write output data back to the input table
  2. How using sets can streamline running multiple workflows on the same subset of data

Three steps to complete the workflows quickstart


  1. Calculate the students' average GPA by running a pre-configured workflow on data in the student table.
  2. Calculate the students' average GPA by setting up and running a workflow from scratch on data in the student table.
  3. (optional bonus) Calculate the class average GPA by setting up and running a workflow on generated data from part 2.

Estimated time and cost to complete

You should be able to complete the Quickstart tutorial in about an hour. Running the tutorial will cost less than $0.25 (Google Cloud data storage and VM costs).

Additional requirements
You should already have completed Terra (GCP) Quickstart Part 1: Data tables tutorial in your own copy of the Quickstart workspace.

Workflows Quickstart step-by-step guide and video

Work in your own copy of the Quickstart workspace

Now that you're familiar with the mock study data, you're ready to process it. Running a workflow to calculate each student's grade point average will give you hands-on experience analyzing data with workflows!

Video walkthrough instructions

1: Run a preconfigured workflow on student data

In this exercise, the workflow has already been set up for you, so all you need to do is select students (data) and launch the workflow.

What you will learn

This exercise will give you a feel for the mechanics of running a workflow as well as how to monitor a workflow once you submit it. 

Step-by-step instructions to run your first workflow

1.1. Start by going to the Workflows page.

1.2. Select the 1_CalculateStudentGPA workflow (click the card). This will reveal the workflow configuration form where you'll set up the workflow to run on your data.

1.3. Confirm root entity type = "student". The root entity type is the table that contains the input data.

1.4. Click the "Select Data" button. This will take you to the Select Data form.

1.5. Select all students by clicking the box at the top of the first column.

1.6. Click the blue OK button to finalize your selection.

1.7. Click the run analysis button and launch the workflows. Terra will launch 86 workflow jobs in parallel (one for each student). 

1.8. Refresh the Job History page to monitor the submission status.

1.9. When the job is complete (you'll see a green checkmark in the Status column), go back to the Data page, click the student table to open it, and answer the following questions. 

Thought questions

  • The "root entity type" is the table that contains the input data.

    In this exercise, it is the student table (with the arrow pointing to it). The data the workflow will use are each student's GPAs for language arts, math, and science (circled in the screenshot below).


  • After running the workflow, there is an additional column in the student data table.

    The Cumulative_GPA column (circled in the screenshot below) stores the output data from running the workflow. 


    Where did the new column come from?

    The workflow was configured to write outputs back to the input data table.

    To see this, go back and look at the Outputs tab of the workflow configuration form (click the 1_CalculateStudentGPA card and the Outputs tab).


    Notice the name of the new column is the same as the attribute for the output variable GPA.
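The mechanics of Exercise 1 (one workflow job per selected student, with the result written back as a new column) can be pictured with a short Python sketch. This is only an illustration under stated assumptions, not Terra's implementation: the grades, the subject column names (`language_arts_gpa`, `math_gpa`, `science_gpa`), and the `calculate_student_gpa` function are hypothetical stand-ins for the student table and the workflow.

```python
# Illustrative sketch only: Terra runs each job on a cloud VM, not in a local loop.
# The grades and the subject column names are made up for this example.
student_table = [
    {"student_id": "AV612", "language_arts_gpa": 3.0, "math_gpa": 4.0, "science_gpa": 2.0},
    {"student_id": "BM445", "language_arts_gpa": 2.0, "math_gpa": 3.0, "science_gpa": 4.0},
    {"student_id": "BY969", "language_arts_gpa": 4.0, "math_gpa": 4.0, "science_gpa": 4.0},
]


def calculate_student_gpa(language_arts, math, science):
    """Hypothetical stand-in for the workflow: average the three subject GPAs."""
    return (language_arts + math + science) / 3


jobs_launched = 0
for row in student_table:  # one workflow job per selected row (entity)
    gpa = calculate_student_gpa(
        row["language_arts_gpa"], row["math_gpa"], row["science_gpa"]
    )
    row["Cumulative_GPA"] = gpa  # the output is written back as a new column
    jobs_launched += 1

print(jobs_launched, "jobs;", [row["Cumulative_GPA"] for row in student_table])
```

With all 86 students selected, the same pattern yields 86 independent jobs, which Terra runs in parallel rather than one after another.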

2: Set up and run the workflow from scratch

In this exercise, you will run the same workflow, but this time the configuration card is blank.

What you will learn

This exercise walks you through setting up the workflow from scratch. You will need to add the input attributes for this workflow yourself, using exercise 1 for reference.

Step-by-step instructions to set up a workflow

Besides choosing which students to analyze, you need two additional steps to configure a workflow to run on data in a table:

  1. Specify **input** values (i.e., which column in the data table corresponds to which variable in the WDL)
  2. Set outputs to be written back to the data table

Step 1: Choose input data 

First, select the 2_CalculateStudentGPA workflow (click the card). You will need to fill in the attribute fields for all the required variables. 

2.1. Click the Select data button and select the Choose Existing Sets of Students radio button.

2.2. Choose the student-subset-8 (created in the Data Tables Quickstart).

Step 2: Specify input values

2.3. Go to the Input tab of the setup form.

2.4. Start by clicking into the first attribute field.

2.5. Select the appropriate attribute from the dropdown menu.

Hint: Use the variable name (second column) to help figure out what attribute to choose.

What does this formatting mean?

The prefix this. tells Terra to look in the root entity table. The dropdown includes all the columns (possible input data) from the root entity table.

2.6. Repeat for each variable with a blank attribute field except num_scores (you'll do this next!).

2.7. Click the blue Save button at the top right of the form. 
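As a rough mental model of what the Inputs tab does, the sketch below resolves `this.` attributes against a single row of the root entity table. It is a simplified illustration, not Terra's code: the workflow variable names, the subject column names, and the `resolve_this` helper are all hypothetical.

```python
# Hypothetical row from the student table (the root entity table).
student_row = {
    "student_id": "AV612",
    "language_arts_gpa": 3.0,
    "math_gpa": 4.0,
    "science_gpa": 2.0,
}

# Hypothetical Inputs tab: workflow variable -> attribute expression.
input_config = {
    "language_arts_score": "this.language_arts_gpa",
    "math_score": "this.math_gpa",
    "science_score": "this.science_gpa",
}


def resolve_this(expression, row):
    """Look up the column named after the 'this.' prefix in the given row."""
    prefix = "this."
    if not expression.startswith(prefix):
        raise ValueError(f"expected an attribute starting with {prefix!r}: {expression}")
    return row[expression[len(prefix):]]


workflow_inputs = {
    variable: resolve_this(expression, student_row)
    for variable, expression in input_config.items()
}
print(workflow_inputs)
```

Each selected student row is resolved the same way, so every workflow job receives that student's own values.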

Step 3: Configure workspace-level variables

The remaining variable, num_scores, is used across all inputs. In this case, it's the total number of courses the workflow averages over (the same value, 3, for all students). Such workspace-level variables have a special table, the Workspace Data table (in the Other data section on the left-hand side of the Data page). 

2.8. Start typing workspace. in the attribute field.

2.9. Select workspace.number-of-courses from the dropdown.

2.10. Click the Save button at the top to save your selection.

What does this formatting mean?

The prefix workspace. tells Terra to look in the Workspace Data table. The dropdown includes all the columns from the table.
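The difference between the two prefixes can be sketched in a few lines of Python. This is an assumption-laden illustration rather than Terra's implementation: the `resolve` helper, the subject column names, and the grades are hypothetical, while the key `number-of-courses` and its value of 3 come from the steps above.

```python
# Workspace Data table: one value shared by every workflow job.
workspace_data = {"number-of-courses": 3}

# Hypothetical row from the student table (column names are illustrative).
student_row = {"language_arts_gpa": 3.0, "math_gpa": 4.0, "science_gpa": 2.0}


def resolve(expression, row, workspace):
    """Resolve 'this.<column>' against the row and 'workspace.<key>' against workspace data."""
    scope, _, name = expression.partition(".")
    if scope == "this":
        return row[name]
    if scope == "workspace":
        return workspace[name]
    raise ValueError(f"unknown prefix in attribute: {expression}")


num_scores = resolve("workspace.number-of-courses", student_row, workspace_data)
scores = [
    resolve(f"this.{column}", student_row, workspace_data)
    for column in ("language_arts_gpa", "math_gpa", "science_gpa")
]
average = sum(scores) / num_scores
print(num_scores, average)
```

Keeping the shared value in one place means changing it once updates every job that references it.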

Step 4: Write outputs back to the table

You can set up the workflow to write outputs back to the input table. In this case, the output is a number that Terra will add in a new column of the input table. If your workflow generates large data files (stored in workspace storage by default), this step writes the data file URI to the input table, making it much easier to associate outputs with inputs. 

2.11. Start in the Outputs tab of the setup form.

2.12. For the first output variable, "gpa", go to the attribute field and type "this." followed by a column name for your output in the table.

Hint: Use a name that is different from Cumulative_GPA (from Exercise 1). 

2.13. Click the blue OK button to finalize your selection.
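To illustrate what writing outputs back to the table amounts to, here is a minimal Python sketch. It assumes made-up job results and a hypothetical column name, `my_gpa`; the `write_outputs` helper is not part of Terra.

```python
# Hypothetical input table and per-student job results (values are made up).
student_table = [{"student_id": "AV612"}, {"student_id": "BM445"}]
job_outputs = {"AV612": {"gpa": 3.0}, "BM445": {"gpa": 3.5}}

# Hypothetical Outputs tab: workflow output variable -> attribute expression.
output_config = {"gpa": "this.my_gpa"}


def write_outputs(table, outputs_by_id, config):
    """Copy each job's outputs into the columns named after the 'this.' prefix."""
    prefix = "this."
    for row in table:
        for variable, expression in config.items():
            if not expression.startswith(prefix):
                raise ValueError(f"expected a 'this.' attribute: {expression}")
            column = expression[len(prefix):]
            row[column] = outputs_by_id[row["student_id"]][variable]


write_outputs(student_table, job_outputs, output_config)
print(student_table)
```

Because each result lands in the same row as its inputs, the table itself records which output belongs to which student.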

Step 5: Launch the workflow

2.14. Now that you have selected the data and set up the workflow inputs and outputs, you can click the run analysis button to launch the workflows.

Exercise 2 Thought Questions

  • This workflow accepts single entities as inputs - all the input data are found in a single row in the student table (the subject grades are all different variables corresponding to separate columns in the table).

  • Answer: The root entity table is defined as the table that contains the primary input for a workflow.

    In this exercise, the root entity type is student.

  • Answer: You specified the columns corresponding to each input variable in the workflow configuration form.

    In this exercise, each student's GPAs (for language arts, math, and science) are data stored in the student table. 


  • Answer: You specify the table with the input data when you choose the root entity type (arrow).

    Then you tell the workflow which column contains the input data (or file) using the format this.YOUR-DATA-COLUMN-NAME in the Attributes field of the setup form (circled).


3: (optional) Run a follow-up workflow on output data

In this exercise, you will use each student's cumulative GPA (outputs from exercise 2) to calculate the class average.

What you will do

  1. Make a set of the students in seventh grade
  2. Configure the 3_CalculateClassGPA workflow to take the 7th graders' GPAs as input and output a single average GPA to the student_set table.
  3. Run the 3_CalculateClassGPA workflow on the seventh grade set.

What you will learn

This exercise demonstrates how to set up and run a follow-up workflow on generated data.

What is different

The 3_CalculateClassGPA workflow takes in multiple students' total GPAs (an array of data) and outputs a single value. Because of this nested table structure, the workflow setup is a little more complex. 

Step-by-step instructions to set up and run the follow-up workflow

Step 1: Create the set of seventh graders

3.1. Go to the student table and click the three-dot action menu at the top right of the Grade column to filter by grade.

3.2. Type 7 into the field with the magnifying glass icon (where it says "Exact match filter") and hit enter or return to filter.

3.3. Click the checkbox at the top left of the table to select all the students in seventh grade.

3.4. Click the Edit icon and select Save selection as a set from the dropdown. 

3.5. Name the set 7th-graders and click Save.

3.6. Select the 3_CalculateClassGPA workflow (click the card). 

3.7. Select the root entity type student_set

3.8. Click the Select Data button, and select the 7th-graders set you just created.
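Conceptually, filtering the table and saving the selection as a set produces one new row in the student_set table that refers to student IDs. The Python sketch below illustrates this under assumptions: the table is reduced to two columns, the ID `ZZ000` and the grade values are invented for the example, and the in-memory structures are not how Terra stores data.

```python
# Hypothetical slice of the student table; "ZZ000" is an invented 8th grader.
student_table = [
    {"student_id": "AV612", "Grade": 7},
    {"student_id": "BM445", "Grade": 7},
    {"student_id": "ZZ000", "Grade": 8},
]

# Steps 3.1-3.3: exact-match filter on the Grade column, then select all matches.
seventh_grade_ids = [row["student_id"] for row in student_table if row["Grade"] == 7]

# Steps 3.4-3.5: save the selection as a set. The set row stores references
# (student IDs) in its "students" column, not copies of the student data.
student_set_table = {"7th-graders": {"students": seventh_grade_ids}}
print(student_set_table)
```

The set only points at rows in the student table, so any column added to those rows later is reachable through the set.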

Step 2: Configure the workflow

3.8. Take a look at the configuration pane (filled in) to answer the following questions. 

  • This workflow accepts an array as input (the Cumulative_GPA for each student in the class) and outputs a single value for the class.


  • Answer: The student_set table. The array of students in the students column is the smallest piece of input data.


  • The formatting of the subject_scores variable attribute demonstrates how to tell Terra where the primary data is when the tables are nested like this.

    Task name            Variable        Type          Attribute
    CalculateStudentGPA  subject_scores  Array[Float]  this.students.Cumulative_GPA

    Breaking down the attribute formatting

    Each part of the attribute string gives Terra instructions on where to find the data.

    this.            Look in the root entity table
    students.        Get the IDs from the students column
    Cumulative_GPA   Go to this column in the student table for the input

    Notice the "s" at the end of student!

    This is a Terra formatting quirk that you will need to remember. The dropdown only offers columns from the **root entity table**. If your tables are nested, as in this case, you will need to type in the full attribute string correctly!
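The three-part attribute string can be read as a two-hop lookup. The following Python sketch mimics that lookup on made-up data; the GPA values and the `resolve_nested` helper are hypothetical, and the real resolution happens inside Terra.

```python
# Hypothetical student table keyed by student ID (GPA values are made up).
student_table = {
    "AV612": {"Cumulative_GPA": 3.0},
    "BM445": {"Cumulative_GPA": 3.5},
    "BY969": {"Cumulative_GPA": 2.5},
}

# One row of the root entity table (student_set); "students" holds references.
student_set_row = {
    "student_set_id": "7th-graders",
    "students": ["AV612", "BM445", "BY969"],
}


def resolve_nested(expression, root_row, referenced_table):
    """Resolve 'this.<reference column>.<target column>' into a list of values."""
    scope, reference_column, target_column = expression.split(".")
    if scope != "this":
        raise ValueError(f"expected an attribute starting with 'this.': {expression}")
    return [
        referenced_table[entity_id][target_column]
        for entity_id in root_row[reference_column]
    ]


subject_scores = resolve_nested(
    "this.students.Cumulative_GPA", student_set_row, student_table
)
print(subject_scores)
```

The result is one array per set row, which matches the Array[Float] type of the subject_scores variable.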

Step 3: Run the workflow.

3.9. Click the Launch workflow button.

Thought questions:

  • Answer: Although you will be using data from all 29 students in the seventh grade, it is a single workflow, with a single output value.


  • Answer: There is one output value (the class average GPA) for the entire set. For this exercise, the workflow is configured to write to the student_set table.

    What to expect

    Terra will add a column to the table when the workflow is complete. You can find the name of the output if you look at the Outputs in the workflow configuration form. Your student_set table will include the example row below. 

    student_set_id  students                              class_gpa
    7th-graders     AV612, BM445, BY969... (29 entities)  1.234
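To contrast this with the per-student workflows, the sketch below shows a single job that consumes an array and writes one value to the set's row. The GPA values are made up, only three of the 29 students are listed, and averaging with `statistics.mean` is an assumption about what the workflow computes.

```python
from statistics import mean

# Made-up cumulative GPAs for three of the students in the set.
subject_scores = [3.0, 3.5, 2.5]

# Hypothetical in-memory version of the student_set table.
student_set_table = {"7th-graders": {"students": ["AV612", "BM445", "BY969"]}}

# One job for the whole set: many inputs in, a single value out.
class_gpa = mean(subject_scores)

# The single output is written to the set's row, not to each student's row.
student_set_table["7th-graders"]["class_gpa"] = class_gpa
print(student_set_table["7th-graders"])
```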

Takeaways and next steps (plot results in a notebook)

After completing the Quickstart, you should know/understand

  • How to set up and run a workflow on single entities of data
  • How and why to write output files to the input table
  • How to set up and run a follow-up workflow on a set of data

Next: Quickstart part 3: Plot the results in a notebook

Learn how to set up and run an interactive analysis to visualize data in the Notebooks Quickstart.

Bonus! Along the way, you will answer the question "How does a student's height correlate with their GPA?"
