How to set up and run a workflow

Allie Cliffe

If you're interested in using Terra on Azure, please email terra-enterprise@broadinstitute.org.

This article walks through the setup process for running a workflow in Terra on Azure when using inputs (data samples) from a data table. Please note that this document only includes functionality currently available.

Overview: Workflows (current functionality)

Currently, you can run a workflow by creating a workspace and importing a workflow from a public GitHub repository, Dockstore, or a list of curated (featured) workflows. Enterprise customers can also import workflows from a private GitHub repository.

Step 1: Create or clone a workspace

1.1. Go to Your Workspaces (select from the main navigation menu at the top left of any page).

1.2. Create a workspace by clicking the Create Workspace button at the top of the page. Alternatively, clone an existing workspace by clicking the three-dot action icon at the top right and selecting Clone.

What to expect

Terra will automatically launch the cloud infrastructure to power data tables in a newly created workspace. 

When data tables are ready

Once data tables are launched, the Import Data button at the top left of the Data page becomes active.

Step 2: Launch Cromwell (the workflows application)

Workflows in Terra are orchestrated by the workflows application, Cromwell. In a new workspace, you need to launch Cromwell before you can run a workflow.

2.1. Go to the Workflows page and click the blue Launch Workflows App button. 

ToA_Launch-workflows-app_Launch-app-popup_Screenshot.png

It may take 10-15 minutes to launch. 

ToA_Launch-workflows-app_Launching-popup_Screenshot.png

When workflows are ready

Once Cromwell is running, you will see the workflows menu on the left-hand side and, in the center under Workflows in this workspace, any workflows from the original workspace (if you cloned one).

ToA_Workflows-in-this-workspace_Screenshot.png

It may take several minutes to requisition and set up the cloud infrastructure. Both data tables and Cromwell must be ready before you can move on to the next step. See Data tables: Additional resources for more details about the Workspace Data Services that power data tables.

Step 3: Upload data table (optional)

Terra workflows are set up to pull inputs (URIs for data files in open-access Azure blob storage containers) from the data table. If you cloned a workspace, you should see the table as soon as your Workspace Data Services infrastructure loads.

If you don't have a table for your workflow inputs, you will first need to generate the input data table by uploading a TSV. You can create a TSV from scratch in a spreadsheet, or download one from an existing Terra on Azure workspace, following the directions below. 
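If you create the TSV from scratch with a script instead of a spreadsheet, the sketch below writes one row per sample, header row first, with tab separators. The column names (sample_id, input_file) and the blob URIs are hypothetical placeholders; check a TSV downloaded from an existing Terra on Azure table for the header conventions your table needs, such as which column identifies each row.

```python
import csv


def write_input_table(path, columns, rows):
    """Write a list of dicts to a tab-separated file, header row first.

    The column names are whatever you pass in; the names in the example
    below are hypothetical, not a documented Terra requirement.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=columns, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    # Hypothetical example: the URIs are placeholders, not real blob paths.
    columns = ["sample_id", "input_file"]
    rows = [
        {"sample_id": "sample_01",
         "input_file": "https://<storage-account>.blob.core.windows.net/<container>/sample_01.fastq.gz"},
        {"sample_id": "sample_02",
         "input_file": "https://<storage-account>.blob.core.windows.net/<container>/sample_02.fastq.gz"},
    ]
    write_input_table("sample.tsv", columns, rows)
```

You would then upload the resulting file with the Upload TSV option under the Import Data button.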

3.1. Click the three-dot action icon beside the table you want to add to your own workspace and select Download TSV.

ToA_Download-sample-TSV_Screenshot.png

3.2. Click the save button to download the TSV to local storage.

ToA-Download-sample.tsv_Screenshot.png

3.3. Navigate back to the Data page of your own workspace.

3.4. Click the Import Data button (left side, near the top) and select the Upload TSV option to create and populate the data table.

ToA-Covid-workspace_Import-data-Upload-TSV_Screenshot.png

3.5. In the Import Table Data popup, fill in the table name and select the TSV you just downloaded.

ToA-Covid-19-workspace_Import-sample-table-popup_Screenshot.png

3.6. Click the Start Import Job button.
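You may want to check a TSV locally before starting the import job. The sketch below flags duplicate or empty column names, rows with the wrong number of fields, and repeated values in the first column. Treating the first column as a unique row ID is an assumption made for this illustration, not documented Terra behavior.

```python
import csv


def check_tsv(path):
    """Return a list of problems found in a tab-separated file (empty if none).

    Assumes, for illustration only, that the first column holds a unique row ID.
    """
    problems = []
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader, None)
        if header is None:
            return ["file is empty"]
        if len(set(header)) != len(header):
            problems.append("duplicate column names in header")
        if any(not name.strip() for name in header):
            problems.append("empty column name in header")
        seen_ids = set()
        for row in reader:
            line_no = reader.line_num  # line in the file where this row ended
            if len(row) != len(header):
                problems.append(
                    f"line {line_no}: expected {len(header)} fields, found {len(row)}"
                )
                continue
            if row[0] in seen_ids:
                problems.append(f"line {line_no}: duplicate ID {row[0]!r}")
            seen_ids.add(row[0])
    return problems
```

An empty list means none of these checks found a problem; it does not guarantee that Terra will accept the file.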

What to expect

Your data should now be visible in the Tables section of the Data page.

ToA-Covid-19-workspace_Sample-table_Screenshot.png

Step 4: Add workflow to the workspace (optional)

Note that if you clone a workspace created after December 1, 2023, its workflows are cloned along with it.

4.1. In the Workflows tab, click Find and add workflows in the left column to expand the menu.

ToA_Workflows-in-this-workspace_Find-and-add-workflows_Screenshot.png

4.2. You can browse featured workflows or import a workflow from GitHub or Dockstore.

Featured workflows

4.3. Click Featured workflows to select from five commonly used, standard workflows.

ToA_Add-featured-workflow_Screenshot.png

4.4. Click the Add to Workspace button at the right.

Import from GitHub

4.3. Select the Import a Workflow option in the left column.

ToA_Import-a-workflow-from-github_Screenshot.png

4.4. Fill in the GitHub link and workflow name, then click the Add to Workspace button (the button stays disabled until every field is filled in).

ToA_Find-a-workflow_Add-a-workflow-link_Screenshot.png

Import from Dockstore

4.3. Click the Dockstore option in the left column.

ToA-Workflows_Workflows-in-this-workspace_Dockstore-option__Screenshot.png

4.4. Search and filter to find your workflow in Dockstore.

ToA_Import-workflows_Dockstore_Screenshot.png

4.5. Click the workflow name and choose Terra under Launch with on the right side.

ToA-Find-workflow_Launch-with-Terra-from-Dockstore_Screenshot.png

4.6. Give the workflow a name and choose the destination workspace. Then click the blue Import button.

ToA-Find-workflow_Import-from-Dockstore-popup_Screenshot.png

What to expect

When you go to the Workflows tab, you will see the new workflow(s) in the list. 

ToA_Workflows-in-workspace_After-imports_Screenshot.png

Step 5: Select data and set up the workflow

Click the blue Configure button on the workflow card to open the submission configuration form (screenshot below).

ToA_Submission-configuration-pane_Screenshot.png

What to expect

The configuration form includes useful information like the workflow version and source URL link. It is also where you will set up the workflow to run on specific data from the input table. 

Select the input data table

5.1. Choose the table with the input data files from the dropdown.

Select data rows to run on

5.2. Navigate to Select Data at the bottom of the form to see the data table.

5.3. Select the rows to analyze by clicking the checkbox at the left of the row.

ToA_Select-data-tab-in-workflow-submission-configuration-pane_Screenshot.png

Specify input variable attributes

Note that if you cloned a workspace where the workflow had been run, the inputs and outputs will be pre-configured. 

5.4. Go to the Inputs tab to specify data table columns (attributes) for each variable.

5.5. Choose Fetch from Data Table as the Input source and select each variable's attribute from the dropdown.

ToA_Configure-inputs-in-configuration-pane_Screenshot.png

Input source options

  • Type value manually
  • Fetch from data table

Type manually (workspace-wide variables)

Type values manually for workspace-level variables that are the same for every row, such as paths to reference files or interval files, or fixed numbers such as disk size.

Choosing data table inputs

If you choose Fetch from data table, the attribute dropdown displays all columns in the data table. You can use the variable name to choose the matching column, or click the "autofill" link at the top of the configuration form.
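Choosing columns by variable name amounts to matching workflow input names against table column names. The sketch below is a hypothetical illustration of that idea, not Terra's autofill implementation; the case-insensitive comparison and the input and column names are assumptions. Inputs left unmatched are natural candidates for typing a value manually.

```python
def match_inputs_to_columns(input_names, columns):
    """Pair each workflow input with a table column of the same name.

    Hypothetical illustration: matching is case-insensitive here, which is an
    assumption, not documented Terra autofill behavior.
    Returns (matches, unmatched), where matches maps input name -> column name.
    """
    by_lower = {column.lower(): column for column in columns}
    matches, unmatched = {}, []
    for name in input_names:
        column = by_lower.get(name.lower())
        if column is None:
            unmatched.append(name)  # candidate for "Type value manually"
        else:
            matches[name] = column
    return matches, unmatched
```

For example, with inputs input_file, Reference_Fasta, and disk_size against columns sample_id, input_file, and reference_fasta, the first two match and disk_size is left to be typed manually.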

ToA-workflows_Screenshot-of-autofill-from-data-table-option-for-input-attributes.png 

Write outputs to the data table

5.6. Click Outputs to configure the workflow to write a new column to the data table for each output variable.

5.7. Enter a new name in the attribute column to create a new column, or select an existing column (note that choosing an existing column will overwrite any data already in it).
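Because writing to an existing column replaces its contents, it can be worth listing which output names collide with columns already in the table. This is a minimal sketch; exact, case-sensitive matching of column names is an assumption, and the column names in the example are hypothetical.

```python
def find_overwrites(output_columns, existing_columns):
    """Return the output column names that already exist in the table.

    Writing outputs to any of these would replace the values stored there.
    Exact, case-sensitive matching is an assumption of this sketch.
    """
    existing = set(existing_columns)
    return [name for name in output_columns if name in existing]


if __name__ == "__main__":
    table_columns = ["sample_id", "input_file", "aligned_bam"]  # hypothetical
    outputs = ["aligned_bam", "qc_report"]                      # hypothetical
    print(find_overwrites(outputs, table_columns))  # ['aligned_bam']
```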

ToA_Configure-outputs-in-submission-pane_Screenshot.png

Where is generated data stored?

Generated files will be stored in the workspace cloud storage by default. Terra will write the file locations (URIs) of generated files in a new column in the input data table.

Original data table

ToA-workflows_Screenshot-of-input-data-table-before-running-the-workflow.png

Data table after running workflow

ToA-workflow_Screenshot-of-input-data-table-after-running-workflow.png
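Since output URIs end up in a column of the input table, you can download that table with the Download TSV option and pull the URIs out with a short script, for example to pass them to other tools. The sketch below assumes the first column holds the row ID; the file and column names in the usage note are hypothetical.

```python
import csv


def output_uris(path, output_column):
    """Map row ID (assumed to be the first column) -> value of output_column.

    Rows with an empty output cell (for example, rows that were not part of
    the submission) are skipped.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        if not reader.fieldnames or output_column not in reader.fieldnames:
            raise KeyError(f"column {output_column!r} not found in {path}")
        id_column = reader.fieldnames[0]
        return {
            row[id_column]: row[output_column]
            for row in reader
            if row[output_column]
        }
```

For instance, output_uris("sample.tsv", "output_bam") would return a dictionary from each sample ID to its output URI, assuming the table has an output_bam column.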

Step 6: Submit the workflow

6.1. When ready, select Submit to open a popup window where you can name and enter comments about the submission.

ToA_Send-workflow-submision-form_Screenshot.png

Your submission has a pre-populated name that includes the workflow name, input data table, and date and time of submission. You can change this to be meaningful to you.
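If you rename submissions, a consistent pattern makes them easier to scan later. The sketch below builds a name from the same three pieces the default name contains; the separators and timestamp layout are a hypothetical choice, not the format Terra itself uses.

```python
from datetime import datetime


def submission_name(workflow_name, table_name, submitted_at=None):
    """Hypothetical naming pattern: <workflow>_<table>_<YYYY-MM-DD_HH-MM-SS>."""
    if submitted_at is None:
        submitted_at = datetime.now()
    return f"{workflow_name}_{table_name}_{submitted_at:%Y-%m-%d_%H-%M-%S}"
```

For example, submission_name("my_workflow", "sample", datetime(2024, 1, 2, 3, 4, 5)) returns "my_workflow_sample_2024-01-02_03-04-05".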

The popup also shows how many workflows will be submitted in this submission.

6.2. To confirm and launch the workflow submission, click the Submit button again.

What to expect

Once you submit, Terra will get to work setting up and deploying the cloud resources to run your workflow. You will automatically be directed to the submission details page.

Next steps: Monitor workflow submission status

Once you submit your workflow, you can find the submission history by clicking Submission history on the left-hand side of the Workflows page. Details include the workflow name, submission date, and duration.

ToA_Monitor-workflow_Screenshot.png

To see the status of a workflow, its start and end time, and sub-workflow and task failures, click on an individual workflow ID to view the workflow details page.

Note that the submission history includes every workflow submitted in the workspace. These cannot be deleted. However, you can filter by failed or successful workflows (see the dropdown in the top right). 

ToA-workflows_Screenshot=of-submision-history-highlighting-filtering-options.png

Use the breadcrumb on top of the page (circled in the screenshot below) to navigate back and forth between the submission history (lists of previous submissions), submission details page, and workflow details page.

ToA_Workflow-Details-in-Cromwell_Screenshot.png

What to expect (completed workflows)

When you see a green check in the Job History, you can verify that the generated output files are in your workspace blob storage container by clicking on the Files icon in the right sidebar. This will open the directory of your workspace storage. 

To access the generated data files

Click in the left-hand column to open the subdirectories cromwell-executions > workflow-name > submission-ID > taskname > execution > output file. You will see a list of all the generated files. 
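The folder order above can also be assembled in code, for example to filter a list of blob names you have exported or listed elsewhere. The sketch follows the folder order given in this article; the helper names and example values are hypothetical, and real folder names may differ (Cromwell, for instance, conventionally names task folders call-<taskname>), so compare against what you see in the Files browser.

```python
from pathlib import PurePosixPath


def task_execution_dir(workflow_name, submission_id, task_name):
    """Relative folder for a task's outputs, in the order described above:
    cromwell-executions/<workflow-name>/<submission-ID>/<taskname>/execution."""
    return PurePosixPath(
        "cromwell-executions", workflow_name, submission_id, task_name, "execution"
    )


def files_under(blob_names, folder):
    """Keep only blob names that sit anywhere below `folder` (any depth)."""
    prefix = str(folder).rstrip("/") + "/"
    return [name for name in blob_names if name.startswith(prefix)]


if __name__ == "__main__":
    # Hypothetical listing; real names come from your workspace storage.
    folder = task_execution_dir("my_workflow", "1234-abcd", "my_task")
    listing = [
        "cromwell-executions/my_workflow/1234-abcd/my_task/execution/out.bam",
        "cromwell-executions/my_workflow/1234-abcd/my_task/execution/glob-1/out.txt",
        "cromwell-executions/my_workflow/1234-abcd/other_task/execution/log.txt",
    ]
    for name in files_under(listing, folder):
        print(name)
```

Matching at any depth reflects the fact that output files can sit several levels below the execution folder.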

ToA_Generated-files-in-workspace-storage_Screenshot.png

Note that you may need to go down several levels in the file directory to find the data files. 


Comments

  • Curtis Kapsak

    FYI I ran into a broken link in this guide.

    For Step 2.1, when I try to click the link to download the sample TSV file, I'm met with an authentication error and no TSV file is downloaded.

    Error:

    When I navigate to the featured workspace to try to download the TSV, it looks like the data table service is not up and running:

    Perhaps this is why I'm unable to download the sample TSV file?

  • Allie Cliffe

    I'm so sorry you couldn't download the TSV Curtis Kapsak! The data table status in the Featured Workspace doesn't affect whether you can access the file. The table status error you saw was because tables were single-user and only viewable to workspace creators until last week. 

    But I'm glad you commented since you revealed a problem with our instructions (and I have updated them, so they should be current). You should be able to see the sample table in the Featured Workspace now. If you click on the three-dot action icon to the right of the sample table, you can download the sample.tsv to your local machine, then upload it to your own workspace copy. 

