How to set up a workflow analysis


This article explains how to set up (configure) your workflow analysis in Terra. Note that this article is intended primarily for analyses using data from the workspace data table. Differences for using direct paths are noted in hint boxes.     

To learn how to automate some of the setup process by using a JSON file (especially useful if you anticipate using much the same configuration many times), see Getting workflows up and running faster with a JSON file. 

Workflow setup overview

To run a workflow on Terra, you will first set up all the options and input variables in the configuration form (also called the "submission form").  

Select workflow options
whether to use caching or delete intermediate files, whether to get inputs from the data table or use full paths, etc.

Specify inputs 
e.g. reference files, compute parameters, and data file names and locations

Define output options (optional)
Designate whether you want the workflow to write links to the output data files (stored in the workspace bucket by default) to the data table

The workflow configuration form: Step-by-step instructions

You'll set up a workflow to run by filling in or modifying the configuration form (screenshot below). The form lists Inputs and settings the workflow expects, and displays default values provided by the workflow author. 

How to get to the workflow configuration form

You can start an analysis one of two ways: by first selecting the data to analyze and then choosing the workflow, or by first choosing the workflow and then selecting the data. 

  • Start by selecting the data

    1. Go to the Data page
    2. Select the data to analyze
    3. Click the three vertical dots (top right)
    4. Choose Open with > Workflows

  • Start by selecting the workflow

    1. Go to the Workflows page
    2. Click the name of the workflow from the available cards

The configuration form will look roughly like the screenshot below. What you see may differ slightly from the screenshots depending on how you get to the form and which options you choose. Scroll down for an explanation of each option and step-by-step instructions. 

Configure-workflow_Configuration-card_Screen_shot.png

1. Select the workflow snapshot (version)

You will see all available versions in this dropdown. You can choose to use the most up-to-date version of the workflow, or a previous version (if you need to maintain consistency, for example). Terra will automatically run the version you choose.

2. Choose to use full paths or the data table for inputs

Advantages of using the data table

Data tables make it easier to scale and automate 
Using the data table lets you reference multiple files from the data table without hard coding values or having to adjust your workflow configuration when you add more data to the table.

Data tables keep all associated data together
Data files are connected no matter where in the cloud they reside, even data from different sources, including generated data. 

  • Data table configuration

    If you choose this option, your next steps will be to select the "root entity type" (the table that holds the input data) and the input data to analyze. Your configuration form will look like this.

    Configure-workflows_Use-table_Screen_shot.png
  • Full file paths configuration

    If you choose this option, you can go straight to step 5 of the configuration form (the call caching and delete intermediate outputs options). Your configuration form will look like this.

    Configure-workflows_File-paths_Screen_shot.png

When NOT to use a data table
Although we recommend using data tables, there are situations where you may not want to use one: if you cannot fit your data into the data table in a way that makes sense for your analysis, or if you want to test a new method in Terra quickly, with as little setup as possible.

To learn more about setting up a data table, see Organizing data with workspace tables.

3. Select root entity type from dropdown (data table only)

The root entity type is the smallest piece of data a workflow can use as input. Selecting from this dropdown (Step 1 in the form) tells the workflow which table to go to for links to the input data.

How to know the root entity type

If you can run your workflow on a single entity (like a specimen or sample)
The root entity type is that entity (i.e. specimen or sample)

If your workflow takes an array as input and cannot run on a single file
The root entity type is a set table (i.e. sample_set or specimen_set)

If you're running a somatic workflow (on tumor/normal pairs)
The root entity type is pair 

Hint: what entity type does your workflow accept?

Find the input entity type your WDL expects in the Inputs section of the workflow configuration form. A short WDL sketch after this list shows how the two cases look in code. 

  • If the Input Type is "File", your root entity will be a table of single entities (i.e. sample_set or specimen_set):Configure-workflows_Type-File_Screen_shot.png

  • If the Input Type is "Array[File]", the root entity type is an entity_set table or a pairs table (somatic workflows):Configure-workflows_Type-Arrays_Screen_shot.png

  • If you are running a somatic workflow, the Input Type will be Array[File]:
    Configure-workflows_Type-Pairs_Screen_shot.png
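
For illustration, here is a minimal WDL sketch (the workflow and variable names are hypothetical) showing how the two input shapes above look in code:

    version 1.0

    # A "File" input means the workflow runs once per single entity,
    # so the root entity type is a single-entity table (e.g. specimen).
    # An "Array[File]" input means the workflow runs once per set,
    # so the root entity type is a set table (e.g. specimen_set).
    workflow EntityTypeExample {
      input {
        File input_cram          # Input Type "File"
        Array[File] input_crams  # Input Type "Array[File]"
      }
    }

In practice a workflow would declare one shape or the other; both are shown here only to compare the two cases.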

Editing expected input entity types (advanced topic)

To edit a workflow script, you will need to work outside Terra. To learn more about creating and editing workflows, see Create, edit, and share a new workflow.

Editing the WDL script can change the expected input configuration. You will be able to see this by clicking on the workflow in the Workflows tab and looking at the Inputs section.

  • The example below is from the workflow that generates a "Panel of Normals" (PoN). When generating a PoN, this WDL script expects the following input types (sketched as a WDL input block after this example).

    A set of BAM files representing the list of normal samples. Since the purpose of this workflow is to create a PoN from a set of files, this input is handled as an Array.

    A reference file. Since a single reference file can be useful in a variety of tasks, this input is handled as a File.

    The name of a database used for informing the PoN generation (in this case, the gnomAD database is used to inform the tool of the allelic fractions within this germline resource). Since this task does not need to localize the entire gnomAD database, it is sufficient to designate an input as a String matching the name of the database. The name of the PoN file is also just a String.

    EntityTypes.png
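
Sketched as a WDL input block, the inputs described above might look like this (identifiers are illustrative; check the workflow's own Inputs section for the real names):

    version 1.0

    workflow CreatePanelOfNormals {
      input {
        Array[File] normal_bams  # set of BAMs from the normal samples -> Array
        File ref_fasta           # a single reference file -> File
        String gnomad            # name of the germline resource (gnomAD) -> String
        String pon_name          # name of the output PoN file -> String
      }
    }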

4. Select data to analyze (only for inputs from data table)

If you are using full paths for your input data, this option will not appear on your configuration form and you can skip down to step 5 (Call caching/Delete intermediate outputs options). Note: your setup form will look slightly different depending on whether you started your analysis from the Data page or the Workflows page. 

Option 1: Starting from the Data table (example screenshots)

If you started by selecting data, these should already be filled in. Beside the blue "Select Data" button you should see the data you have selected. Open the section below that corresponds to your use-case for a screenshot of what to expect, depending on your input. 

  • Input is a single entity (or several single entities)

    Note that if you run on more than one data file, Terra will create a set of those particular entities by default. It will name the set "workflow-name" + "run date". To give the set a more meaningful name, you must start your analysis from the workflow configuration card.

    Configure-workflow_Select-data-specimen-default_Screem_shot.png 

  • Input is a set of single entities (from an entity_set table)

    Note that the root entity type is still the single entity. Terra will submit as many jobs as there are members in the set to run in parallel:
    Configure-workflow_Select-data-specimens-run-set_Screen_shot.png
  • Input is an array of entities

    If the workflow accepts an array (set) of entities as input, the root entity type is entity_set. In the screenshot below, the workflow is running one job on an array of entities defined in the specimen_set table:
    Configure-workflow_Select-data-specimen-set_Screen_shot.png
  • Input is a tumor/normal pair (somatic workflows)

    The root entity type is pair. In the screenshot below, Terra will run two tumor/normal pair workflows in parallel and will create a pair_set table that includes those two specific pairs:
    Configure-workflow_Data-table-input-Pairs_Screen_shot.png  

Option 2: Starting from the workflows page (example screenshots)

If you started your analysis setup from the workflow page, you will first need to select the data to analyze by clicking the blue button in Step 2 of the configuration form.

Configure-workflows_Select-data_Screen_shot.png

Once you do, you'll be taken to a form to select data. Click each use-case below to see screenshots of what to expect.

  • Input is a single entity (or several single entities)

    Terra will process all the entities in the data table by default. You can also select exactly which entities to process. If you analyze more than one data file, Terra will create a set of those inputs and you'll be able to name the set containing those particular entities:
    Configure-workflows_Select-data_Specimen_Step-1_Screen_shot.png

    After making your selection, make sure to click OK at the bottom of the form. Your workflow configuration form will now look like this:
    Configure-workflows_Select-data_Specimens_Step-2_Screen_shot.png

  • Input is a set of single entities (data files) from an entity_set table

    Note that the root entity type is still the single entity. Your Select Data form will look like this (if you have some entity_sets already defined): Configure-workflows_Select-data_Specimens-in-set_Step-1_Screen_shot.png

    Terra will submit as many jobs as there are members in the set to run in parallel:
    Configure-workflow_Select-data-specimens-run-set_Screen_shot.png

  • Input is an array of single entities

    In this case, the workflow accepts an array (set) of entities as input. The root entity type is entity_set. Your Select Data form will look like this:

    Configure-workflows_Select-data-arrays-input_Screen_shot.png

    If you choose one set from the specimen_set table, your configuration form will look like this:
    Configure-workflow_Select-data-specimen-set_Screen_shot.png

  • Input is a tumor/normal pair (somatic workflows)

    You can choose exactly which tumor-normal pairs you want to analyze in the Select Data form:
    Configure-workflows_Select-data-from-workflow_Pairs_Screen_shot.png
    Note that if you select more than one pair, you can name the new set that Terra will automatically generate.

    Terra will analyze the selected tumor/normal pairs in parallel and will create a pair_set that includes those three selected pairs:
    Configure-workflow_Data-table-input-Pairs_Screen_shot.png

5. Call caching/Delete intermediate outputs options

Configure-workflows_call-caching-option_Screen_shot.png

These two options save storage costs in two different ways, and cannot be combined.

To learn more about call caching and when to use it, see this article.

To learn how to save storage costs by deleting intermediate outputs, see this article.

6. Configure inputs

Attributes are the integers, strings, or files that correspond to input variables in the workflow. You'll specify inputs by filling in the Attributes field in the setup form.   

Configure-workflows_Attributes_Screen_shot.png

Some common attribute formats

Integer - No formatting required
String - Quotes required, e.g. "my string"
Boolean - Quotes required. Case insensitive, so "true", "TRue", and "TrUE" are all equivalent.
File - This type can be referenced from the Google bucket, data model, or workspace attribute section.
Array[X] - Enter list items separated by commas, e.g. "a","b","c" or 1,2,3 or "true","True","TruE","TRUE"
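
As a concrete illustration, here is what you might type into the Attribute field for a few hypothetical variables (left: the variable as listed in the form; right: what you enter):

    myWorkflow.threads       3
    myWorkflow.sample_name   "my string"
    myWorkflow.run_qc        "true"
    myWorkflow.input_bam     this.bam
    myWorkflow.intervals     "chr1","chr2","chr3"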

The format for input data files in the cloud depends on whether you are using the data table (directions below) or direct paths (scroll down for directions), per your choice in step 2 above. 

Option 1: Format for input data in the data table

If you're using the data table for inputs, you'll need to connect the attributes field to specific columns in the data table or files in the workspace resource table. Each of these options has a particular format. If you start typing in the proper format, all available options will appear in a dropdown. 

The exception is if you are using nested variables (see the advanced formatting section below for formatting arrays and tumor/normal pairs). 

Step 1: Specify input data

1.1. For each variable that needs to be connected to a column in the data table, start typing this. in the attribute field.

1.2. Once you start typing, you'll see a dropdown with all the available options from the
relevant table. Choose the right variable from the dropdown.

What is in the dropdown?
The formatting this. points to whatever table you selected as the "root entity type" in the configuration form. The dropdown menu will list all columns containing data or metadata from the root entity table.

Single entity example
In the screenshot below, there are five items in the dropdown after typing this. in the InputCram attribute field (circled). Each corresponds to a column in the root entity table. Scroll down to select the one corresponding to the InputCram variable, this.CRAM.

Configure-workflows_this.-formatting_Screen_shot.png

If you don't see the right input variable in the dropdown
Check your root entity type to make sure you specified the right table. This can be tricky if you are using interconnected tables!

For example, if you're running multiple workflows in parallel on a group in a specimen_set table, the root entity type is specimen. You only use the specimen_set to choose which specimens to process. It's not where the input files are! 

Advanced formatting examples (other inputs)

The this. prefix tells Terra to look first in the table you set as your root entity. In the case of nested tables, such as array and pair inputs, the root entity table gives the ID of the entity; pointing to the data itself requires additional formatting. 

  • Array input example
    In the case of array inputs, the root entity type is a _set table (for example, a specimen_set), but the data files are actually in the single entity table (i.e. the specimen table).

    In this case, you will use the format this.specimens.data-file-name 

  • Tumor/normal pairs example
    The pair table contains a control_sample_id, a case_sample_id, and their corresponding BAM files. Your WDL task requires both the case_sample_bam and the control_sample_bam inputs.

    You'd use this.case_sample.case_sample_bam and this.control_sample.control_sample_bam, where case_sample and control_sample are columns in the pair table. 

This formatting gives you the flexibility to reference any entity, including entities in nested tables. If your desired input is a single file, the syntax simply points directly at the file. If your desired input is a set of files nested inside another table, the syntax must first point to the correct table, and then to the desired files within. Looking at the Type and Attributes columns is a quick way to check how your workflow is set up. The patterns are summarized in the sketch below.
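
The patterns from this section, summarized (assuming a hypothetical specimen table with a cram column; the workspace. prefix is described in Step 2 below):

    Input shape                        Root entity type   Attribute expression
    Single file per entity             specimen           this.cram
    Array of files (one job per set)   specimen_set       this.specimens.cram
    Tumor BAM in a pair workflow       pair               this.case_sample.case_sample_bam
    Workspace-wide file (reference)    any                workspace.ref_fasta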

Step 2: Specify workspace-wide inputs (the workspace data table)

Storing an input file as a workspace attribute in the Workspace data table is convenient if you are using a file over and over again in multiple workflows. If the file path changes, you only have one place to update, similar to global variables in scripting. You can reference it by typing workspace. plus the attribute key (e.g. workspace.ref_fasta or workspace.ref_dict). 

2.1. For each workspace-wide variable (like the genome reference sequence file, for example, or the GATK Docker), start typing workspace. (make sure to include the period!).

2.2. Select the workspace resource file from the dropdown.

Configure-workflow_Specify-workspace-data_Screen_shot.png

Some workflows may require additional types of input
You may need to select one of several possible analysis options in the case of branched workflows, or you may have the opportunity to specify runtime options, like the amount of memory and disk space provided to each task. You will find fields for these options, if available, in the Inputs section of the configuration form.

Option 2: Format when using direct paths as inputs

Use "gs://url-to-file-in-bucket" to reference a file in a Google bucket directly. 

Formatting requirement - The quotes are necessary if you are directly referencing a file URL.
Workflow_hardcoded_attribute_Screen_Shot.png
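
For example, a hypothetical FASTA file in a bucket named my-bucket would be entered as:

    "gs://my-bucket/references/Homo_sapiens_assembly38.fasta"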

7. Configure outputs (inputs from data table only)

When using the data table, you can choose what you want to do with the workflow outputs. Generated files are stored in the workspace bucket by default, but you can have the workflow write links to the output files right in the data table. You'll specify the column name under which the output links will be added to the data table; this can be an existing column or a new one (for a newly generated data file).

You do this in the Outputs tab using the same formatting as for inputs.

7.1. For each output variable, start typing this. in the attribute field.

Once you start typing, you'll see a dropdown that lists all columns in the root entity data table. 

7.2. Choose an existing column or type in a new name to add a new column of data to the table. 

Configure-workflow_Write-outputs-to-data-table_Screen_shot.png 
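
For example, to write the CRAM produced by a hypothetical aligner workflow into the aligner_output_cram column shown later in this article, the output attribute would be:

    myWorkflow.output_cram    this.aligner_output_cram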

Why write outputs to the data table?
Writing to the data table associates generated output with the input data file (the output files are written alongside the input files in the table) and helps organize your outputs in a way that is meaningful to you. It also makes it easy to use the data for downstream analysis.

Where are generated output files stored?

Either way, your analysis outputs are stored in the workspace Google bucket by default.

Note: The default folders are named by the submission ID, and finding the files in folders designated by random strings of numbers and letters (e.g. 83add0f2-ae9b-4a97-9995-104b82e5631f) can be challenging. Compare the examples below.

Outputs in Google bucket (file folder is random string)
Managing-data-with-tables_Generated-data-in-bucket_Screen_shot.png

 

Outputs in the data table (clear associations)
Here's the same output file in the data table. Running the workflow generated the aligner_output_crai and aligner_output_cram columns. Note that the unique collaborator_sample_id references the entire row, associating the generated data with the primary data.

Managing-data-with-tables_Generated-data_Screen_shot.png

When you might not want to write outputs to the table
Although we recommend writing outputs to the data table, there are situations where you may not want to: if you cannot fit your data into the data table in a way that makes sense for your analysis, or if you want to test a new method in Terra quickly, with as little setup as possible.

Be careful not to overwrite in the data table
If you use the same output name for multiple runs, Terra will overwrite the links in the data table with the most recent output data link. Note that data from a previous run will still exist in the workspace bucket, but it will be harder to find. 

To be able to compare results from different configurations, you'll want to give your outputs a name that indicates which is which.
configure-workflows_Multiple-test-outputs_Screen_shot.png

How to verify workflow output files

If your output attributes have the format "this.your_filename", the workflow will write output metadata to the "your_filename" column of the data table. You'll see the additional metadata for these output files in the data table after a successful run. 

For example, after completing Exercise 1 in the Workflows-QuickStart tutorial, you'll see the sample table now contains three new columns. Each column corresponds to a different output filetype: outputBai, outputBam, and output_validation_report. The cells include links to files in the workspace Google bucket for each sample. This data is now available for downstream use by other workflows in your workspace (see Exercise 3 of the Workflows Quickstart). 

Whether or not you write to the data table, you can find the output files in your workspace Google bucket by clicking the "Files" icon in the left column of the Data tab:
Data-Google-bucket-Files_Screen_Shot.png

Note about output file folder names
Each time you launch a workflow, Terra will assign a unique submission ID to that submission. This submission ID is also the name of the output folder in the workspace Google bucket. Outputs from multiple submissions of the same workflow in the same workspace will not be overwritten, since they land in different submission ID folders. The sketch below shows the typical layout.
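
As an illustration, output files for a given run typically land under a path of roughly this shape (workflow and task names are hypothetical; the submission ID is the example used earlier):

    gs://<workspace-bucket>/
      83add0f2-ae9b-4a97-9995-104b82e5631f/   (submission ID)
        myWorkflow/                           (workflow name)
          <workflow-id>/
            call-alignTask/
              <output files>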

Isn't there an easier way? Yes! Use JSONs

It's tedious (not to mention error-prone) to type in every attribute by hand. JSON files can vastly simplify the process. To learn how to use a JSON file to configure inputs so you don't have to do it manually, see Getting workflows up and running faster with a JSON file. It's especially useful if you anticipate running much the same configuration many times over. 

One nice aspect of how Terra manages workflows and their configurations is that it allows you to export your workflow config (JSON) back to the Method Repository and share it with others. Conversely, you can import any published workflow config to your own workspace. That can take a lot of the guesswork out of configuring someone else's workflows to run on your own data.

Video and tutorial workflow resources 

To learn more about using data tables to organize your data and enable you to scale your
analysis, see Managing data with workspace tables.

To understand how to adjust data tables, see Making, modifying, and deleting tables.

For hands-on practice with data tables, try the Data Tables QuickStart.

To learn more about how to update workflows to the latest version, see this article.

To see a video tutorial on configuring a workflow, click here.

Hands-on practice setting up and running a workflow analysis
To practice setting up and running workflows, work through the Terra-Workflows-QuickStart workspace. It should take about half an hour to complete the hands-on tutorial and cost less than a dime (GCP costs).

(Note that to run the exercises you will need to clone the workspace to your own billing project.)
