Workflow setup: VM and other options

Allie Hajian

This article outlines workflow runtime options, including cost-saving options: what they are and how to specify them. If this is your first time running a workflow, the default runtime options are usually adequate.

Workflow options overview

Terra offers several ways to adjust how your workflow runs. You can configure all of the runtime options described below in the workflow submission form. 

Screenshot showing an example workflow submission form, highlighting the configuration options available on this form. An orange rectangle and the number '1' highlight the first section of this form, where you can select your workflow's version and see its source and synopsis. Another orange rectangle and the number '2' highlight the second section, which contains checkboxes that you can use to select whether to use call caching, delete intermediate outputs, use reference disks, retry with more memory, and ignore empty outputs.

The configuration form displays default values provided by the workflow author. 

Running your first workflow? Use the defaults! If you are just getting familiar with running a workflow, you can always use the default runtime options. These are set up to be the easiest to use and to save money for most users.

1. Workflow information (snapshot, source and synopsis)

In this first section of the workflow configuration form, you can use the snapshot drop-down menu to see all available versions of your workflow. You can choose to use the most up-to-date version or a previous version (if you need to maintain consistency, for example). Terra will automatically run the version you choose.

This section also lists the workflow tools repository (source) and a synopsis (if available). 

2. Money-saving options

There are several features in Terra designed to help save money when running a workflow. 

2.1. Cost threshold

You can set a limit on spend when running a workflow. See important considerations below.

  1. Costs are in USD.
  2. Costs are VM and disk costs only. Bucket storage and egress costs are not included.
  3. Costs are based on GCP list prices. Discounts are not included.
  4. GPU costs are not included.
  5. Workflows may not terminate immediately upon hitting the threshold, so plan for a margin of error.
  6. Workflow costs vary by input. Set a threshold that considers variability.
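One way to account for variability and a margin of error is to derive the threshold from the costs of earlier runs. The sketch below is only an illustration: the function name, the "mean plus two standard deviations" heuristic, and the 25% safety margin are all assumptions, not anything Terra prescribes.

```python
from statistics import mean, stdev


def suggest_cost_threshold(past_costs_usd, safety_margin=0.25, num_stdevs=2.0):
    """Suggest a cost threshold in USD from the costs of past runs.

    Hypothetical heuristic: mean + num_stdevs * sample standard deviation,
    padded by safety_margin because a workflow may overshoot the threshold
    before it terminates.
    """
    if len(past_costs_usd) < 2:
        raise ValueError("need at least two past costs to estimate variability")
    baseline = mean(past_costs_usd) + num_stdevs * stdev(past_costs_usd)
    return round(baseline * (1.0 + safety_margin), 2)


# Invented VM and disk costs from three earlier runs of the same workflow.
past_costs = [4.0, 5.0, 6.0]
threshold = suggest_cost_threshold(past_costs)
print(f"Suggested threshold: ${threshold:.2f}")
```

Because the threshold covers VM and disk costs only, the input costs should exclude storage, egress, and GPU charges as well.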

2.2. Call caching

Call caching allows Terra's execution engine (Cromwell) to detect when a job has been run in the past so that it doesn't have to re-compute results. The call caching feature in Terra can save you time and money when you are repeating all or parts of a workflow analysis. 
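The idea behind call caching can be sketched as a lookup keyed on what a job does and what it was given. The toy class below is an assumption-laden illustration, not Cromwell's implementation: the class and function names are hypothetical, and the real engine's matching criteria are more involved than a hash of the command and inputs.

```python
import hashlib
import json


class CallCache:
    """Toy in-memory cache keyed on a hash of a job's command and inputs."""

    def __init__(self):
        self._results = {}
        self.hits = 0

    @staticmethod
    def _key(command, inputs):
        # Sorting keys makes the hash independent of input ordering.
        payload = json.dumps({"command": command, "inputs": inputs}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(self, command, inputs, compute):
        key = self._key(command, inputs)
        if key in self._results:
            # Same command and inputs as an earlier job: reuse its result.
            self.hits += 1
            return self._results[key]
        result = compute(inputs)
        self._results[key] = result
        return result


executed = []  # records every job that actually had to be computed


def count_reads(inputs):
    executed.append(inputs)
    return {"n_reads": len(inputs["reads"])}


cache = CallCache()
first = cache.run("count_reads", {"reads": ["r1", "r2"]}, count_reads)
second = cache.run("count_reads", {"reads": ["r1", "r2"]}, count_reads)  # cache hit
third = cache.run("count_reads", {"reads": ["r1", "r2", "r3"]}, count_reads)  # new inputs
print(f"jobs computed: {len(executed)}, cache hits: {cache.hits}")
```

Only the second call is served from the cache; changing any input produces a different key and a fresh computation.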

2.3. Delete intermediate outputs

Deleting intermediate outputs allows you to save storage costs by automatically deleting outputs from intermediate steps when the workflow successfully completes. This feature is most useful when these intermediate outputs are not used in a downstream analysis.

Note that complex workflows can have a large number of intermediate outputs, which can dramatically increase the storage costs of a project. For example, intermediate files made up roughly 85% of the storage costs for a recent large-scale project, even though no one ever used these files.
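To estimate how much this option could save, it can help to split a run's outputs into final and intermediate files. The sketch below assumes that an output counts as intermediate when it is not listed among the workflow-level outputs; the task names, file sizes, and the price per GB-month are invented placeholders, so substitute your own numbers.

```python
def split_outputs(task_output_sizes_gb, workflow_outputs):
    """Split task outputs into (final, intermediate) dicts of name -> size in GB."""
    final = {n: s for n, s in task_output_sizes_gb.items() if n in workflow_outputs}
    intermediate = {n: s for n, s in task_output_sizes_gb.items() if n not in workflow_outputs}
    return final, intermediate


def monthly_storage_cost(sizes_gb, price_per_gb_month):
    """Monthly storage cost in USD for the given files."""
    return sum(sizes_gb.values()) * price_per_gb_month


# Invented example: two intermediate BAM files and one final VCF.
sizes_gb = {"align.unsorted_bam": 60.0, "sort.sorted_bam": 20.0, "call.vcf": 20.0}
final, intermediate = split_outputs(sizes_gb, workflow_outputs={"call.vcf"})

ASSUMED_PRICE_PER_GB_MONTH = 0.02  # placeholder, not a quoted GCP price
saved = monthly_storage_cost(intermediate, ASSUMED_PRICE_PER_GB_MONTH)
share = sum(intermediate.values()) / sum(sizes_gb.values())
print(f"Intermediate share of storage: {share:.0%}, monthly saving: ${saved:.2f}")
```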

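Call caching and deleting intermediate outputs cannot be combined. These two options save costs in different ways: call caching reuses results from earlier runs, which requires keeping those results in storage, while deleting intermediate outputs removes them.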

To learn more about call caching and when to use it, see this article.

To learn how to save storage costs by deleting intermediate outputs, see this article.

2.4. Use reference disks

If your workflow uses human or mouse genome reference files (e.g., HG 19 or MM 10), Terra can automatically attach a disk containing HG 19/HG 38 references to your Google Virtual Machine. If the checkbox labeled ‘Use Reference Disks’ is selected, the execution engine will examine the job inputs to see if any of them correspond to reference inputs available on a reference disk image. This saves time and compute resources that your workflow would otherwise spend localizing large reference inputs. 
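The input check described above can be pictured as a comparison between a job's input paths and the list of files available on the reference disk. This is a simplified sketch under stated assumptions: the function name and all bucket paths are hypothetical, and the real engine's matching logic may differ.

```python
def plan_localization(job_inputs, reference_manifest):
    """Split job inputs into files already on the reference disk and files to localize.

    job_inputs maps input names to cloud paths; reference_manifest is the set
    of cloud paths assumed to be available on the attached reference disk.
    """
    on_disk, to_localize = {}, {}
    for name, path in job_inputs.items():
        if path in reference_manifest:
            on_disk[name] = path
        else:
            to_localize[name] = path
    return on_disk, to_localize


# Hypothetical manifest and inputs, for illustration only.
manifest = {
    "gs://example-references/hg38/genome.fasta",
    "gs://example-references/hg38/genome.fasta.fai",
}
inputs = {
    "ref_fasta": "gs://example-references/hg38/genome.fasta",
    "sample_bam": "gs://example-workspace-bucket/sample1.bam",
}
on_disk, to_localize = plan_localization(inputs, manifest)
print("served from reference disk:", sorted(on_disk))
print("still localized:", sorted(to_localize))
```

Only inputs that match the manifest skip localization; sample-specific files are still copied to the VM as usual.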

For more details, including the full reference disk manifests, see Reference Disks in Terra

2.5. Retry with more memory

If a task is failing because your VM is running out of memory, Terra will automatically retry it with more memory if this option is selected and maxRetries is greater than 0 in your WDL script.

For more details, see the Out of Memory Retry documentation. 
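The interaction between this option and maxRetries can be sketched as a simple simulation. It assumes that memory grows by a fixed factor on each out-of-memory retry; the function names and the factor of 1.5 are illustrative assumptions rather than Terra defaults.

```python
def memory_schedule(initial_gb, max_retries, multiplier):
    """Memory requested on the first attempt and on each out-of-memory retry."""
    if multiplier <= 1.0:
        raise ValueError("multiplier must be greater than 1 to add memory")
    return [round(initial_gb * multiplier ** attempt, 2) for attempt in range(max_retries + 1)]


def run_with_memory_retry(required_gb, initial_gb, max_retries, multiplier):
    """Return (attempt_index, memory_gb) of the first attempt that fits, or None."""
    for attempt, memory_gb in enumerate(memory_schedule(initial_gb, max_retries, multiplier)):
        if memory_gb >= required_gb:
            return attempt, memory_gb
    return None  # every allowed attempt ran out of memory


# A task that needs 8 GB but initially requests 4 GB.
schedule = memory_schedule(initial_gb=4, max_retries=3, multiplier=1.5)
outcome = run_with_memory_retry(required_gb=8, initial_gb=4, max_retries=3, multiplier=1.5)
no_retries = run_with_memory_retry(required_gb=8, initial_gb=4, max_retries=0, multiplier=1.5)
print(schedule, outcome, no_retries)
```

With maxRetries set to 0 there is only the initial attempt, which is why the option has no effect unless maxRetries is greater than 0.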

2.6. Ignore empty outputs

If your workflow outputs a null or empty value to a data table, selecting this option will prevent Terra from creating a new column to store that empty output. This can prevent your tables from becoming too large and sparse, and therefore makes it easier to find the interesting data within your tables.
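The effect of this option can be illustrated with a small filter over a workflow's outputs before they are written to a table row. The function name, the column names, and the exact definition of "empty" used here (None, an empty string, or an empty list) are assumptions for the sketch.

```python
def outputs_to_write(outputs, ignore_empty):
    """Return the output columns that would be written to the data table."""
    if not ignore_empty:
        return dict(outputs)
    # Keep legitimate falsy values such as 0 or False; drop only empty ones.
    return {
        column: value
        for column, value in outputs.items()
        if value is not None and value != "" and value != []
    }


# Invented outputs for one sample.
outputs = {"vcf": "gs://example-bucket/s1.vcf", "qc_report": None, "warnings": [], "n_variants": 0}
with_option = outputs_to_write(outputs, ignore_empty=True)
without_option = outputs_to_write(outputs, ignore_empty=False)
print(sorted(with_option), sorted(without_option))
```

With the option selected, the qc_report and warnings columns are never created for this row, which keeps the table from filling up with empty cells.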

2.7. Enable resource monitoring

Specify user-provided tools to monitor task resources. For more details, see Monitoring GCP cloud resources used in a workflow.

Video and tutorial workflow resources 

Data tables resources

Workflows resources

Hands-on practice setting up and running a workflow analysis (Note: To run these practice exercises you will need to clone the linked workspace to your own billing project)
To practice setting up and running workflows, work through the Terra-Workflows-QuickStart workspace. It should take about half an hour to complete the hands-on tutorial and cost less than a dime (GCP costs).
