Set workflow cost thresholds

Adam Mullen

What we’re solving

This feature lets you set a cost threshold on a workflow so that the workflow is stopped before it accrues unexpected costs.

Accidentally overrunning a budget is a common concern when using the cloud or Terra, especially with workflows, which can accumulate costs quickly. This matters most for grants with a fixed budget, and for any project that wants to manage its cloud costs closely and be confident that an outlier workflow isn’t going to overrun its budget in a day.

What's changing for you

On the configuration page for a workflow, a “Set cost threshold per workflow (BETA)” option will be available. Setting an amount in this field will cause a workflow to be terminated once its estimated cost reaches that amount.

Cost thresholds are configured on a per-workflow basis. If you are running multiple workflows within a single submission, the threshold applies to each workflow separately, and you will see this reflected in the approximate maximum submission cost shown before you launch the submission.
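As a rough illustration of how a per-workflow threshold scales across a submission, the sketch below assumes the approximate maximum submission cost is simply the threshold multiplied by the number of workflows. That formula is an assumption made for illustration, and the function name is hypothetical; the figure Terra displays may be calculated differently.

```python
from decimal import Decimal


def approximate_max_submission_cost(per_workflow_threshold_usd: Decimal,
                                    workflow_count: int) -> Decimal:
    """Hypothetical helper: upper bound if every workflow runs up to its threshold.

    Assumes (not confirmed by the article) that the bound is threshold x count.
    """
    if workflow_count < 0:
        raise ValueError("workflow_count must be non-negative")
    if per_workflow_threshold_usd < 0:
        raise ValueError("threshold must be non-negative")
    return per_workflow_threshold_usd * workflow_count


# Example: a $5.00 threshold on a submission that launches 24 workflows.
print(approximate_max_submission_cost(Decimal("5.00"), 24))
```

Under this assumption, a modest per-workflow threshold can still add up to a large submission-level figure, so it is worth checking the number shown before launching.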

Frequently Asked Questions

How can I trust this feature will actually stop runaway workflow costs?

The cost threshold is checked against cost estimates that Cromwell computes in the background while your workflow is running. When your workflow’s estimated cost becomes greater than or equal to the threshold you’ve set, the workflow is ordered to terminate.
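The comparison described above can be sketched in a few lines. This is not Cromwell's code: the function name and the idea of a list of periodic estimates are assumptions made for illustration. It only shows the "greater than or equal to" rule applied to a running series of cost estimates.

```python
from typing import Iterable, Optional


def first_termination_point(estimates_usd: Iterable[float],
                            threshold_usd: float) -> Optional[int]:
    """Return the index of the first estimate that reaches the threshold.

    Hypothetical sketch: each element stands for one periodic cost estimate
    of a running workflow. Returns None if the threshold is never reached.
    """
    for index, estimate in enumerate(estimates_usd):
        # The article states the rule as "greater than or equal to".
        if estimate >= threshold_usd:
            return index
    return None


# Made-up estimates for a single workflow, checked against a $2.00 threshold.
print(first_termination_point([0.50, 1.20, 2.00, 2.70], threshold_usd=2.00))
```

Because estimates arrive periodically and termination is not instantaneous, the cost actually incurred can end up somewhat above the threshold.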

We have been testing the accuracy of our cost estimations for the past three months on a sample of 17,797 workflow runs, drawn from 68 different workspaces and covering 127 unique types of workflows. We then compared the estimated costs to the actual costs of the workflows as reported by Google.

Our analysis suggests that our cost estimations provide highly reliable forecasts, with minimal bias and a tight clustering around actual costs. While a small number of workflows are off by a larger margin, these cases represent a tiny fraction of the total runs. We are actively investigating these outliers to improve our estimated costs. We are confident that our estimations are well-calibrated and effective for a majority of your use cases!

The graph above shows a comparison of the actual and estimated costs for the 17,797 completed workflows included in our testing data. The red dotted line represents the line of best fit, along which the estimated costs are equal to the actual costs. Towards the bottom-left, a heavy concentration of data points is close to the line, reflecting the reliability and accuracy of our estimations. There are a few outliers of note as the costs increase in magnitude. This is something we are working on improving and will have updates on soon!

 

Summary Statistics

  • Mean Difference = -$0.08
    • Averaged across all workflows, the estimates are 8 cents lower than the actual costs. 
  • Median Difference = $0.00
    • Half of the workflows are overestimated and half are underestimated. The midpoint of all estimates aligns with the actual cost, with no sign of systematic skew.
  • % Overestimation = 0.30%
    • Only 0.30% of workflows have estimates more than one standard deviation above the actual cost, a remarkably small fraction of outliers.
  • % Underestimation = 0.22%
    • Conversely, only 0.22% of workflows have estimates more than one standard deviation below the actual cost.
  • % Accurate (within 1 standard deviation) = 99.48%
    • Nearly all workflows included in our testing have estimates within one standard deviation of the true cost — showing a high level of accuracy.
  • Correlation between estimated and actual costs = 0.97
    • A correlation of 0.97 between estimated and actual costs indicates a strong linear relationship. As actual costs increase or decrease, the estimation tool’s predictions scale similarly.
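For readers who want to run a similar comparison on their own estimated and actual costs, here is a minimal sketch of how statistics like these could be computed. It assumes that "difference" means estimated minus actual and that the standard deviation is taken over those differences; the article does not spell out either definition. The function name and the sample numbers are made up for illustration and are not Terra's data.

```python
import math
import statistics


def summarize(estimated, actual):
    """Hypothetical summary of estimated vs. actual workflow costs (USD)."""
    if len(estimated) != len(actual) or len(estimated) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")

    # Assumption: difference = estimated - actual (negative means underestimate).
    diffs = [e - a for e, a in zip(estimated, actual)]
    # Assumption: the standard deviation is that of the differences.
    sd = statistics.pstdev(diffs)
    n = len(diffs)
    pct_over = 100.0 * sum(d > sd for d in diffs) / n
    pct_under = 100.0 * sum(d < -sd for d in diffs) / n

    # Pearson correlation, written out so it runs on older Python versions.
    # Note: this divides by zero if either list is constant.
    mean_e = statistics.fmean(estimated)
    mean_a = statistics.fmean(actual)
    cov = sum((e - mean_e) * (a - mean_a) for e, a in zip(estimated, actual))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_a = sum((a - mean_a) ** 2 for a in actual)

    return {
        "mean_difference": statistics.fmean(diffs),
        "median_difference": statistics.median(diffs),
        "pct_overestimated": pct_over,
        "pct_underestimated": pct_under,
        "pct_accurate": 100.0 - pct_over - pct_under,
        "correlation": cov / math.sqrt(var_e * var_a),
    }


# Toy data only: four workflows, one of which is underestimated by $1.00.
print(summarize([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]))
```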

Can I use this feature for budgeting?

We recommend this feature as a safety net to prevent unexpected costs rather than a strict budgeting tool. Although our findings show high accuracy and reliability in our cost estimations, workflows may not terminate immediately upon hitting the defined threshold.

Based on our findings, we recommend planning for a margin of error of about +/- 20% when setting workflow cost thresholds. However, you should consider the size and shape of your workflows when setting these thresholds. Some examples:

  • If the last task of your workflow is its most complex part and uses a very expensive machine type, you should plan for a larger margin of error, because costs can keep accruing between the threshold being reached and the workflow actually terminating.
  • Our model becomes less predictable at higher workflow costs (hundreds of dollars). If you are running very expensive workflows, you should plan for a larger margin of error.
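To make the +/- 20% guidance concrete, the sketch below shows two ways of applying it. Both helper names are hypothetical, and the interpretation is an assumption: that the actual cost when a threshold triggers may fall roughly 20% either side of the threshold. Since workflows may not terminate immediately, treat the results as planning figures, not guarantees.

```python
def plausible_actual_range(threshold_usd: float, margin: float = 0.20):
    """Hypothetical: range of actual cost when the estimate hits the threshold."""
    if threshold_usd < 0 or not 0 <= margin < 1:
        raise ValueError("threshold must be >= 0 and margin in [0, 1)")
    return (threshold_usd * (1 - margin), threshold_usd * (1 + margin))


def threshold_for_ceiling(ceiling_usd: float, margin: float = 0.20) -> float:
    """Hypothetical: threshold to set so that threshold + margin stays under a ceiling."""
    if ceiling_usd < 0 or not 0 <= margin < 1:
        raise ValueError("ceiling must be >= 0 and margin in [0, 1)")
    return ceiling_usd / (1 + margin)


# A $100 threshold could plausibly correspond to about $80 to $120 of actual cost.
low, high = plausible_actual_range(100.0)
print(f"{low:.2f} to {high:.2f}")

# To stay near a $120 ceiling per workflow, set the threshold lower than $120.
print(f"{threshold_for_ceiling(120.0):.2f}")
```

For workflows that end with an expensive task or that cost hundreds of dollars, passing a larger `margin` than the default reflects the advice in the list above.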

What are the limitations of this feature?

In some edge cases, our model significantly overestimated costs, which would cause workflows to terminate early. We have excluded these cases from the findings above. We are opening this feature to public preview with known limitations because we believe it will greatly improve your ability to manage workflow costs. If you see something that doesn't make sense, please reach out to us!

Here are some other important cost threshold considerations:

  1. Costs are in USD.
  2. Costs cover VM and disk usage only. Bucket storage and egress costs are not included.
  3. Costs are based on GCP list prices. Discounts are not included.
  4. GPU costs are not included.
  5. Workflow costs vary by input. Set a threshold that allows for this variability.

Try it out

Go to https://app.terra.bio/#feature-preview to enable the “Workflow Cost Thresholds” feature and let us know what you think!

Comments

3 comments

  • Bronwyn MacInnis

    Hooray cost controls!!! Thank you!!!!

  • Andrey Fedorov (edited)

    Incredibly important feature! Can't wait to see it released. Thank you!

  • Adam Mullen

    Hi everyone! I'm excited to announce that this feature is now in public preview and available to anyone who wishes to try it out. This roadmap article has been updated with more details about how the feature works, our findings from private preview, and additional considerations and limitations for how to use the feature. Please go to https://app.terra.bio/#feature-preview to enable the “Workflow Cost Thresholds” feature and let us know what you think!
