The full power of working in the cloud is the ability to scale. Many researchers are already taking advantage of Terra and Google Cloud to submit large numbers of large, resource-intensive workflows. Whether you're new to working in the cloud or have already built up some experience, here are some tips to help you scale your analysis successfully on the Terra platform.
Resources and quotas
To scale effectively, you'll need to make sure you have the resources required to run your workflows. If your workflow submissions are progressing very slowly or stalling for long periods of time, your billing project may have hit a Google resource quota. Jobs stall while they wait for resources to free up again, so working with limited resources can greatly increase your submission runtime. You are not billed for time spent waiting for available resources, but you will likely want your work to progress more quickly.
You can get around this issue by filing a quota increase request for any quotas you hit. CPUs, In-use IP Addresses, Persistent Disks, and Local SSDs are among the most commonly hit resource quotas.
For more details about resource quotas and instructions on how to request an increase, see CPUs and persistent disk quotas: What are they and how do you request more?.
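To see why a quota can stall a large submission, it can help to estimate how many tasks fit under your quotas at the same time. The sketch below is illustrative only: the resource names, quota values, and per-task requirements are hypothetical, not actual Google Cloud defaults, and the function is not part of Terra or Google Cloud.

```python
import math
from typing import Dict


def max_concurrent_tasks(quota: Dict[str, float], per_task: Dict[str, float]) -> int:
    """Return how many identical tasks fit under every quota at once.

    The tightest resource decides the answer, which is why a single
    exhausted quota can hold back an entire submission.
    """
    limits = [
        math.floor(quota[resource] / needed)
        for resource, needed in per_task.items()
        if needed > 0
    ]
    if not limits:
        raise ValueError("per_task must request at least one resource")
    return min(limits)


# Hypothetical quota and task shape, chosen only for illustration.
quota = {"cpus": 24, "in_use_ip_addresses": 8, "persistent_disk_gb": 4096}
per_task = {"cpus": 4, "in_use_ip_addresses": 1, "persistent_disk_gb": 100}

concurrent = max_concurrent_tasks(quota, per_task)
total_tasks = 1000
waves = math.ceil(total_tasks / concurrent)

print(f"{concurrent} tasks can run at once; {total_tasks} tasks need about {waves} waves")
```

With these made-up numbers the CPU quota is the bottleneck, so raising only the disk or IP address quota would not speed anything up.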
Make the most of call caching
Time and money are two of the most valuable resources when it comes to scaling. Call caching saves on both fronts by letting you reuse the results from previous successful runs, so you don't have to re-run the same tasks from scratch every time. As long as a task's inputs are the same as those of a previous successful run, Cromwell (our workflow engine) can automatically use those results again.
For more information on call caching, see Call caching: How it works and when to use it.
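The idea behind call caching can be pictured with a small toy model. This is a conceptual sketch only, not Cromwell's implementation: the `ToyCallCache` class and its hashing scheme are hypothetical, and Cromwell's real matching rules are described in the article referenced above.

```python
import hashlib
import json


class ToyCallCache:
    """Toy cache: reuse a task's result when the same inputs were seen before."""

    def __init__(self):
        self._results = {}
        self.hits = 0

    def _key(self, task_name, inputs):
        # Hash the task name together with its inputs (sorted for stability).
        payload = json.dumps({"task": task_name, "inputs": inputs}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(self, task_name, inputs, compute):
        key = self._key(task_name, inputs)
        if key in self._results:
            self.hits += 1              # cache hit: nothing is recomputed
            return self._results[key]
        result = compute(**inputs)      # cache miss: run the task for real
        self._results[key] = result
        return result


executions = []


def count_bases(sequence):
    """Stand-in for an expensive task."""
    executions.append(sequence)
    return len(sequence)


cache = ToyCallCache()
first = cache.run("count_bases", {"sequence": "ACGTACGT"}, count_bases)
second = cache.run("count_bases", {"sequence": "ACGTACGT"}, count_bases)
third = cache.run("count_bases", {"sequence": "ACGT"}, count_bases)
```

The second call returns the stored result without running `count_bases` again, while the third call has different inputs and is computed from scratch.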
Craft your WDL for scale
To reduce the likelihood that Terra will hang while retrieving your results, write your WDL so that it makes as few calls as possible.
One way to reduce the number of calls is to avoid nested scatters, which lead to a lot of duplicated metadata. When scatters are nested, Cromwell (our workflow execution engine) prints out the entire array of metadata as inputs for every index in the scatter. This can generate huge, unwieldy amounts of metadata and lead to long wait times when returning job results. If the number of metadata rows exceeds the current platform threshold, Terra may not be able to serve the metadata at all.
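A rough back-of-the-envelope calculation shows how quickly this duplication can grow. The model below is a deliberate simplification and an assumption, not Cromwell's actual metadata accounting: it supposes that every shard of a nested scatter records the full inner array as its inputs, while every shard of a single (flat) scatter records only its own element.

```python
def nested_scatter_rows(outer: int, inner: int) -> int:
    """Toy estimate: outer * inner shards, each repeating the whole inner array."""
    return outer * inner * inner


def flat_scatter_rows(outer: int, inner: int) -> int:
    """Toy estimate: outer * inner shards, each recording one input element."""
    return outer * inner


# Hypothetical scatter widths, chosen only for illustration.
outer, inner = 100, 200

nested = nested_scatter_rows(outer, inner)
flat = flat_scatter_rows(outer, inner)

print(f"nested: {nested:,} rows, flat: {flat:,} rows, ratio: {nested // flat}x")
```

Under this simplified model the extra metadata grows with the width of the inner scatter, which is why wide nested scatters are the ones most likely to cause trouble.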
The Cromwell team is constantly looking for ways to improve our reliability in handling large submissions, so a metadata row limit today could be higher - or gone - tomorrow! If you do run into a situation where your metadata can't be served up in Terra Job Manager, please reach out to firstname.lastname@example.org for assistance.
Tools to help keep an eye on costs
As your workflows scale larger and larger, so will the associated costs. To avoid runaway costs, we recommend taking advantage of built-in Terra and GCP functionality (see below).
Delete Intermediate Outputs
Terra allows you to automatically delete the output files generated by intermediate tasks at the end of your workflow run, keeping only the final workflow outputs. This is a great way to save on storage costs, especially if you know you won’t need those intermediate task files in the future.
Read more in Saving storage costs by deleting intermediate files.
Call Caching
Call caching allows Terra's execution engine (aka Cromwell) to detect when a job has been run in the past so that it doesn't have to re-compute results. The call caching feature in Terra can save you time and money when you are repeating all or parts of a workflow analysis.
Read more in Call caching: How it works and when to use it.
You cannot use BOTH Delete intermediate outputs and call caching
If you plan on re-running any workflows with call caching enabled, you will need to make sure Delete intermediate outputs is disabled for your initial run(s). Once a task's intermediate files are deleted, its results can no longer be reused by call caching, so you will need to generate them from scratch on your next run.
GCP Budget Alerts
Google Cloud gives you the option of setting budgets for your projects. Setting a budget will send you email alerts when spending has reached certain thresholds, allowing you to keep tabs on how much of a bill you're racking up as you run your work in the cloud.
For more details, see How to set up and use GCP budget alerts.
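The threshold logic behind a budget alert can be sketched in a few lines. This is an illustration only: the function is hypothetical, not a Google Cloud API, and the budget amount, spend, and threshold percentages are made-up values.

```python
from typing import List, Sequence


def triggered_thresholds(
    budget: float,
    spend: float,
    thresholds: Sequence[float] = (0.5, 0.9, 1.0),
) -> List[float]:
    """Return the threshold fractions of the budget that current spend has reached."""
    return [t for t in sorted(thresholds) if spend >= budget * t]


# Hypothetical monthly budget and current spend, in US dollars.
budget = 500.0
spend = 460.0

reached = triggered_thresholds(budget, spend)
for fraction in reached:
    print(f"Alert: spend has reached {fraction:.0%} of the ${budget:,.0f} budget")
```

Here the 50% and 90% thresholds have been crossed but the 100% threshold has not, so two alerts would have been sent so far.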