The full power of working in the cloud is realized in your ability to scale. Many users today are already taking advantage of Terra and Google Cloud to submit workflows large in size and in number. Whether you are new to working in the cloud or you’ve already built up some experience, here are some tips that can help you be successful in scaling on the Terra platform.
Resources and quotas
To scale effectively, make sure you have the resources your workflows need. If your workflow submissions are progressing very slowly or stalling for long periods of time, your billing project may have hit a Google Cloud resource quota, and your jobs may be waiting for resources to free up again. Working with limited resources can greatly increase your submission runtime. You are not billed for the time spent waiting for available resources, but you will probably want your work to progress more quickly.
You can get around this issue by filing quota increase requests for any quotas you hit. CPUs, In-use IP Addresses, Persistent Disks, and Local SSDs are among the most commonly hit resource quotas.
For more details about resource quotas and instructions on how to request an increase, please see our article "CPUs and persistent disk quotas: What are they and how do you request more?"
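To make the effect of a quota concrete, here is a minimal Python sketch that estimates how many tasks can run at the same time under a set of quotas, and which quota is the bottleneck. The quota values, the per-task requests, and the function name are all hypothetical and chosen only for illustration; check your own billing project for its real limits.

```python
def max_concurrent_tasks(quotas, per_task):
    """Return (number of tasks that fit at once, name of the limiting quota).

    quotas:   hypothetical quota limits for a billing project
    per_task: hypothetical resources requested by a single task
    """
    # For each resource, how many tasks fit before that quota is exhausted.
    limits = {
        name: quotas[name] // need
        for name, need in per_task.items()
        if need > 0
    }
    # The smallest of those counts is the real ceiling on concurrency.
    bottleneck = min(limits, key=limits.get)
    return limits[bottleneck], bottleneck


# Illustrative numbers only, not actual Google Cloud defaults.
quotas = {"cpus": 24, "in_use_ip_addresses": 8, "persistent_disk_gb": 4096}
per_task = {"cpus": 4, "in_use_ip_addresses": 1, "persistent_disk_gb": 100}

tasks, limiting_quota = max_concurrent_tasks(quotas, per_task)
print(f"At most {tasks} tasks run at once; limited by {limiting_quota}")
```

With these made-up numbers, a submission of 600 tasks would run only 6 at a time, so the CPU quota would be the one to raise first.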
Make the most of call caching
Time and money are two of the most valuable resources when it comes to scaling, and call caching lets you save on both. In essence, call caching allows you to re-use the results from previous successful runs. This means that you don’t have to re-run the same tasks from scratch every time: as long as the task inputs are the same as those of a previous successful run, Cromwell (our workflow engine) can automatically use those results again.
For more information on call caching, see our dedicated article "Call caching: How it works and when to use it."
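The idea behind call caching can be sketched in a few lines of Python. This is a conceptual toy, not Cromwell's implementation: the class name, the hashing scheme, and the example task are all assumptions made for illustration, and the real engine decides on a cache hit using more information than shown here.

```python
import hashlib
import json


class ToyCallCache:
    """Toy cache: re-use a task's result when its name and inputs match."""

    def __init__(self):
        self._results = {}
        self.executions = 0  # how many times a task actually ran

    def _key(self, task_name, inputs):
        # Identical task name + identical inputs produce an identical key.
        payload = json.dumps({"task": task_name, "inputs": inputs}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def call(self, task_name, inputs, run):
        """Return (result, cache_hit)."""
        key = self._key(task_name, inputs)
        if key in self._results:
            return self._results[key], True
        result = run(**inputs)  # an exception here means nothing is cached
        self.executions += 1
        self._results[key] = result
        return result, False


def count_bases(sequence):
    # Stand-in for an expensive task.
    return len(sequence)


cache = ToyCallCache()
first = cache.call("count_bases", {"sequence": "ACGT"}, count_bases)
second = cache.call("count_bases", {"sequence": "ACGT"}, count_bases)
third = cache.call("count_bases", {"sequence": "ACGTA"}, count_bases)
print(first, second, third, cache.executions)
```

The second call has the same inputs as the first, so it is served from the cache; the third call changes an input, so the task runs again.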
Craft your WDL for scale
To reduce the likelihood that Terra will hang while retrieving your results, write your WDL so that it keeps the overall number of calls as low as possible.
One way to reduce the number of calls is to avoid nested scatters, which lead to a lot of duplicated metadata. When scatters are nested, Cromwell prints out the entire array's worth of metadata as inputs for every index in the scatter. This can generate huge, unwieldy amounts of metadata, which may cause the Terra platform to take a long time to return your job results. Terra may even be unable to serve the results at all if the number of metadata rows exceeds the current platform threshold.
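A back-of-the-envelope model shows why this duplication grows so quickly. The Python sketch below is a toy estimate based only on the description above, not a measurement of Cromwell's actual metadata. It assumes that each shard of a nested scatter records the full inner input array, and that an equivalent flat scatter records one input element per shard. Both function names are hypothetical.

```python
def nested_scatter_input_entries(outer, inner):
    """Assumed model: every one of the outer * inner shards records
    the whole inner array (inner elements) as its input metadata."""
    return outer * inner * inner


def flat_scatter_input_entries(outer, inner):
    """Assumed model: a single scatter over outer * inner elements,
    where each shard records only its own element."""
    return outer * inner


for outer, inner in [(10, 10), (100, 100), (1000, 100)]:
    nested = nested_scatter_input_entries(outer, inner)
    flat = flat_scatter_input_entries(outer, inner)
    print(f"outer={outer:>5} inner={inner:>4} nested={nested:>11,} flat={flat:>8,}")
```

Under these assumptions the nested form records `inner` times more input entries than the flat form, so restructuring the workflow to avoid nesting pays off more as the arrays grow.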
The Cromwell team is constantly looking for ways to improve our reliability in handling large submissions, so a metadata row limit today could be higher or gone tomorrow! If you do run into a situation where your metadata can't be served up in Terra Job Manager, please reach out to firstname.lastname@example.org for assistance.
Keep an eye on costs
As your workflows scale larger and larger, so will the associated costs. To avoid runaway costs, we recommend taking advantage of some built-in Terra and Google Cloud Platform (GCP) functionality.
Delete intermediate outputs
Terra allows you to automatically delete the output files generated by intermediate tasks at the end of your workflow run, keeping only the final workflow outputs. This is a great way to save on storage costs, especially if you know you won’t need those intermediate task files in the future.
If you plan on re-running any workflows with call caching enabled, make sure Delete intermediate outputs is disabled for your initial run(s). Call caching re-uses the output files of earlier tasks, so once the intermediate files are deleted, those tasks can no longer be served from the cache and their results must be generated from scratch on your next run.
Read more about deleting intermediate outputs here.
GCP budget alerts
Google Cloud gives you the option of setting budgets for your projects. Once a budget is set, Google Cloud sends you email alerts when spending reaches certain thresholds, allowing you to keep tabs on how large a bill you are accumulating as you run your work in the cloud.
Read more about setting up GCP budget alerts here.
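As a small illustration of how threshold-based alerts behave, the Python sketch below lists which alert thresholds a given level of spending has reached. The function name, the budget amount, and the 50%/90%/100% thresholds are assumptions chosen for the example; the thresholds for a real budget are whatever you configure in Google Cloud.

```python
def crossed_thresholds(budget, spend, thresholds=(0.5, 0.9, 1.0)):
    """Return the thresholds (as fractions of the budget) that spend has reached.

    thresholds: illustrative alert levels, expressed as fractions of the budget
    """
    return [t for t in sorted(thresholds) if spend >= budget * t]


# Hypothetical monthly budget of 200 (in your billing currency).
budget = 200.0
for spend in (50.0, 120.0, 185.0, 230.0):
    reached = [f"{int(t * 100)}%" for t in crossed_thresholds(budget, spend)]
    print(f"spend={spend:>6.2f} alerts reached: {reached or 'none'}")
```

Setting a lower first threshold gives you earlier warning during a large submission, at the price of more emails.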