Learn how call caching can save you time and money when you are repeating all or parts of a workflow analysis in Terra. Call caching allows Terra's execution engine (aka Cromwell) to detect when a job has been run in the past so that it doesn't have to re-compute results.
Call caching overview
Call caching allows you to "rerun" or restart a workflow without redoing calculations. This can be a huge advantage anytime you don’t want to pay to run part or all of the same workflow on the same data twice.
Examples of when call caching is useful
- Before running downstream analysis on the same data
- When you are testing or troubleshooting a partially failed workflow or are otherwise not sure if a workflow will complete. Call caching lets you start the workflow again at the beginning of the task that failed, rather than rerunning the entire workflow from the beginning.
- If you are the workspace creator, and ran the workflow that generated the cached calls.
Examples of when you may not benefit from call caching
- If you did not run the workflow that generated cached results
- If your workflow relies on interacting with the world outside of Cromwell (for example, if it pulls live data from an external website).
- If you are benchmarking a workflow to determine time and cost, using cached results will provide inaccurate values.
Who can use call caching in a shared workspace
Note that currently, only workspace creators repeating their own workflow runs can benefit from call caching. We are working on enabling users to run workflows in shared workspaces. Once that is available, each user in a shared workspace can benefit from their own previous runs (single-user call caching). Using collaborators' previous runs for call caching in a shared workspace will be added to the future roadmap based on user needs.
Of course, ultimately, it’s up to you to know whether the workflow you're launching is appropriate to call cache or not.
How to enable/disable call caching
Calls are always automatically cached whenever a job is processed through Terra, so you don’t need to remember to check the box when running a new workflow configuration that you may replicate later. You should only uncheck the box when you don't want the workflow you’re launching to benefit from previous runs.
Why didn't call caching work for me?
If you expected call caching to work, and it didn't, first check to make sure you've satisfied the following requirements.
Call caching requires consistency in the task inputs
If you set your task's runtime attributes using input variables, changes to the values of these variables count as changes to the inputs, resulting in a different hash for the task. Read more about hashes in the next section!
Call caching may fail if your files are being fed in as String rather than File inputs
This is because two identical files stored in different locations have the same hash when they are passed as File inputs. When the locations are passed as String values instead, the hashes of the strings differ, even though the contents of the files are the same.
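The difference between File and String inputs can be sketched in a few lines of Python. This is only an illustration: the bucket paths and file contents are hypothetical, SHA-256 is an assumed stand-in for whatever hash is actually used, and Cromwell may obtain file hashes differently (for example, from checksums stored with the file).

```python
import hashlib

def content_hash(data: bytes) -> str:
    """Hash the bytes of a file, the way a File-typed input would be compared."""
    return hashlib.sha256(data).hexdigest()

def string_hash(value: str) -> str:
    """Hash the text of a String-typed input (here, the path itself)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

# Two hypothetical copies of the same file in different locations.
copies = {
    "gs://bucket-a/sample.bam": b"identical file contents",
    "gs://bucket-b/sample.bam": b"identical file contents",
}
path_a, path_b = copies

# As File inputs, the contents are compared, so the hashes match.
same_as_file = content_hash(copies[path_a]) == content_hash(copies[path_b])
# As String inputs, the paths are compared, so the hashes differ.
same_as_string = string_hash(path_a) == string_hash(path_b)

print(same_as_file)    # True
print(same_as_string)  # False
```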
Call caching requires consistency in the outputs of the task
Both the count (number of outputs) and the output expressions must be the same for call caching to work. If you publish a new version of your WDL that has one extra or one fewer output, it will not be able to benefit from a previously successful run of the same task, even if the inputs are the same.
Troubleshooting with the call caching debug wizard
If you have a workflow you believe should have benefited from call caching, you can run the Call Caching Debug Wizard found in the Workflow Dashboard for the workflow of interest.
1. Click the Submission History tab at the left side of the Workflows page to find your submission.
2. Click on the submission name to go to the submission detail page.
3. In the Sample ID column of the Submission Details page, click on the input for the run you think should have benefited from call caching.
4. In the Workflows Details page, click on the magnifying glass icon under Call Caching Result (right-hand side).
Note that you can also find the task and backend standard output and standard error logs by clicking the Logs link under Task Data.
What can I change without breaking call caching?
You can change hard-coded values in your runtime block without breaking call caching. This is particularly useful if your job failed due to inadequate memory or disk space. You can make this change by publishing a new version of your WDL with the updated runtime variable values.
The exceptions to this rule are the following runtime variables: ContinueOnReturnCode, Docker, and FailOnStderr. Changes to these runtime variables will break call caching!
Changing runtime variables such as memory, disk, or cpu using task inputs will break call caching since this is registered as a change to the inputs for the task.
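The rules in this section can be summarized with a toy model of a cache key. The sketch below is not Cromwell's actual algorithm: the function `toy_cache_key`, the example command, and the input and output names are all hypothetical, and the only facts it encodes are the ones stated above (inputs, outputs, and the three listed runtime variables matter; other hard-coded runtime values do not).

```python
import hashlib
import json

# Runtime attributes that, per the rules above, DO affect call caching.
CACHE_RELEVANT_RUNTIME = ("docker", "continueOnReturnCode", "failOnStderr")

def toy_cache_key(command, inputs, output_exprs, runtime):
    """Toy model of a call's cache key; NOT Cromwell's real implementation."""
    relevant = {k: runtime[k] for k in CACHE_RELEVANT_RUNTIME if k in runtime}
    payload = json.dumps(
        {"command": command, "inputs": inputs,
         "outputs": output_exprs, "runtime": relevant},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

command = "samtools index input.bam"       # hypothetical task command
outputs = {"index": "input.bam.bai"}       # hypothetical output expression
inputs = {"bam": "hash-of-bam-contents"}   # stand-in for a File input's hash
runtime = {"docker": "ubuntu:22.04", "memory": "4 GB"}

base = toy_cache_key(command, inputs, outputs, runtime)

# Hard-coded memory change in the runtime block: same key, cache still hits.
more_memory = toy_cache_key(command, inputs, outputs, {**runtime, "memory": "16 GB"})
# Changing the Docker image: different key, cache misses.
new_docker = toy_cache_key(command, inputs, outputs, {**runtime, "docker": "ubuntu:24.04"})
# Memory supplied through a task input: the inputs differ, so the keys differ.
input_4 = toy_cache_key(command, {**inputs, "memory_gb": 4}, outputs, runtime)
input_16 = toy_cache_key(command, {**inputs, "memory_gb": 16}, outputs, runtime)
# One extra output: different key, cache misses.
extra_output = toy_cache_key(command, inputs, {**outputs, "log": "index.log"}, runtime)

print(base == more_memory)   # True
print(base == new_docker)    # False
print(input_4 == input_16)   # False
print(base == extra_output)  # False
```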
How does it work? (technical details, if you're curious)
Cromwell searches the cache of previously run jobs for one that has the exact same command and exact same inputs. If a previously run job is found in the cache, Cromwell will use the results of the previous job instead of re-running it. See the Cromwell call-caching documentation.
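As a rough illustration of this lookup, the following sketch keeps a dictionary of previous results keyed by a hash of the command and inputs. The names `run_job`, `job_key`, and `count_bases` are hypothetical, and the in-memory dictionary is an assumption made for brevity; it is not how Cromwell stores its cache.

```python
import hashlib

cache = {}       # maps a job's hash to its stored result
executions = []  # records the jobs that actually had to run

def job_key(command: str, inputs: dict) -> str:
    """Build one hash from the command and the sorted inputs."""
    text = command + "|" + "|".join(f"{k}={inputs[k]}" for k in sorted(inputs))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def run_job(command: str, inputs: dict, compute):
    """Return (result, was_cached), computing only on a cache miss."""
    key = job_key(command, inputs)
    if key in cache:
        return cache[key], True
    result = compute(inputs)
    executions.append(key)
    cache[key] = result
    return result, False

def count_bases(inputs: dict) -> int:
    """Stand-in for an expensive task."""
    return len(inputs["sequence"])

first = run_job("count_bases", {"sequence": "GATTACA"}, count_bases)
second = run_job("count_bases", {"sequence": "GATTACA"}, count_bases)
third = run_job("count_bases", {"sequence": "GATT"}, count_bases)

print(first, second, third)  # (7, False) (7, True) (4, False)
print(len(executions))       # 2
```

The second call reuses the stored result because its command and inputs are identical to the first; the third call has a different input, so it runs again.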
How does Cromwell know if that exact run has been executed before (same workflow, same parameters, same inputs)?
To understand how call caching works under the hood, we have to start with a concept known in computer science as a “hash function.” A hash function maps large inputs to a short code, or hash - a smaller number with a fixed size. A good hash function does this in a way that two inputs are extremely unlikely to produce the same hash. Therefore, the hash can be used like a serial number to uniquely identify the input. Docker images, for example, come with hashes of themselves because it can be generally useful to know if Docker images have identical contents.
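A short Python sketch shows these properties in practice. SHA-256 from the standard library is used here purely as an example of a good hash function; it is an assumption for illustration, not a statement about which function Cromwell uses.

```python
import hashlib

def short_code(data: bytes) -> str:
    """Map input of any size to a fixed-size hexadecimal hash."""
    return hashlib.sha256(data).hexdigest()

small = short_code(b"A")                      # 1-byte input
large = short_code(b"A" * 1_000_000)          # 1 MB input
changed = short_code(b"A" * 999_999 + b"B")   # same size, last byte differs

# The hash has the same length regardless of the input size.
print(len(small), len(large))                 # 64 64
# Changing a single byte yields a different hash.
print(large == changed)                       # False
# Hashing the same input again yields the same hash.
print(large == short_code(b"A" * 1_000_000))  # True
```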
Cromwell generates hashes for the input arguments of the workflows, as well as for the WDL script itself. Information such as filenames and input parameters is stored in this way so that Cromwell can later identify when an identical workflow runs with the same input configuration. If the hashes are optimized so they always uniquely identify input configurations, then the call caching feature helps you benefit from the assumption that a given input will always lead to the same output.
What's an optimal hash function?
To understand what defines an optimal hash function, consider the diagram below. In this less-than-ideal example, the hash function stores names on a numbered list, but for some reason some names map to the same output hash. This situation is called a “collision,” and an optimized hash function has as few of them as possible. Since the chance of a hash collision can be predicted based on the size of the hash and the number of inputs hashed, it’s possible to pick a hash size large enough that collisions are astronomically unlikely. That's how Cromwell ensures that the hashes it generates are able to organize inputs and outputs to make call caching possible.
Example of a bad hash function:
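A similar situation can be reproduced in code. In the sketch below, `bad_hash` is a deliberately poor, hypothetical hash that only looks at the length of a name, and the names themselves are made up. The `collision_probability` function uses the standard birthday approximation to estimate the chance of at least one collision; the 128-bit hash size in the example is illustrative and is not a claim about the size Cromwell uses.

```python
import math
from collections import defaultdict

def bad_hash(name: str, slots: int = 10) -> int:
    """Deliberately poor hash: depends only on the length of the name."""
    return len(name) % slots

names = ["John Smith", "Lisa Smith", "Sam Doe", "Sandra Dee"]  # hypothetical

# Group names by the slot they hash to; any slot with 2+ names is a collision.
buckets = defaultdict(list)
for name in names:
    buckets[bad_hash(name)].append(name)
collisions = {slot: group for slot, group in buckets.items() if len(group) > 1}
print(collisions)  # {0: ['John Smith', 'Lisa Smith', 'Sandra Dee']}

def collision_probability(num_inputs: int, hash_bits: int) -> float:
    """Birthday approximation: 1 - exp(-n(n-1) / (2 * 2**bits))."""
    space = 2 ** hash_bits
    return -math.expm1(-num_inputs * (num_inputs - 1) / (2 * space))

# A tiny 8-bit hash collides almost certainly with only 100 inputs...
print(collision_probability(100, 8) > 0.99)         # True
# ...while a 128-bit hash stays negligible even for a billion inputs.
print(collision_probability(10**9, 128) < 1e-20)    # True
```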