Call caching allows Terra's execution engine, Cromwell, to detect when a job has already been run so that it doesn't have to recompute the results. Learn how the call caching feature in Terra can save you time and money when you are repeating all or part of a workflow analysis.
Why use call caching
Call caching allows you to "rerun" or restart a workflow without having to redo the calculations, which can be a huge advantage anytime you don’t want to pay to run part or all of the same workflow on the same data twice:
- Before running downstream analysis on the same data
- When you're working in a different workspace and want to reproduce earlier results
- When you are testing or troubleshooting a partially failed workflow, or are otherwise not sure if a workflow will complete. Call caching lets you start the workflow again at the beginning of the task that failed, rather than rerunning the entire workflow from the beginning.
When you might not want to use call caching
Of course, it's up to you to know whether the workflow you're launching is appropriate to call cache. For instance, if your workflow relies on interacting with the world outside of Cromwell - like pulling live data from an external website - you may want to uncheck the call caching box. You may also want to avoid call caching when you are benchmarking a workflow to determine time and cost, because runs that reuse cached results will report inaccurate values.
Call caching relies on "intermediate files" generated during processing. Thus, the "delete intermediate files" option can override the "Use call caching" option.
A workflow run with the delete intermediates option enabled can always READ from the call cache, but it will not WRITE its own results to the call cache.
Say, for example, you previously ran workflow X with delete intermediates enabled and now want to run it again with the same inputs and call caching turned on. The new run will not reuse results from the earlier run, because the intermediate files no longer exist (and so cannot be call cached). When Cromwell deletes the intermediate files, it also invalidates the corresponding call cache entries.
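The read and write behavior described above can be sketched with a toy model in Python. This is only an illustration, not Cromwell's implementation: the `ToyCallCache` class, its `run` method, and the `use_call_caching` and `delete_intermediates` parameter names are all hypothetical, and the model simply never writes an entry instead of writing and then invalidating it.

```python
class ToyCallCache:
    """Hypothetical stand-in for a call cache; not Cromwell's real data model."""

    def __init__(self):
        self._entries = {}

    def run(self, key, compute, use_call_caching=True, delete_intermediates=False):
        """Return (result, was_cache_hit) for one call."""
        # Reading: only attempted when "Use call caching" is checked.
        if use_call_caching and key in self._entries:
            return self._entries[key], True
        result = compute()
        # Writing: skipped when intermediate files will be deleted.
        if not delete_intermediates:
            self._entries[key] = result
        return result, False


toy_cache = ToyCallCache()
# First run deletes intermediates, so nothing is written to the cache.
_, hit1 = toy_cache.run("workflow-X", lambda: "outputs", delete_intermediates=True)
# Second run finds nothing to reuse, but writes its own results.
_, hit2 = toy_cache.run("workflow-X", lambda: "outputs")
# Third run deletes intermediates again, yet can still read the cached entry.
_, hit3 = toy_cache.run("workflow-X", lambda: "outputs", delete_intermediates=True)
print(hit1, hit2, hit3)  # False False True
```

In this sketch, a run with delete intermediates enabled can benefit from earlier runs but leaves nothing behind for later ones.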
How to enable/disable call caching
In Terra, call caching is enabled by default. You can check the status, as well as turn off call caching, from the "Use call caching" checkbox in the workflow configuration form.
Calls are always automatically cached whenever a job is processed through Terra, so you don't even need to remember to check the box when running a new workflow configuration that you may replicate later. You only need to check the box when you want the workflow you're launching to benefit from previous runs.
How does it work? Technical details, if you're curious
How does Cromwell know if that exact run has been executed before (same workflow, same parameters, same inputs)?
To understand how call caching works under the hood, we have to start with a concept known in computer science as a "hash function." A hash function maps large inputs to a short code, or hash: a smaller number with a fixed size. A good hash function does this in such a way that two different inputs are extremely unlikely to produce the same hash. Therefore, the hash can be used like a serial number to identify the input. Docker images, for example, come with hashes of themselves because it is often useful to know whether two Docker images have identical contents.
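To make the idea concrete, here is a minimal Python sketch using SHA-256 from the standard library. SHA-256 is chosen only for illustration; this article does not say which hash function Cromwell uses, and the `short_hash` helper and the file names are hypothetical.

```python
import hashlib


def short_hash(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()


small = short_hash(b"sample_1.bam")
large = short_hash(b"A" * 10_000_000)  # roughly 10 MB of input

# Both hashes have the same fixed length, whatever the input size.
print(len(small), len(large))  # 64 64

# The same input always produces the same hash...
print(short_hash(b"sample_1.bam") == small)  # True

# ...while changing a single character gives a different hash.
print(short_hash(b"sample_2.bam") == small)  # False
```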
Cromwell generates hashes for the input arguments of the workflows, as well as for the WDL script itself. Information such as filenames and input parameters is stored in this way so that Cromwell can later recognize when an identical workflow runs with the same input configuration. As long as the hashes uniquely identify input configurations, the call caching feature lets you benefit from the assumption that a given input will always lead to the same output.
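The lookup described above might be sketched as follows. This is a simplified illustration under stated assumptions, not Cromwell's actual algorithm: the `call_key` and `run_call` functions, the way the two hashes are combined, the use of SHA-256, and the sample WDL text are all hypothetical.

```python
import hashlib
import json


def call_key(wdl_source: str, inputs: dict) -> str:
    """Combine a hash of the WDL text with a hash of the inputs (hypothetical scheme)."""
    wdl_hash = hashlib.sha256(wdl_source.encode("utf-8")).hexdigest()
    # sort_keys makes the serialized inputs independent of dictionary order.
    inputs_json = json.dumps(inputs, sort_keys=True)
    inputs_hash = hashlib.sha256(inputs_json.encode("utf-8")).hexdigest()
    return hashlib.sha256((wdl_hash + inputs_hash).encode("utf-8")).hexdigest()


cache = {}  # maps call key -> previously computed outputs


def run_call(wdl_source: str, inputs: dict, compute):
    """Return (outputs, was_cache_hit), computing only on a cache miss."""
    key = call_key(wdl_source, inputs)
    if key in cache:
        return cache[key], True
    outputs = compute(inputs)
    cache[key] = outputs
    return outputs, False


WDL = 'task say_hello { command { echo "hello" } }'


def greet(inputs: dict) -> dict:
    return {"greeting": "hello " + inputs["name"]}


_, first_hit = run_call(WDL, {"name": "sample_1", "threads": 4}, greet)
_, second_hit = run_call(WDL, {"threads": 4, "name": "sample_1"}, greet)
_, third_hit = run_call(WDL, {"name": "sample_1", "threads": 8}, greet)
print(first_hit, second_hit, third_hit)  # False True False
```

The second call is a hit because the workflow text and inputs are identical to the first; the third is a miss because one input parameter changed.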
What's an optimal hash function?
To understand what defines an optimal hash function, consider the diagram below. In this less-than-ideal example, the hash function stores names on a numbered list, but some different names map to the same output hash. This situation is called a "collision," and an optimized hash function has as few of them as possible. Since the chance of a hash collision can be predicted from the size of the hash and the number of inputs hashed, it's possible to pick a hash size large enough that collisions are astronomically unlikely. That's how Cromwell ensures that the hashes it generates can reliably organize inputs and outputs to make call caching possible.
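As a rough illustration of how that prediction works, the standard "birthday bound" approximation estimates the chance of at least one collision among n inputs hashed to b bits as 1 - exp(-n(n-1) / (2 * 2^b)). The sketch below assumes an idealized hash function with uniformly distributed outputs; the `collision_probability` function name and the example sizes are hypothetical and do not describe Cromwell's actual hash size.

```python
import math


def collision_probability(num_inputs: int, hash_bits: int) -> float:
    """Birthday-bound approximation: 1 - exp(-n(n-1) / (2 * 2**bits))."""
    space = 2.0 ** hash_bits
    exponent = -num_inputs * (num_inputs - 1) / (2.0 * space)
    # -expm1(x) equals 1 - exp(x) and stays accurate when x is tiny.
    return -math.expm1(exponent)


# One million hashed inputs with increasingly large hashes.
for bits in (32, 64, 128):
    print(bits, collision_probability(1_000_000, bits))
```

With a 32-bit hash, a collision among a million inputs is almost certain; with a 128-bit hash, the estimated probability is vanishingly small.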
Example of a bad hash function:
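The same idea can be sketched in a few lines of Python. The `bad_hash` function and the names are made up for illustration: the function looks only at the length of a name, so different names of the same length land on the same numbered slot.

```python
def bad_hash(name: str, slots: int = 4) -> int:
    """Map a name to a slot using only its length: a deliberately poor choice."""
    return len(name) % slots


names = ["Ana", "Bob", "Carla", "Dmitri"]

table = {}
for name in names:
    # Names that share a slot number have collided.
    table.setdefault(bad_hash(name), []).append(name)

for slot in sorted(table):
    print(slot, table[slot])
# 1 ['Carla']
# 2 ['Dmitri']
# 3 ['Ana', 'Bob']
```

Here "Ana" and "Bob" collide in slot 3, so the slot number alone can no longer tell them apart.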