Call caching is a Cromwell feature that allows the engine to detect whether a job has been run before, and if possible to reuse the results in order to save hassle, time, and money. In Terra, this feature is enabled by checking the call caching box when launching a workflow.
What is call caching, and how does it work?
To understand how call caching works under the hood, we have to start with a concept known in computer science as a “hash function”. A hash function takes a potentially large input and maps it to a smaller, fixed-size number, known as a hash. A good hash function does this in such a way that two different inputs are extremely unlikely to produce the same hash, so the hash can be used like a serial number to identify the input. Docker images, for example, come with hashes of themselves because it is often useful to know whether two Docker images have identical contents.
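To make the idea concrete, here is a minimal sketch of a hash function in action. It uses SHA-256 from Python's standard library purely for illustration; the source does not say which algorithm Cromwell uses, and the `digest` helper and the sample inputs are made up for this example.

```python
import hashlib

def digest(data: bytes) -> str:
    """Return the SHA-256 hash of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

# Two inputs of very different sizes (illustrative values only).
short_input = b"sample_1.bam"
large_input = b"A" * 1_000_000

short_hash = digest(short_input)
large_hash = digest(large_input)

# Both hashes have the same fixed length, regardless of input size.
print(len(short_hash), len(large_hash))

# Hashing the same input again always gives the same hash.
print(digest(short_input) == short_hash)
```

The hash of the one-megabyte input is exactly as long as the hash of the twelve-byte filename, which is what makes hashes convenient to store and compare.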
Cromwell also generates hashes for the input arguments of the workflows, as well as for the WDL script itself. Information such as filenames and input parameters is stored in this way so that Cromwell can later recognize when an identical workflow runs with the same input configuration. As long as those hashes uniquely identify input configurations, the call caching feature helps you benefit from the assumption that a given input will always lead to the same output.
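One way to picture this is to combine everything that defines a job into a single hash. The sketch below is an assumption-laden illustration, not Cromwell's actual scheme: the function `call_cache_key`, its parameters, and the choice of JSON plus SHA-256 are all hypothetical.

```python
import hashlib
import json

def call_cache_key(task_source: str, inputs: dict, docker_image: str) -> str:
    """Build one hash from the task text, its inputs, and its Docker image.

    Hypothetical helper: the real engine may hash these pieces differently.
    """
    # sort_keys=True makes the serialization independent of dictionary order,
    # so the same configuration always produces the same text to hash.
    payload = json.dumps(
        {"task": task_source, "inputs": inputs, "docker": docker_image},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Illustrative task text and inputs (not taken from a real workflow).
task = "task count_reads { command { samtools view -c ${bam} } }"

key_a = call_cache_key(task, {"bam": "sample_1.bam", "threads": 4}, "ubuntu:20.04")
key_b = call_cache_key(task, {"threads": 4, "bam": "sample_1.bam"}, "ubuntu:20.04")
key_c = call_cache_key(task, {"bam": "sample_1.bam", "threads": 8}, "ubuntu:20.04")

print(key_a == key_b)  # same configuration, different key order
print(key_a == key_c)  # one input value changed
```

Here `key_a` and `key_b` match because they describe the same configuration, while changing a single input value in `key_c` yields a different key.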
To understand what makes for an optimal hash function, consider the diagram above, which shows a less-than-ideal example: a hash function stores names on a numbered list, but some names map to the same spot in the output. These situations are called “collisions”, and an optimized hash function has as few of them as possible. Since the chance of a hash collision can be predicted based on the size of the hash and the number of inputs hashed, it’s possible to pick a hash size so that collisions are astronomically unlikely. That's what Cromwell does to ensure that the hashes it generates are useful for call caching, which relies on them to match inputs with previously computed outputs.
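The prediction mentioned above is commonly made with the “birthday” approximation: for n inputs and a b-bit hash, the chance of at least one collision is roughly 1 - exp(-n(n-1) / 2^(b+1)), assuming the hash spreads inputs evenly. The sketch below applies that standard formula; the function name and the example sizes are illustrative choices, not values taken from Cromwell.

```python
import math

def collision_probability(num_inputs: int, hash_bits: int) -> float:
    """Approximate chance that at least two of `num_inputs` share a hash.

    Uses the birthday approximation and assumes an evenly distributed hash.
    """
    pairs = num_inputs * (num_inputs - 1) / 2   # number of input pairs
    exponent = pairs / 2 ** hash_bits           # expected number of collisions
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-exponent)

# A small 16-bit hash is quickly overwhelmed by a thousand inputs.
crowded = collision_probability(1_000, 16)

# A 256-bit hash stays collision-free in practice even for a billion inputs.
roomy = collision_probability(10**9, 256)

print(crowded)
print(roomy)
```

The first probability is close to 1, while the second is so small that it is effectively zero, which is the sense in which a large enough hash makes collisions astronomically unlikely.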
Why use call caching?
If the hash can point you to pre-computed outputs that deterministically follow from the inputs you selected, Cromwell figures, why not skip the whole pesky business of computing them anew and just provide the old results instead? In addition to making your workflows more efficient, this can save you a lot of headache when your workflows are at risk of certain types of failure or preemption. Call caching works because we assume that if the inputs to a job are the same, then the outputs are the same.
Of course, it’s up to you to know whether the workflow you're launching is appropriate to call cache or not. For instance, if your workflow relies on interacting with the world outside of Cromwell, like pulling live data from a website, then you may want to un-check the call caching box. You may also not want to use call caching if you are benchmarking a workflow to determine time and cost, as reusing cached results will give you misleading measurements.
Using call caching can be a huge advantage anytime you don’t want to pay to run the same thing twice. It can also be useful when encountering failures, as Cromwell takes a stringent approach to rerunning partially failed workflows, opting to re-launch all parts of the failed workflow, rather than just the parts that failed. With call caching enabled, Cromwell can at least skip over the steps it knows are already done and just use the pre-computed outputs.
By the way, calls are always automatically cached whenever a job is processed through Terra, so you don’t need to remember to check the box when running a new workflow configuration that you may replicate later. The box only controls whether the workflow you’re launching reuses results from previous runs, so check it only when you want that.
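The behavior described in this article can be modeled in a few lines: results are always written to the cache, but they are only read back when caching is turned on for a given run. The toy class below is a sketch under that assumption; `ToyCallCache`, `read_from_cache`, and the `add` task are hypothetical names, and a real engine stores far more than an in-memory dictionary.

```python
import hashlib
import json

class ToyCallCache:
    """In-memory stand-in for a call cache (illustrative only)."""

    def __init__(self):
        self._store = {}      # maps hash -> previously computed result
        self.executions = 0   # how many times a task actually ran

    def _key(self, task_name, inputs):
        payload = json.dumps({"task": task_name, "inputs": inputs}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(self, task_name, inputs, compute, read_from_cache):
        key = self._key(task_name, inputs)
        # Reuse an earlier result only when the "box" is checked.
        if read_from_cache and key in self._store:
            return self._store[key]
        # Otherwise do the work, and always record the result for later runs.
        self.executions += 1
        result = compute(**inputs)
        self._store[key] = result
        return result

def add(a, b):
    return a + b

cache = ToyCallCache()

# First run: box unchecked, the task executes and its result is stored.
first = cache.run("add", {"a": 1, "b": 2}, add, read_from_cache=False)

# Second run: box checked, the stored result is reused without executing.
second = cache.run("add", {"a": 1, "b": 2}, add, read_from_cache=True)
runs_after_second = cache.executions

# Third run: box unchecked again, so the task executes once more.
third = cache.run("add", {"a": 1, "b": 2}, add, read_from_cache=False)
runs_after_third = cache.executions
```

In this model the task runs once across the first two launches and a second time on the third, which mirrors why an unchecked box is the right choice for benchmarking.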