Call caching allows Terra's execution engine (aka Cromwell) to detect when a job has been run in the past so that it doesn't have to re-compute results. Learn how call caching can save you time and money when you are repeating all or parts of a workflow analysis in Terra.
Overview
Call caching allows you to "rerun" or restart a workflow without redoing the calculations. This can be a huge advantage anytime you don’t want to pay to run part or all of the same workflow on the same data twice.
Examples of when call caching is useful
- Before running downstream analysis on the same data
- When you're working in a different workspace and want to reproduce earlier results
- When you are testing or troubleshooting a partially failed workflow, or are otherwise not sure if a workflow will complete. Call caching lets you start the workflow again at the beginning of the task that failed, rather than rerunning the entire workflow from the beginning.
When you might not want to use call caching
Of course, ultimately it’s up to you to know whether the workflow you're launching is appropriate to call cache or not.
Examples of when you may not benefit from call caching
- If your workflow relies on interacting with the world outside of Cromwell, for example if it pulls live data from an external website.
- If you are benchmarking a workflow to determine time and cost, using cached results will provide inaccurate values.
- If you are using the "Delete intermediate files" option.
You can't use both "Use call caching" and "Delete intermediate files"
Call caching relies on "intermediate files" generated during processing. Thus, the "Delete intermediate files" option can override the "Use call caching" option.
A workflow run with the delete intermediates option enabled can always READ from the call cache, but it will not WRITE its own results to the call cache.
Say, for example, you previously ran workflow X with delete intermediates and now want to run it again with the same inputs and call caching turned on. The new run will not reuse results from the earlier run, because those intermediate files no longer exist and cannot be call cached: when Cromwell deletes intermediate files, it also invalidates the corresponding call cache entries.
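The read/write rules above can be sketched as a toy in-memory cache. `ToyCallCache`, its methods, and the `gs://bucket/...` paths are hypothetical illustrations, not Cromwell APIs; the sketch only mirrors the behavior described here: runs with delete intermediates can read but do not write, and deleting intermediate files invalidates existing entries.

```python
# Toy model of how "Delete intermediate files" interacts with the call cache.
# ToyCallCache and the paths below are hypothetical, not Cromwell APIs.

class ToyCallCache:
    def __init__(self):
        self.entries = {}  # cache key -> outputs of a previous successful run

    def lookup(self, key):
        """Reading is allowed whether or not delete intermediates is on."""
        return self.entries.get(key)

    def record(self, key, outputs, delete_intermediates):
        """Runs with delete intermediates enabled do not write to the cache."""
        if not delete_intermediates:
            self.entries[key] = outputs

    def invalidate(self, key):
        """Deleting a run's intermediate files invalidates its cache entry."""
        self.entries.pop(key, None)


cache = ToyCallCache()

# Workflow X run with delete intermediates: nothing is written to the cache.
cache.record("task-A:inputs-1", ["gs://bucket/run1/a.bam"], delete_intermediates=True)
print(cache.lookup("task-A:inputs-1"))  # None: the rerun is a cache miss

# Same task run with intermediates kept: the entry is written.
cache.record("task-A:inputs-1", ["gs://bucket/run2/a.bam"], delete_intermediates=False)
print(cache.lookup("task-A:inputs-1"))  # ['gs://bucket/run2/a.bam']

# Deleting that run's intermediate files later invalidates the entry.
cache.invalidate("task-A:inputs-1")
print(cache.lookup("task-A:inputs-1"))  # None
```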
How to enable/disable call caching
In Terra, call caching is enabled by default. You can check the status, as well as turn off call caching, from the "Use call caching" checkbox in the workflow configuration form.
Call caching is enabled by default in Terra
Calls are always automatically cached whenever a job is processed through Terra, so you don’t need to remember to check the box when running a new workflow configuration that you may replicate later. You should only uncheck the box when you are deleting intermediate outputs or don't want the workflow you’re launching to benefit from previous runs.
Why didn't call caching work for me?
If you expected call caching to work, and it didn't, first check to make sure you've satisfied the following requirements.
Call caching requires consistency in the inputs of the task
Cromwell only reuses a previous result if the task's inputs are exactly the same. If you set your task's runtime attributes using input variables, changes to the values of these variables count as changes to the inputs, resulting in a different hash for the task. Read more about hashes in the technical details section below.
Call caching may fail if your files are being fed in as String rather than File inputs
This is because the hash of a File input reflects the file's contents, so two identical files stored in different locations produce the same hash. A String input is hashed as the text itself, so the String values for two different locations produce different hashes, even though the contents of the files are the same.
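A small sketch of the difference, assuming (as described above) that a File is hashed by its contents and a String by its literal text. MD5 from Python's `hashlib` is used only for illustration and is not necessarily the algorithm Cromwell uses; the `bucket_a`/`bucket_b` folders are made up.

```python
# Sketch: File inputs (hashed by contents) vs String paths (hashed as text).
# MD5 is an illustrative choice, not necessarily what Cromwell uses.
import hashlib
import os
import tempfile


def hash_file_contents(path: str) -> str:
    """Hash the bytes stored in the file, ignoring where the file lives."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()


def hash_string(value: str) -> str:
    """Hash a String input as plain text (here, a file path)."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    # Two identical files stored in two different (hypothetical) locations.
    path_a = os.path.join(tmp, "bucket_a", "sample.vcf")
    path_b = os.path.join(tmp, "bucket_b", "sample.vcf")
    for p in (path_a, path_b):
        os.makedirs(os.path.dirname(p))
        with open(p, "w") as f:
            f.write("identical contents\n")

    file_match = hash_file_contents(path_a) == hash_file_contents(path_b)
    string_match = hash_string(path_a) == hash_string(path_b)
    print("As File inputs, hashes match:", file_match)      # True
    print("As String inputs, hashes match:", string_match)  # False
```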
Call caching also requires consistency in the outputs of the task
Both the count (number of outputs) and the output expressions must be the same for call caching to work. If you publish a new version of your WDL that has one extra or one fewer output, it will not be able to benefit from a previously successful run of the same task, even if the inputs are the same.
Additional troubleshooting resources
If you believe a workflow should have benefited from call caching, you can run the Call Caching Debug Wizard, found in the Workflow Dashboard for that workflow.
How to find the Call Caching Debug Wizard
1. First, navigate to the Job History page for the workflow you believe should have benefited from call caching and click on the Workflow Dashboard icon.
2. Next, click on the magnifying glass for the task or shard where you see a Cache Miss message.
3. Follow the wizard, filling in the details as requested, and you will be presented with the hashes for your two runs. If they are not identical, then there was some difference between the runs that resulted in a cache miss.
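Comparing two runs' hashes amounts to finding the components whose hashes differ. The sketch below assumes each run's hashes are available as a flat dictionary; the key names and hash values are made up and do not reflect the exact format the wizard displays.

```python
# Sketch: find which hashed components differ between two runs of a task.
# Key names and hash values are hypothetical, not the wizard's real format.

def diff_hashes(run_a: dict, run_b: dict) -> dict:
    """Return {key: (hash_in_a, hash_in_b)} for every key whose hash differs."""
    diffs = {}
    for key in sorted(run_a.keys() | run_b.keys()):
        a, b = run_a.get(key), run_b.get(key)
        if a != b:
            diffs[key] = (a, b)
    return diffs


previous_run = {
    "command template": "3F1A",
    "input: File bam": "9C2E",
    "runtime attribute: docker": "77D0",
    "output count": "C4CA",
}
new_run = {
    "command template": "3F1A",
    "input: File bam": "9C2E",
    "runtime attribute: docker": "A1B5",  # a different docker image
    "output count": "C4CA",
}

for key, (old, new) in diff_hashes(previous_run, new_run).items():
    print(f"{key}: {old} -> {new}")
# prints: runtime attribute: docker: 77D0 -> A1B5
```

A key present in only one run shows up with `None` on the other side, which covers cases such as an added or removed output.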
What can I change without breaking call caching?
You can change hard-coded values in your runtime block without breaking call caching. This is particularly useful if your job failed due to inadequate memory or disk space. You can make this change by publishing a new version of your WDL with the updated runtime variable values.
The exceptions to this rule are the following runtime variables: ContinueOnReturnCode, Docker, and FailOnStderr. Changes to these runtime variables will break call caching!
Changing runtime variables such as memory, disk, or cpu using task inputs will break call caching, since this is registered as a change to the inputs for the task.
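The rules in this section can be summarized with a simplified, hypothetical cache key: it covers the command, the input values, the output expressions, and only the runtime attributes ContinueOnReturnCode, Docker, and FailOnStderr. This is a toy model for intuition, not Cromwell's actual hashing code; the `samtools` command, the `my-image` Docker tag, and the input hash `9c2e` are made up.

```python
# Toy model of which task changes alter the call-cache key.
# Simplified assumption: the key covers command, inputs, outputs, and only
# the runtime attributes listed below. Not Cromwell's real implementation.
import hashlib
import json

HASHED_RUNTIME_ATTRIBUTES = {"ContinueOnReturnCode", "Docker", "FailOnStderr"}


def cache_key(command: str, inputs: dict, runtime: dict, outputs: list) -> str:
    # Runtime attributes outside the hashed set (memory, disk, cpu) are ignored.
    hashed_runtime = {k: v for k, v in runtime.items() if k in HASHED_RUNTIME_ATTRIBUTES}
    payload = json.dumps(
        {"command": command, "inputs": inputs, "runtime": hashed_runtime, "outputs": outputs},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def key_for(inputs=None, runtime=None, outputs=None) -> str:
    """Key for a hypothetical task, with optional fields swapped out."""
    return cache_key(
        command="samtools sort -o sorted.bam input.bam",
        inputs=inputs if inputs is not None else {"bam": "9c2e"},  # made-up content hash
        runtime=runtime if runtime is not None else {"Docker": "my-image:1.0", "memory": "4 GB"},
        outputs=outputs if outputs is not None else ["sorted.bam"],
    )


original = key_for()

# Hard-coded memory raised in the runtime block: key unchanged (cache hit).
print(key_for(runtime={"Docker": "my-image:1.0", "memory": "8 GB"}) == original)  # True

# Docker image changed: key changes (cache miss).
print(key_for(runtime={"Docker": "my-image:1.1", "memory": "4 GB"}) == original)  # False

# Memory set through a task input: changing the input changes the key.
mem4 = key_for(inputs={"bam": "9c2e", "memory_gb": 4}, runtime={"Docker": "my-image:1.0"})
mem8 = key_for(inputs={"bam": "9c2e", "memory_gb": 8}, runtime={"Docker": "my-image:1.0"})
print(mem4 == mem8)  # False

# One extra output: key changes (cache miss).
print(key_for(outputs=["sorted.bam", "sorted.bai"]) == original)  # False
```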
How does it work? (technical details, if you're curious)
Cromwell searches the cache of previously run jobs for one that has the exact same command and exact same inputs. If a previously run job is found in the cache, Cromwell will use the results of the previous job instead of re-running it. See the Cromwell call-caching documentation.
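The lookup described above can be sketched in a few lines. `run_with_cache`, `job_hash`, and the in-memory dictionary are hypothetical stand-ins for Cromwell's persistent call-cache database, and `fake_execute` stands in for actually running a job.

```python
# Minimal sketch: before running a job, look for a previous job with the same
# command and inputs and reuse its results. All names here are hypothetical.
import hashlib
import json
from typing import Callable

call_cache: dict = {}  # job hash -> outputs of the earlier successful job


def job_hash(command: str, inputs: dict) -> str:
    payload = json.dumps({"command": command, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_with_cache(command: str, inputs: dict, execute: Callable[[str, dict], dict]) -> dict:
    key = job_hash(command, inputs)
    if key in call_cache:
        return call_cache[key]          # cache hit: reuse earlier results
    outputs = execute(command, inputs)  # cache miss: actually run the job
    call_cache[key] = outputs
    return outputs


runs = []


def fake_execute(command: str, inputs: dict) -> dict:
    runs.append(command)  # record each time real work is done
    return {"stdout": f"ran {command} on {inputs['sample']}"}


run_with_cache("count_reads", {"sample": "NA12878"}, fake_execute)
run_with_cache("count_reads", {"sample": "NA12878"}, fake_execute)  # reused
run_with_cache("count_reads", {"sample": "NA12891"}, fake_execute)  # new input
print(len(runs))  # 2
```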
How does Cromwell know if that exact run has been executed before (same workflow, same parameters, same inputs)?
To understand how call caching works under the hood, we have to start with a concept known in computer science as a “hash function.” A hash function maps inputs of any size to a short code, or hash: a number with a fixed size. A good hash function does this in a way that makes it extremely unlikely for two different inputs to produce the same hash. Therefore, the hash can be used like a serial number to uniquely identify the input. Docker images, for example, come with hashes of their contents, because it is often useful to know whether two Docker images are identical.
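The properties above can be checked with SHA-256 from Python's standard library, chosen here purely as a familiar example of a hash function: inputs of very different sizes give hashes of the same length, a one-character change gives a different hash, and the same input always gives the same hash.

```python
# Illustration of basic hash-function properties using SHA-256.
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


short_hash = sha256_hex(b"ACGT")
long_hash = sha256_hex(b"ACGT" * 1_000_000)  # 4 MB of input
tweaked_hash = sha256_hex(b"ACGA")           # one character changed

print(len(short_hash), len(long_hash))    # 64 64: fixed size regardless of input size
print(short_hash == tweaked_hash)         # False: a tiny change gives a different hash
print(short_hash == sha256_hex(b"ACGT"))  # True: same input, same hash
```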
Cromwell generates hashes for the input arguments of the workflows, as well as for the WDL script itself. Information such as input files and input parameters is stored in this way so that Cromwell can later recognize when an identical workflow runs with the same input configuration. As long as the hashes uniquely identify input configurations, call caching lets you benefit from the assumption that a given input will always lead to the same output.
What's an optimal hash function?
To understand what defines an optimal hash function, consider the example below. In this less-than-ideal example, the hash function stores names on a numbered list, but some names map to the same output hash. This situation is called a “collision,” and an optimal hash function has as few of them as possible. Because the chance of a hash collision can be predicted from the size of the hash and the number of inputs hashed, it’s possible to pick a hash size large enough that collisions are astronomically unlikely. That's how Cromwell ensures that the hashes it generates can reliably match inputs to previous outputs, which is what makes call caching possible.
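The prediction mentioned above is usually made with the standard birthday-problem approximation: for n inputs and a b-bit hash, the chance of at least one collision is about p ≈ 1 − exp(−n(n−1) / 2^(b+1)). The sketch below applies that textbook formula; it assumes hash values behave like uniformly random numbers, and the bit sizes chosen are illustrative rather than the ones Cromwell uses.

```python
# Birthday-problem estimate of collision probability for a b-bit hash,
# assuming hash values behave like uniformly random numbers.
import math


def collision_probability(num_inputs: int, hash_bits: int) -> float:
    """Approximate chance that at least two of num_inputs hashes collide."""
    space = 2.0 ** hash_bits
    exponent = num_inputs * (num_inputs - 1) / (2.0 * space)
    return -math.expm1(-exponent)  # 1 - exp(-x), accurate for tiny x


for bits in (16, 32, 128, 256):
    print(bits, collision_probability(1_000_000_000, bits))
# With a billion inputs, a 16-bit hash is certain to collide (1.0),
# while a 128-bit hash gives roughly 1.5e-21.
```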
Example of a bad hash function:
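As a stand-in illustration (made up, not the function from the original diagram), the hypothetical `bad_hash` below maps each name to the position of its first letter in the alphabet. With only 26 possible hash values, many names collide.

```python
# A deliberately bad hash function: only 26 possible outputs, so collisions
# are common. Illustrative only.
from collections import defaultdict


def bad_hash(name: str) -> int:
    """Map a name to its first letter's position in the alphabet (A=1)."""
    return ord(name[0].upper()) - ord("A") + 1


names = ["Alice", "Bob", "Amir", "Beatriz", "Chen"]
buckets = defaultdict(list)
for name in names:
    buckets[bad_hash(name)].append(name)

collisions = {h: group for h, group in buckets.items() if len(group) > 1}
print(collisions)  # {1: ['Alice', 'Amir'], 2: ['Bob', 'Beatriz']}
```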