Call caching: How it works and when to use it

Call caching allows Terra's execution engine (aka Cromwell) to detect when a job has been run in the past so that it doesn't have to re-compute results. Learn how call caching can save you time and money when you are repeating all or parts of a workflow analysis in Terra.

Call caching overview

Call caching allows you to "rerun" or restart a workflow without redoing the calculations. This can be a huge advantage anytime you don’t want to pay to run part or all of the same workflow on the same data twice.

Examples of when call caching is useful

Before running downstream analysis on the same data
When you're working in a different workspace and want to reproduce earlier results
When you are testing or troubleshooting a partially failed workflow, or are otherwise not sure if a workflow will complete. Call caching lets you start the workflow again at the beginning of the task that failed, rather than rerunning the entire workflow from the beginning.

Ultimately, it's up to you to decide whether call caching is appropriate for the workflow you're launching.

Examples of when you may not benefit from call caching

If your workflow relies on interacting with the world outside of Cromwell - if it pulls live data from an external website, for example.
If you're benchmarking a workflow to determine time and cost, using cached results will provide inaccurate values.
If you are using the "Delete intermediate files" option.

You can't use both "Use call caching" and "Delete intermediate files"

Call caching relies on "intermediate files" generated during processing. Thus, the "Delete intermediate files" option can override the "Use call caching" option.

A workflow run with the delete intermediates option enabled can always READ from the call cache, but it will not WRITE its own results to the call cache.

Say, for example, you previously ran workflow X with delete intermediates and now want to run it again with the same inputs and call caching turned on. The new run will not reuse results from the earlier run, because that run's intermediate files no longer exist (and so cannot be call cached). When Cromwell deletes the intermediate files, it also invalidates the corresponding call cache entries.

How to enable/disable call caching

In Terra, call caching is enabled by default. You can check the status, as well as turn off call caching, from the "Use call caching" checkbox in the workflow configuration form.

Screenshot of a workflows configuration page with an arrow pointing to the Use Call Caching option, which is clicked by default. The option is directly below choosing the root entity type and choosing the data steps 1 and 2 in the configuration form

Call caching is enabled by default in Terra

Calls are always automatically cached whenever a job is processed through Terra, so you don’t need to remember to check the box when running a new workflow configuration that you may replicate later. You should only uncheck the box when you are deleting intermediate outputs or don't want the workflow you’re launching to benefit from previous runs.

Why didn't call caching work for me?

If you expected call caching to work, and it didn't, first check to make sure you've satisfied the following requirements.

Call caching requires consistency in the inputs of the task

If you set your task's runtime attributes using input variables, changes to the values of these variables count as changes to the inputs, resulting in a different hash for the task. Read more about hashes in the next section!

Call caching may fail if your files are being fed in as `String` rather than `File` inputs

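This is because call caching compares hashes of the inputs. Two identical files stored in different locations have the same file hash, so as `File` inputs they match. As `String` inputs, however, only the text of each location is hashed. The two location strings are different, so their hashes are different, even though the contents of the files are the same.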
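The difference can be seen in a minimal Python sketch. This is an illustration only, not Cromwell's actual code: the helper names `hash_file_input` and `hash_string_input`, the bucket paths, and the choice of MD5 are all assumptions made for the example.

```python
import hashlib


def hash_file_input(contents: bytes) -> str:
    """Hypothetical helper: identify a File input by a digest of its contents."""
    return hashlib.md5(contents).hexdigest()


def hash_string_input(value: str) -> str:
    """Hypothetical helper: identify a String input by a digest of its literal text."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


# The same file contents stored at two made-up locations.
contents = b"ACGT\n"
path_a = "gs://bucket-a/sample.txt"
path_b = "gs://bucket-b/copy/sample.txt"

# Treated as File inputs, both copies hash the same because the bytes are the same.
file_hash_a = hash_file_input(contents)
file_hash_b = hash_file_input(contents)

# Treated as String inputs, only the path text is hashed, so the hashes differ.
string_hash_a = hash_string_input(path_a)
string_hash_b = hash_string_input(path_b)

print(file_hash_a == file_hash_b)      # True
print(string_hash_a == string_hash_b)  # False
```

In this sketch, moving or renaming a file would not change the `File` hash, but it would change the `String` hash and cause a cache miss.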

Call caching also requires consistency in the outputs of the task

Both the count (number of outputs) and the output expressions must be the same for call caching to work. If you publish a new version of your WDL that has one extra or one fewer output, it will not be able to benefit from a previously successful run of the same task, even if the inputs are the same.
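As a rough sketch of why an extra or changed output causes a miss, imagine a task hash that folds in the number of outputs and each output expression alongside the inputs. The function `task_hash`, the field names, and the sample values below are hypothetical; the real hashing scheme is not described in this article.

```python
import hashlib
import json


def task_hash(inputs: dict, outputs: dict) -> str:
    """Hypothetical task hash covering inputs, output count, and output expressions."""
    payload = json.dumps(
        {
            "inputs": inputs,
            "output_count": len(outputs),
            "output_expressions": outputs,
        },
        sort_keys=True,  # stable ordering so equal content gives an equal hash
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


inputs = {"sample_name": "sample_1"}

outputs_v1 = {"aligned": '"out.bam"'}
outputs_v2 = {"aligned": '"out.bam"', "log": "stdout()"}  # one extra output
outputs_v3 = {"aligned": '"aligned.bam"'}                 # same count, new expression

hash_v1 = task_hash(inputs, outputs_v1)
hash_v2 = task_hash(inputs, outputs_v2)
hash_v3 = task_hash(inputs, outputs_v3)

print(hash_v1 == task_hash(inputs, dict(outputs_v1)))  # True: nothing changed
print(hash_v1 == hash_v2)                              # False: output count changed
print(hash_v1 == hash_v3)                              # False: output expression changed
```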

Additional troubleshooting resources

If you have a workflow you believe should have benefited from call caching, you can run the Call Caching Debug Wizard, found in the Workflow Dashboard for the workflow of interest.

How to find the Call caching debug wizard

1. First, navigate to the Job History page for the workflow you believe should have benefited from call caching and click on the Workflow Dashboard icon.

Screenshot of the workflow dashboard icon in the middle of the three icons under Links

2. Next, click on the magnifying glass for the task or shard where you see a Cache Miss message.

Screenshot of the call lists section at the bottom of the workflow dashboard. To the far right of the first attempt with an unexpected status is a microscope icon beside the words Cache Miss, labeled Call cache debug wizard.

3. Follow the wizard, filling in the details as requested, and you will be presented with the hashes for your two runs. If they are not identical, then there was some difference between the runs that resulted in a cache miss.

What can I change without disrupting call caching?

You can change hard-coded values in your runtime block without interfering with call caching functionality. This is particularly useful if your job failed due to inadequate memory or disk space. You can make this change by publishing a new version of your WDL with the updated runtime variable values.

The exceptions to this rule are the following runtime variables: ContinueOnReturnCode, Docker, FailOnStderr. Changes to these runtime variables will invalidate call caching!

Changing runtime variables such as memory, disk, or cpu using task inputs will interfere with call caching functionality since this is registered as a change to the inputs for the task.
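The rules above can be pictured with a small Python sketch. It assumes a simplified cache key built from the task inputs plus only the three runtime variables named above (written here in their lowercase WDL spelling). The function `cache_key`, the input names, and the image tags are hypothetical, and this is not Cromwell's actual implementation.

```python
import hashlib
import json

# Runtime variables that this article says affect call caching.
HASHED_RUNTIME_ATTRIBUTES = {"continueOnReturnCode", "docker", "failOnStderr"}


def cache_key(inputs: dict, runtime: dict) -> str:
    """Hypothetical cache key: all inputs, plus only the hashed runtime variables."""
    hashed_runtime = {
        name: value
        for name, value in runtime.items()
        if name in HASHED_RUNTIME_ATTRIBUTES
    }
    payload = json.dumps({"inputs": inputs, "runtime": hashed_runtime}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


inputs = {"sample_name": "sample_1"}

# Hard-coded memory change: the key stays the same, so the cache can still hit.
base = cache_key(inputs, {"docker": "ubuntu:20.04", "memory": "4 GB"})
more_memory = cache_key(inputs, {"docker": "ubuntu:20.04", "memory": "16 GB"})

# Docker change: the key changes, so the cache misses.
new_image = cache_key(inputs, {"docker": "ubuntu:22.04", "memory": "4 GB"})

# Memory supplied as a task input: changing it changes the inputs, so the cache misses.
input_4 = cache_key({"sample_name": "sample_1", "memory_gb": 4}, {"docker": "ubuntu:20.04"})
input_16 = cache_key({"sample_name": "sample_1", "memory_gb": 16}, {"docker": "ubuntu:20.04"})

print(base == more_memory)   # True
print(base == new_image)     # False
print(input_4 == input_16)   # False
```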

How does it work? (technical details, if you're curious)

Cromwell searches the cache of previously run jobs for one that has the exact same command and exact same inputs. If a previously run job is found in the cache, Cromwell will use the results of the previous job instead of re-running it.
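Conceptually, this lookup behaves like a dictionary keyed by a hash of the command and inputs. The sketch below is a toy model under that assumption; `run_with_cache`, `call_cache`, and the example task are hypothetical names and do not reflect how Cromwell stores its cache.

```python
import hashlib
import json

call_cache = {}  # hypothetical in-memory cache: key -> previously computed outputs


def run_with_cache(command: str, inputs: dict, execute):
    """Return (outputs, cache_hit). Only calls execute() when no cached result exists."""
    payload = json.dumps({"command": command, "inputs": inputs}, sort_keys=True)
    key = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    if key in call_cache:
        return call_cache[key], True   # cache hit: reuse the earlier outputs
    outputs = execute(inputs)          # cache miss: do the real work
    call_cache[key] = outputs
    return outputs, False


executed = []  # records each time the task body really runs


def count_lines(inputs: dict) -> dict:
    executed.append(inputs["text"])
    return {"line_count": len(inputs["text"].splitlines())}


first, first_hit = run_with_cache("wc -l", {"text": "a\nb\n"}, count_lines)
second, second_hit = run_with_cache("wc -l", {"text": "a\nb\n"}, count_lines)
third, third_hit = run_with_cache("wc -l", {"text": "a\nb\nc\n"}, count_lines)

print(first_hit, second_hit, third_hit)  # False True False
print(len(executed))                     # 2
```

The second call returns the stored outputs without running the task again, while the third call has different inputs and therefore runs from scratch.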

How does Cromwell know if that exact run has been executed before (same workflow, same parameter, same inputs)?

To understand how call caching works under the hood, we have to start with a concept known in computer science as a “hash function.” A hash function maps large inputs to a short code, or hash - a smaller number with a fixed size. A good hash function does this in a way that two inputs are extremely unlikely to produce the same hash. Therefore, the hash can be used like a serial number to uniquely identify the input. Docker images, for example, come with hashes of themselves because it can be generally useful to know if docker images have identical contents.
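To make the idea concrete, here is a short Python example using SHA-256 from the standard library. SHA-256 is chosen only for illustration; this article does not say which hash function Cromwell uses for any particular input.

```python
import hashlib

# Inputs of very different sizes...
tiny = b"a"
large = b"a" * 1_000_000
almost_tiny = b"b"

tiny_hash = hashlib.sha256(tiny).hexdigest()
large_hash = hashlib.sha256(large).hexdigest()
almost_tiny_hash = hashlib.sha256(almost_tiny).hexdigest()

# ...all map to a hash of the same fixed size (64 hexadecimal characters).
print(len(tiny_hash), len(large_hash))  # 64 64

# The same input always gives the same hash, so it works like a serial number.
print(tiny_hash == hashlib.sha256(b"a").hexdigest())  # True

# Even a one-character change gives a different hash.
print(tiny_hash == almost_tiny_hash)  # False
```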

Cromwell generates hashes for the input arguments of the workflows, as well as for the WDL script itself. Information such as filenames and input parameters is stored in this way so that Cromwell can later identify when an identical workflow runs with the same input configuration. If the hashes are optimized so they always uniquely identify input configurations, then the call caching feature helps you benefit from the assumption that a given input will always lead to the same output.

What's an optimal hash function?

To understand what defines an optimal hash function, consider the diagram below. In this less-than-ideal example, the hash function stores names on a numbered list, but for some reason some names map to the same output hash. This situation is called a “collision,” and an optimized hash function has as few of them as possible. Since the chance of a hash collision can be predicted based on the size of the hash and the number of inputs hashed, it’s possible to pick a hash size large enough that collisions are astronomically unlikely. That's how Cromwell ensures that the hashes it generates are able to organize inputs and outputs to make call caching possible.
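One common way to make that prediction is the "birthday" approximation: for n inputs and a hash of b bits, the chance of at least one collision is roughly 1 - exp(-n(n-1) / (2 * 2^b)). The sketch below applies that approximation. The function name and the example sizes are illustrative assumptions, not figures from Cromwell.

```python
import math


def collision_probability(num_items: int, hash_bits: int) -> float:
    """Approximate chance that at least two of num_items share a hash (birthday bound)."""
    exponent = -num_items * (num_items - 1) / (2 * 2 ** hash_bits)
    # -expm1(x) equals 1 - exp(x) but stays accurate when x is extremely close to zero.
    return -math.expm1(exponent)


# A small 32-bit hash becomes risky after only a hundred thousand inputs.
p_small_hash = collision_probability(100_000, 32)

# A 128-bit hash stays astronomically safe even for a billion inputs.
p_large_hash = collision_probability(1_000_000_000, 128)

print(p_small_hash > 0.5)     # True
print(p_large_hash < 1e-18)   # True
```

Doubling the number of bits does far more than double the safety margin, which is why a sufficiently large hash makes collisions a non-issue in practice.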

Example of a bad hash function:

Comments

3 comments

Brendan Reardon
- March 02, 2021 17:24
Anton Kovalsky can you confirm more details of how call-caching works? My understanding is that it is performed based on either the MD5 or crc32c hash for each file / input. Thus, it won't matter if I change a file name or move the object to a different bucket or folder. Is this correct?

Anton Kovalsky
- March 04, 2021 16:13
Yes, that should be correct, are you experiencing any issues with this? Note that if you're using String inputs to pass in file URIs, Cromwell can’t necessarily tell that it’s actually supposed to be a File.

Brendan Reardon
- March 04, 2021 18:09
No issues here, just seeking clarification. Thank you!

