Call caching: How it works and when to use it

Anton Kovalsky

Call caching allows Terra's execution engine (aka Cromwell) to detect when a job has been run in the past so that it doesn't have to re-compute results. Learn how the call caching feature in Terra can save you time and money when you are repeating all or parts of a workflow analysis. 

Why use call caching

Call caching allows you to "rerun" or restart a workflow without having to redo the calculations, which can be a huge advantage anytime you don’t want to pay to run part or all of the same workflow on the same data twice:

  • Before running downstream analysis on the same data
  • When you're working in a different workspace and want to reproduce earlier results
  • When you are testing or troubleshooting a partially failed workflow, or are otherwise not sure if a workflow will complete. Call caching lets you start the workflow again at the beginning of the task that failed, rather than rerunning the entire workflow from the beginning. 

When you might not want to use call caching

It’s up to you to decide whether the workflow you're launching is appropriate for call caching. For instance, if your workflow relies on interacting with the world outside of Cromwell - like pulling live data from an external website - you may want to uncheck the call caching box, because a cached result would not reflect newer data. You may also want to turn off call caching if you are benchmarking a workflow to determine time and cost, since a run that reuses cached results will not reflect the true values.



Which option is right for you? "Use call caching" or "Delete intermediate files"

 

Call caching relies on the "intermediate files" generated during processing. Thus, the "Delete intermediate files" option can override the "Use call caching" option.

A workflow run with the "Delete intermediate files" option enabled can always READ from the call cache, but it will not WRITE its own results to the call cache.

Say, for example, you previously ran workflow X with "Delete intermediate files" enabled and now want to run it again with the same inputs and call caching turned on. The new run cannot reuse results from the earlier run, because the intermediate files no longer exist and so cannot be call cached. When Cromwell deletes the intermediate files, it also invalidates the corresponding call cache entries.

How to enable/disable call caching

In Terra, call caching is enabled by default. You can check the status, as well as turn off call caching, from the "Use call caching" checkbox in the workflow configuration form: 

[Screenshot: the "Use call caching" checkbox in the workflow configuration form]



Call caching is enabled by default in Terra 

  Results are cached automatically whenever a job is processed through Terra, so you don’t need to remember to check the box when running a new workflow configuration that you may want to repeat later. The checkbox only controls whether the workflow you’re launching reuses results from previous runs.

 

Why didn't call caching work for me?

Call caching requires consistency in the inputs of the task. If you set your task's runtime attributes using input variables, changes to the values of these variables count as changes to the inputs, resulting in a different hash for the task. Read more about hashes in the "How does it work?" section below.

Call caching may also fail if your files are passed in as String rather than File inputs. Two identical files stored in different locations have the same hash when they are passed as File inputs. Passed as String inputs, the two locations hash differently, even though the contents of the files are the same.
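
The File versus String difference can be illustrated with a minimal Python sketch. It is not Cromwell's implementation: MD5 is used only for illustration, and the folder names `bucket_a` and `bucket_b` are hypothetical stand-ins for two storage locations.

```python
import hashlib
import tempfile
from pathlib import Path


def content_hash(path: Path) -> str:
    """Hash the bytes of a file, the way a File input can be compared by content."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def string_hash(value: str) -> str:
    """Hash the text of a String input; the file it points to is never opened."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    # Two copies of the same data in different (hypothetical) locations.
    first = Path(tmp) / "bucket_a" / "sample.txt"
    second = Path(tmp) / "bucket_b" / "sample.txt"
    for copy in (first, second):
        copy.parent.mkdir(parents=True)
        copy.write_text("ACGT\n")

    same_as_file = content_hash(first) == content_hash(second)
    same_as_string = string_hash(str(first)) == string_hash(str(second))

print(same_as_file)    # True: identical contents, identical hash
print(same_as_string)  # False: different paths, different hash
```

In this sketch, only the content-based comparison treats the two copies as the same input, which is why passing a file location as a String can cause a cache miss.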

Call caching also requires consistency in the outputs of the task, both the count (number of outputs) and the output expressions. If you publish a new version of your WDL that has one extra or one fewer output, it will not be able to benefit from a previously successful run of the same task, even if the inputs are the same.

If you have a workflow you believe should have benefited from call caching, you can run the Call Caching Debug Wizard, found in the Workflow Dashboard for the workflow of interest.

First, navigate to the Job History page for the workflow you believe should have benefited from call caching and click on the Workflow Dashboard icon.

[Screenshot: the Workflow Dashboard icon on the Job History page]

Next, click on the magnifying glass for the task or shard where you see a Cache Miss message.

[Screenshot: the magnifying glass next to a Cache Miss message]

Follow the wizard, filling in the details as requested, and you will be presented with the hashes for your two runs. If they are not identical, then there was some difference between the runs that resulted in a cache miss.
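
If you copy the hashes for the two runs out of the wizard, a few lines of Python can point to the entries that differ. This is only a sketch: the key names and hash values below are made up for illustration and do not reflect the wizard's actual output format.

```python
def differing_keys(first_run: dict, second_run: dict) -> list:
    """Return the sorted names of entries whose hashes differ or that exist in only one run."""
    keys = set(first_run) | set(second_run)
    return sorted(k for k in keys if first_run.get(k) != second_run.get(k))


# Hypothetical hashes for the same task in two runs.
run_a = {
    "command": "9f2c",
    "input: File reads": "ab41",
    "input: Int mem_gb": "07d3",
    "output count": "c81e",
}
run_b = {
    "command": "9f2c",
    "input: File reads": "ab41",
    "input: Int mem_gb": "e4da",  # the only entry that changed
    "output count": "c81e",
}

print(differing_keys(run_a, run_b))  # ['input: Int mem_gb']
```

Here the only mismatch is an input that sets memory, which matches the case described above where runtime attributes set through input variables change the hash of the task.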



What can I change without breaking call caching?

 

You can change hard-coded values in your runtime block without breaking call caching. This is particularly useful if your job failed due to inadequate memory or disk space. You can make this change by publishing a new version of your WDL with the updated runtime variable values.

The exceptions to this rule are the following runtime variables: continueOnReturnCode, docker, failOnStderr

Changes to these runtime variables will break call caching!

And as mentioned earlier, changing runtime variables such as memory, disk, or cpu using task inputs will break call caching since this is registered as a change to the inputs for the task.
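
The rules above can be summarized with a simplified Python model of a cache key. It is a sketch under stated assumptions, not Cromwell's actual hashing code: the function `call_key`, the payload layout, the command string, and the image name `example/aligner` are all hypothetical. The three hashed runtime attributes are the ones named in this section, spelled as they are written in a WDL runtime block.

```python
import hashlib
import json

# Runtime attributes that this section says affect call caching.
HASHED_RUNTIME_KEYS = ("continueOnReturnCode", "docker", "failOnStderr")


def call_key(command, inputs, output_expressions, runtime):
    """Build a hypothetical cache key from the parts of a task that must stay the same."""
    payload = {
        "command": command,
        "inputs": inputs,
        "output_count": len(output_expressions),
        "output_expressions": output_expressions,
        # Other runtime attributes (memory, disks, cpu) are deliberately left out.
        "runtime": {k: runtime.get(k) for k in HASHED_RUNTIME_KEYS},
    }
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


base = dict(
    command="align ~{reads}",
    inputs={"reads": "md5:ab41"},
    output_expressions={"bam": '"out.bam"'},
    runtime={"docker": "example/aligner:1.0", "memory": "4 GB"},
)
baseline = call_key(**base)

# Hard-coded memory bump in the runtime block: the key is unchanged.
more_memory = call_key(**{**base, "runtime": {**base["runtime"], "memory": "16 GB"}})
# Memory supplied through a task input: the inputs change, so the key changes.
memory_as_input = call_key(**{**base, "inputs": {**base["inputs"], "mem_gb": 16}})
# A different Docker image: the key changes.
new_docker = call_key(**{**base, "runtime": {**base["runtime"], "docker": "example/aligner:2.0"}})
# One extra output: the key changes.
extra_output = call_key(**{**base, "output_expressions": {**base["output_expressions"], "log": '"out.log"'}})

print(more_memory == baseline)      # True
print(memory_as_input == baseline)  # False
print(new_docker == baseline)       # False
print(extra_output == baseline)     # False
```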

 

How does it work? Technical details

Cromwell searches the cache of previously run jobs for one that has the exact same command and exact same inputs. If a previously run job is found in the cache, Cromwell will use the results of the previous job instead of re-running it.
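
As a rough illustration of this lookup, the Python sketch below keeps a dictionary keyed by a hash of the command and inputs. The names `run_job` and `run_with_cache`, the command name, and the input values are hypothetical; Cromwell's real cache also takes into account the other details described in this article.

```python
import hashlib

cache = {}      # maps a hash of (command, inputs) to a stored result
executions = 0  # counts how many times real work was done


def run_job(command, inputs):
    """Stand-in for an expensive job."""
    global executions
    executions += 1
    return f"result of {command} on {', '.join(inputs)}"


def run_with_cache(command, inputs):
    """Reuse a stored result when the same command and inputs were seen before."""
    key = hashlib.sha256(repr((command, inputs)).encode("utf-8")).hexdigest()
    if key in cache:
        return cache[key]  # cache hit: nothing is re-run
    result = run_job(command, inputs)  # cache miss: run and remember the result
    cache[key] = result
    return result


first = run_with_cache("count_reads", ("md5:ab41",))
second = run_with_cache("count_reads", ("md5:ab41",))  # same command and inputs
third = run_with_cache("count_reads", ("md5:ffff",))   # different input

print(executions)  # 2
```

The second call returns the stored result without doing any work, so only two of the three calls actually execute.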

How does Cromwell know if that exact run has been executed before (same workflow, same parameters, same inputs)?

To understand how call caching works under the hood, we have to start with a concept known in computer science as a “hash function.” A hash function maps inputs of any size to a short code, or hash: a number with a fixed size. A good hash function does this in such a way that two different inputs are extremely unlikely to produce the same hash. Therefore, the hash can be used like a serial number to identify the input. Docker images, for example, come with hashes of themselves because it is useful to know whether two Docker images have identical contents.
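
These properties are easy to see with Python's standard `hashlib` module. SHA-256 is chosen here only as a familiar example of a hash function; the helper name `short_code` is made up for this sketch.

```python
import hashlib


def short_code(data: bytes) -> str:
    """Return the SHA-256 hash of the data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


small = short_code(b"A")
large = short_code(b"A" * 1_000_000)
nearly_same = short_code(b"A" * 999_999 + b"B")

# The hash has the same fixed size whether the input is one byte or a million.
print(len(small), len(large))     # 64 64
# Changing a single byte of the input produces a different hash.
print(large == nearly_same)       # False
# Hashing the same input again always gives the same hash.
print(short_code(b"A") == small)  # True
```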

Cromwell generates hashes for the input arguments of the workflows, as well as for the WDL script itself. Information such as filenames and input parameters is stored in this way so that Cromwell can later identify when an identical workflow runs with the same input configuration. As long as the hashes uniquely identify input configurations, call caching lets you benefit from the assumption that a given input will always lead to the same output.

What's an optimal hash function?

To understand what defines an optimal hash function, consider the diagram below. In this less-than-ideal example, the hash function maps names to slots on a numbered list, but some names map to the same output hash. This situation is called a “collision,” and a good hash function has as few of them as possible. Since the chance of a hash collision can be predicted based on the size of the hash and the number of inputs hashed, it’s possible to pick a hash size large enough that collisions are astronomically unlikely. That's how Cromwell ensures that the hashes it generates can reliably identify inputs and outputs, which makes call caching possible.

Example of a bad hash function:
[Figure: names mapped to a numbered list, with some names colliding on the same hash]
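
The link between hash size and collision risk can be estimated with the standard "birthday" approximation, p ≈ 1 - exp(-n(n-1) / 2^(b+1)) for n inputs and a b-bit hash. The Python sketch below assumes an ideal hash function with uniformly distributed outputs; the input counts are arbitrary examples, not figures from Cromwell.

```python
import math


def collision_probability(n_inputs: int, hash_bits: int) -> float:
    """Approximate chance that at least two of n inputs share a hash of the given size."""
    pairs = n_inputs * (n_inputs - 1) / 2
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-pairs / 2 ** hash_bits)


# A 32-bit hash is already likely to collide after one hundred thousand inputs.
print(collision_probability(100_000, 32))

# A 128-bit hash (the size of an MD5 digest) stays safe even for a billion inputs.
print(collision_probability(1_000_000_000, 128))
```

With 32 bits the estimate is above one half, while with 128 bits it is far below one in a billion billion, which is why a large enough hash makes accidental collisions a non-issue in practice.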

 

Comments

3 comments

  • Brendan Reardon

    Anton Kovalsky can you confirm more details of how call-caching works? My understanding is that it is performed based on either the MD5 or crc32c hash for each file / input. Thus, it won't matter if I change a file name or move the object to a different bucket or folder. Is this correct?

  • Anton Kovalsky

    Yes, that should be correct, are you experiencing any issues with this? Note that if you're using String inputs to pass in file URIs, Cromwell can’t necessarily tell that it’s actually supposed to be a File.

  • Brendan Reardon

    No issues here, just seeking clarification. Thank you!