Running a complicated workflow can produce many large intermediate outputs that increase a project's storage costs. As an example, a large-scale project recently discovered that as much as 85% of its storage cost was for intermediate files that were never accessed or used. To reduce these costs when working in Terra, you can choose to delete intermediate files once the workflow completes successfully.
Delete Intermediate Files options explained
Intermediate files are kept unless you select the Delete intermediate outputs option in the workflow configuration (see screenshot below).
They are kept by default because call caching, which is also selected by default, needs the intermediate files in order to work.
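In Terra these settings are chosen in the workflow configuration, but it can help to see roughly what they correspond to in Cromwell, the engine that runs the workflow. The sketch below builds a Cromwell-style workflow options document using the option names `delete_intermediate_output_files`, `read_from_cache`, and `write_to_cache`. The helper `build_workflow_options` is hypothetical, and turning off cache writes whenever intermediates are deleted is an assumption based on the behavior described in this article, not a statement of what Terra submits on your behalf.

```python
import json


def build_workflow_options(delete_intermediates: bool) -> dict:
    """Return a Cromwell-style workflow options dict (illustrative helper)."""
    return {
        # Remove intermediate outputs after the workflow succeeds.
        "delete_intermediate_output_files": delete_intermediates,
        # Reading from the call cache stays on in both cases.
        "read_from_cache": True,
        # Assumption: a run that deletes its intermediates does not
        # write its own results to the call cache.
        "write_to_cache": not delete_intermediates,
    }


options = build_workflow_options(delete_intermediates=True)
print(json.dumps(options, indent=2))
```

Calling `build_workflow_options(delete_intermediates=False)` gives the default arrangement described above, where intermediates are kept and call caching can both read and write.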
When to delete intermediate files
- When you're running a well-tested workflow
- When you won't be running the same workflow on the same data for further analysis
When to use call caching
- Before running downstream analysis on the same data
- When you're working in a different workspace and want to reproduce earlier results
- When you are testing or troubleshooting a partially failed workflow, or are otherwise not sure whether a workflow will complete. Call caching lets you restart the workflow at the task that failed, rather than rerunning the entire workflow from the beginning.
Intermediate files during unsuccessful workflows
If a workflow fails to complete, the intermediate files are not deleted. This lets you use call caching to start the workflow again right before a failed step.
Call caching and the "Delete intermediate outputs" option
A workflow run with the Delete intermediate outputs option enabled can always READ from the call cache, but it will not WRITE its own results to the call cache.
Say, for example, you previously ran workflow X with Delete intermediate outputs enabled and now want to run it again with the same inputs and call caching turned on. The new run won't reuse results from the earlier run, because that run's intermediate files don’t exist anymore (and so could not be call cached). When Cromwell deletes the intermediate files, it also invalidates those call-cache entries.
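The read-but-not-write behavior can be illustrated with a toy model. This is a minimal sketch, not Cromwell's implementation: `ToyCallCache`, `run_task`, the cache key, and the bucket path are all hypothetical, and the model leaves out cache invalidation and failure handling.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCallCache:
    # Maps a task-plus-inputs key to the location of its cached output.
    entries: dict = field(default_factory=dict)


def run_task(cache: ToyCallCache, key: str, *, delete_intermediates: bool) -> str:
    """Simulate one task call and return 'hit' or 'miss'."""
    # Every run may READ from the cache, whatever its delete setting.
    if key in cache.entries:
        return "hit"
    # Cache miss: the task actually runs. Only runs that keep their
    # intermediate files WRITE an entry for later runs to reuse.
    if not delete_intermediates:
        cache.entries[key] = f"gs://hypothetical-bucket/{key}/output"
    return "miss"


cache = ToyCallCache()
key = "workflowX.taskA:inputs-1"

first = run_task(cache, key, delete_intermediates=True)    # runs, writes nothing
second = run_task(cache, key, delete_intermediates=False)  # runs again, writes an entry
third = run_task(cache, key, delete_intermediates=True)    # reads the entry from the second run

print(first, second, third)
```

The second run misses because the first run, which deleted its intermediates, left nothing in the cache. The third run hits because reading is always allowed.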
Manually deleting intermediate files
Many researchers prefer to keep intermediate files until they have completed all their research. You can manually delete intermediate files at your discretion with this notebook script in this workspace. Read the instructions carefully, because we cannot recover any data that you accidentally delete using this tool.
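Before deleting anything by hand, it can be useful to list what would be removed. The sketch below is not the notebook script mentioned above. It is a hypothetical dry run that treats every file under a submission that is not a final workflow output as an intermediate file. The function name `find_intermediate_files` and all paths are made up for illustration, and nothing here touches cloud storage.

```python
def find_intermediate_files(all_files, final_outputs):
    """Return the files that are not final outputs, in sorted order."""
    keep = set(final_outputs)
    return sorted(path for path in all_files if path not in keep)


# Hypothetical listing of one submission's files.
all_files = [
    "gs://hypothetical-bucket/sub-1/call-align/aligned.bam",
    "gs://hypothetical-bucket/sub-1/call-sort/sorted.bam",
    "gs://hypothetical-bucket/sub-1/call-call_variants/out.vcf",
]
# Hypothetical final outputs that must be kept.
final_outputs = [
    "gs://hypothetical-bucket/sub-1/call-call_variants/out.vcf",
]

candidates = find_intermediate_files(all_files, final_outputs)
for path in candidates:
    # Dry run only: report each candidate instead of deleting it.
    print("would delete:", path)
```

Reviewing a list like this first makes it easier to confirm that no final outputs, or files you still need for call caching, are among the candidates.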