When you run a workflow, intermediate steps often produce outputs that are of little use compared with the final results. Complicated workflows can generate many large intermediate files, which can increase a project's storage costs. For example, one large-scale project recently discovered that as much as 85% of its storage cost went to intermediate files that no one ever accessed or used. Terra now offers the option to delete intermediate files upon successful completion of a workflow, enabling significant savings.
The Delete Intermediate Files option explained
Intermediate files are kept unless the "Delete intermediate outputs" option in the workflow configuration (see screenshot below) is selected.
Intermediate files during unsuccessful workflows
If a workflow fails to complete, its intermediate files are not deleted. This lets you use call caching to rerun the workflow and pick up at the failed step, reusing the cached results of the steps that already succeeded.
Call caching and the "delete intermediate files" option
A workflow run with the delete intermediates option enabled can always READ from the call cache, but it will not WRITE its own results to the call cache.
Say, for example, you previously ran workflow X with delete intermediates enabled and now want to run it again with the same inputs and call caching turned on. The new run will not reuse results from the earlier run, because that run's intermediate files no longer exist (and so cannot be call cached). When Cromwell deletes the intermediate files, it also invalidates the corresponding call cache entries.
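The read-but-not-write behavior described above can be pictured as a set of Cromwell workflow options. The sketch below is an illustration only: Terra sets these values for you, and the mapping from the Terra checkbox to the Cromwell options `delete_intermediate_output_files`, `read_from_cache`, and `write_to_cache` is an assumption, as is the helper name `build_workflow_options`.

```python
import json


def build_workflow_options(delete_intermediates: bool, use_call_cache: bool) -> dict:
    """Hypothetical helper that models the behavior described in this article.

    Assumption: the Terra option corresponds to these Cromwell workflow
    options. This is not Terra's actual implementation.
    """
    return {
        # Delete intermediate files once the workflow completes successfully.
        "delete_intermediate_output_files": delete_intermediates,
        # Reading from the call cache is allowed whenever call caching is on.
        "read_from_cache": use_call_cache,
        # Results are not written to the cache when intermediates will be deleted.
        "write_to_cache": use_call_cache and not delete_intermediates,
    }


options = build_workflow_options(delete_intermediates=True, use_call_cache=True)
print(json.dumps(options, indent=2))
```

With both settings turned on, the sketch yields a run that can reuse earlier cached results but leaves nothing in the cache for later runs, which matches the workflow X example.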
Manually deleting intermediate files
Many researchers prefer not to delete intermediate files until they have completed all their research. You can manually delete intermediate files at your discretion with this notebook script we've made available in this workspace. Make sure you read the instructions carefully, as we cannot recover any data that you accidentally delete using this tool.
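Because deletions cannot be undone, it can help to review a dry-run list before removing anything. The following is a minimal sketch of that idea, not the notebook's actual method: the function name `select_intermediates`, the bucket paths, and the choice to keep `.log` files are all hypothetical, and no files are touched.

```python
from typing import Iterable, List


def select_intermediates(
    all_files: Iterable[str],
    final_outputs: Iterable[str],
    keep_suffixes: Iterable[str] = (),
) -> List[str]:
    """Return the files that are neither final outputs nor protected by suffix.

    Hypothetical helper for illustration; it only builds a list and never
    deletes anything.
    """
    keep = set(final_outputs)
    suffixes = tuple(keep_suffixes)
    return sorted(
        path
        for path in all_files
        if path not in keep and not path.endswith(suffixes)
    )


# Example file listing (made-up paths for one submission).
all_files = [
    "gs://example-bucket/submission-1/align/sample.bam",
    "gs://example-bucket/submission-1/align/stderr.log",
    "gs://example-bucket/submission-1/call/sample.vcf.gz",
    "gs://example-bucket/submission-1/sort/sample.sorted.bam",
]
final_outputs = ["gs://example-bucket/submission-1/call/sample.vcf.gz"]

to_delete = select_intermediates(all_files, final_outputs, keep_suffixes=(".log",))
for path in to_delete:
    print("would delete:", path)
```

Checking the printed list against the outputs you still need is a cheap safeguard before running any real deletion.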