Running a complicated workflow can produce many large intermediate outputs, which increases a project's storage costs. For example, a large-scale project recently discovered that as much as 85% of its storage cost went to intermediate files that were never accessed or used. Terra offers the option to delete intermediate files upon successful completion of the workflow, enabling significant savings.
Delete Intermediate Files options explained
Intermediate files are kept unless the "Delete intermediate outputs" option in the workflow configuration (see screenshot below) is selected.
Intermediate files during unsuccessful workflows
If a workflow fails to complete, the intermediate files are not deleted, even when the delete option is selected. This lets you rerun the workflow with call caching and resume from just before the failed step instead of starting from the beginning.
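The rule described in the two sections above can be summarized as a small decision function. This is only an illustrative sketch, not Terra's or Cromwell's code; the function name and its arguments are hypothetical.

```python
def should_delete_intermediates(option_enabled: bool, workflow_succeeded: bool) -> bool:
    """Hypothetical helper: intermediates are removed only when the
    delete option is selected AND the workflow completed successfully."""
    return option_enabled and workflow_succeeded


# A failed workflow keeps its intermediates even with the option selected,
# so a later run can still reuse them through call caching.
print(should_delete_intermediates(option_enabled=True, workflow_succeeded=False))
```

The other three combinations follow the same rule: files are deleted only in the single case where both conditions hold.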
Call-caching and "delete intermediate files" option
A workflow run with the delete intermediates option enabled can always READ from the call cache, but it will not WRITE its own results to the call cache.
Say, for example, you previously ran workflow X with delete intermediates and now want to run it again with the same inputs and call caching turned on. The new run will not reuse results from the earlier run, because that run's intermediate files don’t exist anymore and so cannot be call cached. When Cromwell deletes the intermediate files, it also invalidates the corresponding call-cache entries.
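To make the read-but-not-write behavior concrete, here is a toy in-memory model. It is a sketch under stated assumptions: `ToyCallCache` and its `run` method are hypothetical names invented for illustration, and the model does not reflect how Cromwell actually stores or invalidates cache entries.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCallCache:
    """Hypothetical in-memory stand-in for a call cache (not Cromwell's API)."""
    entries: dict = field(default_factory=dict)

    def run(self, inputs: str, delete_intermediates: bool) -> str:
        # Reading is always allowed, with or without the delete option.
        if inputs in self.entries:
            return "cache hit"
        # On a miss the work actually runs. Its result is recorded only
        # when the intermediate files will still exist afterwards.
        if not delete_intermediates:
            self.entries[inputs] = "placeholder for intermediate outputs"
        return "cache miss"


cache = ToyCallCache()
first = cache.run("sample-A", delete_intermediates=True)    # runs, writes nothing
second = cache.run("sample-A", delete_intermediates=True)   # nothing to reuse
third = cache.run("sample-B", delete_intermediates=False)   # runs, writes an entry
fourth = cache.run("sample-B", delete_intermediates=True)   # reads the third run's entry
print(first, second, third, fourth, sep=" | ")
# → cache miss | cache miss | cache miss | cache hit
```

The second run misses because the first run left nothing behind, which mirrors the workflow X example. The fourth run hits because it only needs to read an entry written by a run that kept its intermediates.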
Manually deleting intermediate files
Many researchers prefer not to delete intermediate files until they have completed all their research. You can manually delete intermediate files at your discretion with this notebook script we've made available in this workspace. Make sure you read the instructions carefully, as we cannot recover any data that you accidentally delete using this tool.
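Because deletions cannot be undone, it can help to review a list of candidates before removing anything. The sketch below is not the notebook's code, and the notebook's actual logic is not described here. It is a minimal dry run that assumes you already have a list of every file a submission produced and a list of files you want to keep. The function name and the example paths are hypothetical.

```python
def intermediate_candidates(all_files, files_to_keep):
    """Return the files that are not on the keep list, sorted for easy review."""
    keep = set(files_to_keep)
    return sorted(path for path in all_files if path not in keep)


# Hypothetical example paths; real paths depend on your workspace bucket.
all_files = [
    "submission-1/workflow-1/call-align/aligned.bam",
    "submission-1/workflow-1/call-sort/sorted.bam",
    "submission-1/workflow-1/call-call_variants/output.vcf",
]
files_to_keep = ["submission-1/workflow-1/call-call_variants/output.vcf"]

# Dry run: print what would be removed, but delete nothing.
candidates = intermediate_candidates(all_files, files_to_keep)
for path in candidates:
    print("would delete:", path)
```

Nothing is deleted here. The printed list is meant to be checked by hand against the outputs you still need.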