I am developing this WDL on Terra for a BioData Catalyst fellow. It works fine when run locally, but when run on Terra the final output is destroyed. Call caching is disabled, and delete-intermediates is disabled. The workflow reports success, even giving an address for an output that simply does not exist. I downloaded the relevant portion of the bucket via the command line to rule out a quirk of the console's GUI; the file is still not there. This is reproducible by running the workflow exactly as I did, with the same inputs.
The first task takes in an Array[File] of gds files and outputs RData representing the variants to be pruned. The second task takes in zip(gds, first task's RData) and outputs subsetted gds files. It's that final output that's missing. Oddly, the configuration text file, which is also defined as an output for debugging purposes, is not deleted, and neither is the RData file from the previous step; it's only the subset output that vanishes.
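For context, the overall shape of the workflow is roughly the following. This is a hand-written sketch, not my actual WDL: the task, variable, and output names here are placeholders, and the task bodies are omitted.

```wdl
version 1.0

workflow prune_and_subset {
  input {
    Array[File] gds_files
  }

  # First task: takes the gds files, emits one RData of
  # variants-to-prune per input gds.
  call prune_variants { input: gds_files = gds_files }

  # Second task: scattered over (gds, RData) pairs; writes a
  # subsetted gds plus a config text file kept for debugging.
  scatter (pair in zip(gds_files, prune_variants.pruned_rdata)) {
    call subset_gds { input: gds = pair.left, variants = pair.right }
  }

  output {
    Array[File] subset_outputs = subset_gds.subset_out   # this is what goes missing
    Array[File] debug_configs  = subset_gds.config_txt   # this survives
    Array[File] pruned_rdata   = prune_variants.pruned_rdata  # this survives too
  }
}
```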
At first I thought this might be due to how the R script in the subset task performs cleanup, but it works fine when run on local Cromwell via the Dockstore CLI. (I understand that local Cromwell is not as well supported, but it is orders of magnitude faster for me to develop locally and just do final tests on Terra.) That seems to point to a Terra issue, or maybe a GCS issue? That said, this has never happened to me before, so I wouldn't be surprised if it turns out to be a quirk of the particular WDL I'm writing.
To be clear, I'm not asking for data recovery of the deleted outputs; each run costs me about eight cents and it's all open data. I just need to figure out how to avoid this issue so I can publish this workflow with confidence.
Open questions:
- Where the ghost outputs are going
- The R script's cleanup, which might be a red herring, or might not be