Zombie machines from failed tasks running up costs
(I also submitted a note through Contact Us but figured this may be a better forum since other users should probably be aware)
One of our submissions failed 6 days ago because file permissions prevented localization of the input files. We only just noticed that 17 of the workflows are still running and have accrued ~$1,000 in costs during this time. Here is one of the 17: https://portal.firecloud.org/#workspaces/aryee-lab/dna-methylation_copy/monitor/d5502058-7bdf-49fc-baf1-84fc69f88ca8/d468266b-0dcc-4d16-8408-b76554568298
I have tried to abort them manually, but they are now stuck in the "Aborting" state (and are still running up costs, according to the Google Cloud console).
Is there a way to a) recover these funds and b) prevent/guard against this behavior?
Thanks,
Martin
Comments
Hello Martin -
This is definitely a major concern - thank you for reporting. I will make sure that this is looked at first thing.