Zombie machines from failed tasks running up costs

Post author
Martin Aryee

(I also submitted a note through Contact Us but figured this may be a better forum since other users should probably be aware)

One of our submissions failed 6 days ago because file permissions prevented localization of the input files. We only just noticed that 17 of the workflows are still running and have accrued ~$1000 in costs during this time. Here is one of the 17: https://portal.firecloud.org/#workspaces/aryee-lab/dna-methylation_copy/monitor/d5502058-7bdf-49fc-baf1-84fc69f88ca8/d468266b-0dcc-4d16-8408-b76554568298

I have tried to abort them manually, but they are now stuck in an "Aborting" state (and are still running up costs according to the Google Cloud console).

Is there a way to a) recover these funds and b) prevent/guard against this behavior?

Thanks,

Martin

Comments

1 comment

  • Comment author
    Sushma Chaluvadi

    Hello Martin,

    This is definitely a major concern. Thank you for reporting it. I will make sure that this is looked at first thing.
