Neverending jobs

Post author
dplichta

Hi,

We are observing some very long runs for a few tasks that normally take minutes to an hour to complete. Could you investigate what's going on?

Workspace rjxmicrobiome/rjxmicrobiome, workflow id e4c73b68-684d-45d7-b396-996112d8766c.

Damian

Comments

  • Comment author
    Adelaide Rhodes

    Hi Damian,

    I have contacted the Cromwell team to get an answer for you. I am also creating a Zendesk ticket for this issue.

    Adelaide

  • Comment author
    Adelaide Rhodes

    Damian,

    Khalid suggested running:

    `gsutil cat gs://fc-secure-802bf880-16b1-4a10-ad89-da98f79919b8/01f0aeea-3d94-4a2c-ad22-057c82724c0e/workflowBiobakery/e4c73b68-684d-45d7-b396-996112d8766c/call-combineMetaphlan/attempt-3/combineMetaphlan.log`

    You may see that the job is _very slowly_ downloading the 2,405 inputs.

    If that's the case, this is the issue tracked at https://broadworkbench.atlassian.net/browse/BA-5666. The downloads are moving very slowly but should eventually finish.

    NOTE: While that issue is being actively worked on, the ticket also has a comment describing a workaround. You can create a Jira account to see the comment directly, including the part explaining that if you upgrade your WDL to version 1.0, you can add the workaround _and_ maintain (future) call caching.

    All that said, if there is empirical evidence in the logs that the downloads are stuck (not merely a lack of logs, but hundreds of downloads followed by nothing for a day), then we should kill the job.

    Let me know if we should kill it.

    Thanks,

    Adelaide
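
The "stuck versus slow" check described in the reply above can be roughly approximated by looking at when the task log was last written. Below is a minimal Python sketch, not something suggested in the thread itself. It assumes the log object in the bucket is refreshed while the task is running (this is not confirmed here) and that you feed it one object line from `gsutil ls -l <log path>`, which prints the size, a UTC timestamp, and the URL. The function names, the one-day threshold, and the sample line are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_ls_long_line(line):
    """Parse one object line of `gsutil ls -l` output: '<size>  <timestamp>  <url>'."""
    size, stamp, url = line.split(None, 2)
    updated = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return int(size), updated, url.strip()


def looks_stalled(line, now, max_idle=timedelta(days=1)):
    """Return True if the log object has not been updated for at least max_idle.

    The one-day default mirrors the "nothing for a day" rule of thumb; it is an
    illustrative threshold, not a documented Cromwell setting.
    """
    _, updated, _ = parse_ls_long_line(line)
    return now - updated >= max_idle


# Hypothetical sample line; the bucket name and values are placeholders.
sample = "     48213  2020-03-02T19:25:17Z  gs://example-bucket/combineMetaphlan.log"
stalled_now = looks_stalled(sample, datetime.now(timezone.utc))
```

A stale timestamp alone is only a hint. The log contents should still be read to confirm that many downloads completed and then stopped, as opposed to the log simply never being written.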
