Preemptible task not rerunning after being preempted

Comments

8 comments

  • Marianie Simeon

    Hi Justin,

    If you are able to, can you share the workspace with GROUP_FireCloud-Support@firecloud.org as a Writer? You can remove this permission once we have resolved the issue. You may also need to share any workflows that are not already publicly readable, since they are shared separately from the workspace; this can be done through the FireCloud Methods Repository at https://portal.firecloud.org/?return=terra#methods. Sharing the workspace will allow us to look directly at the logs and troubleshoot more efficiently.

     

    Best,

    Marianie

     

     

  • Justin Rhoades

    Hi Marianie,

    The workspace should be shared already with that account.  The workflow should be public as well.

    workspace-id: c1d3840a-733b-4a89-8a78-c794e3acd032 submission-id: 547cd2da-7d2e-4462-85d8-86cf2c76d521

     

    Best,

    Justin

  • Marianie Simeon

    Hi Justin, 

     

    Thank you for sharing. We are looking into it now. In the meantime, can you also share the workspace name?

     

    Best,

    Marianie

  • Justin Rhoades

    Hi Marianie,

     

    The workspace is blood-biopsy/early_stage_BC_whole_genome_analysis.

     

    Best,

    Justin

  • Marianie Simeon

    Hi Justin,

    We saw the following error in the JES log M2-99.log for call #136: "A USER ERROR has occurred: Argument -L, --interval-set-rule has a bad value: [gs://fc-c1d3840a-733b-4a89-8a78-c794e3acd032/547cd2da-7d2e-4462-85d8-86cf2c76d521/Mutect2/8f858803-87af-416a-9b3a-1884f7ec5f69/call-SplitIntervals/glob-0fc990c5ca95eebc97c4c204e3e303e1/0099-scattered.interval_list, gs://gatk-best-practices/somatic-b37/small_exac_common_3.vcf],INTERSECTION. The specified intervals had an empty intersection"

    Can you inspect 0099-scattered.interval_list to see whether any of its intervals actually overlap the sites in small_exac_common_3.vcf?

    Best,
    Marianie
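
The check requested above can be sketched in Python. This is only a minimal illustration, assuming a Picard-style interval_list (header lines starting with "@", then tab-separated contig, start, end) and a plain-text VCF whose records are compared by POS only; the function names and the example data are hypothetical and are not taken from the workspace.

```python
# Hypothetical helpers for illustration; not part of GATK or FireCloud.

def parse_interval_list(text):
    """Return (contig, start, end) tuples from Picard-style interval_list text."""
    intervals = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and the SAM-style header
        contig, start, end = line.split("\t")[:3]
        intervals.append((contig, int(start), int(end)))
    return intervals


def parse_vcf_sites(text):
    """Return (contig, pos) tuples from VCF text, ignoring header lines."""
    sites = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta lines and the column header
        contig, pos = line.split("\t")[:2]
        sites.append((contig, int(pos)))
    return sites


def intervals_with_sites(intervals, sites):
    """Return intervals containing at least one site (1-based, inclusive).

    Simplification: only POS is compared, so multi-base records are
    treated as single positions.
    """
    return [
        (contig, start, end)
        for contig, start, end in intervals
        if any(c == contig and start <= p <= end for c, p in sites)
    ]


# Made-up example data (assumption, not the real shard or VCF).
interval_text = "@HD\tVN:1.6\n1\t100\t200\t+\tshard_a\n2\t500\t600\t+\tshard_b\n"
vcf_text = "##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\n1\t150\t.\tA\tG\n3\t10\t.\tC\tT\n"

overlapping = intervals_with_sites(
    parse_interval_list(interval_text), parse_vcf_sites(vcf_text)
)
print(overlapping)
```

If a check like this returns an empty list for a shard, that would be consistent with the "empty intersection" error reported for that shard.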

  • Justin Rhoades

    Hi Marianie,

    I saw that failure and think I understand what happened and how to prevent it from happening again on a subsequent run.  My real question is what happened to shard 46.  It looks like it failed to relaunch after it was preempted.  Can you check that one for me?

    Best,

    Justin

  • Sushma Chaluvadi

    Hi Justin,

    You are right that shard 46 failed due to preemption, and it should indeed have been retried on a non-preemptible machine. We think it was not retried because shard 99 returned a non-zero return code, which signaled the entire workflow to stop before shard 46 had a chance to try again on a non-preemptible machine. Have you had the opportunity to fix the error causing shard 99 to fail and re-run the workflow?

     

    Sushma
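
The explanation above can be illustrated with a toy model in Python. This is only a sketch of the suspected sequence of events, not the actual scheduler logic: the function name, the result labels, and the `stop_on_failure` flag are all hypothetical, and the retry is simply assumed to succeed.

```python
def run_scatter(first_attempt_results, stop_on_failure=True):
    """Toy model of a scattered task with preemptible first attempts.

    first_attempt_results maps shard index -> "ok", "preempted" or "failed".
    Preempted shards are queued for a non-preemptible retry, but when
    stop_on_failure is True the retries are skipped once any shard has failed.
    """
    final = {}
    retry_queue = []
    workflow_failed = False

    # Collect the outcome of every first attempt.
    for shard, result in first_attempt_results.items():
        if result == "preempted":
            retry_queue.append(shard)
        else:
            final[shard] = result
            if result == "failed":
                workflow_failed = True

    # Launch retries only if the workflow has not been told to stop.
    for shard in retry_queue:
        if workflow_failed and stop_on_failure:
            final[shard] = "preempted, never retried"
        else:
            final[shard] = "ok"  # assumption: the retry succeeds
    return final


with_failure = run_scatter({1: "ok", 46: "preempted", 99: "failed"})
without_failure = run_scatter({1: "ok", 46: "preempted", 99: "ok"})
print(with_failure[46])
print(without_failure[46])
```

In this model, shard 46 is retried only when shard 99 does not fail, which matches the expectation that fixing shard 99 should let the preempted shard rerun.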

  • Justin Rhoades

    Hi Sushma,

    I think I've fixed the problem with shard 99 and am re-running now to see if that solves the problem.

    Justin
