ContinueWhilePossible not default?

Post author
Laura Gauthier

What does "Will not start job CollectCoverageGCNV.CollectCounts:1:1 when workflow state is 'WorkflowExecutionFailingState' and when 'restarting'=false" mean? I've never seen this one before. It sounds like one of the samples failed and the rest won't launch, but it's impossible to tell which one was the real failure amid thousands of "won't start" messages.

Comments

2 comments

  • Comment author
    Jason Cerrato

    Hi Laura,

    Thanks for flagging this up. We'll take a look at what's going on here and get back to you as soon as we can!

    Kind regards,

    Jason

  • Comment author
    Jason Cerrato

    Hey Laura,

    Our engineers have identified this to be the root cause of the submission failure:

    cromwell.engine.workflow.lifecycle.execution.job.preparation.JobPreparationActor$$anonfun$1$$anon$1: Call input and runtime attributes evaluation failed for CollectCounts: : [Attempted 1 time(s)] - FileNotFoundException: gs://fc-eb97544d-4f02-4356-8ed0-45493163bbf6/CMG_Broad_VCGS_OrphanDisease_WES_Closed_Nov2019/RP-1307/Exome/VCGS_FAM149_464_D1/v2/VCGS_FAM149_464_D1.cram File not found gs://fc-eb97544d-4f02-4356-8ed0-45493163bbf6/CMG_Broad_VCGS_OrphanDisease_WES_Closed_Nov2019/RP-1307/Exome/VCGS_FAM149_464_D1/v2/VCGS_FAM149_464_D1.cram
        at cromwell.engine.workflow.lifecycle.execution.job.preparation.JobPreparationActor$$anonfun$1.applyOrElse(JobPreparationActor.scala:81)
        at cromwell.engine.workflow.lifecycle.execution.job.preparation.JobPreparationActor$$anonfun$1.applyOrElse(JobPreparationActor.scala:74)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
        at akka.actor.FSM.processEvent(FSM.scala:707)
        at akka.actor.FSM.processEvent$(FSM.scala:704)
        at cromwell.engine.workflow.lifecycle.execution.job.preparation.JobPreparationActor.processEvent(JobPreparationActor.scala:46)
        at akka.actor.FSM.akka$actor$FSM$$processMsg(FSM.scala:701)
        at akka.actor.FSM$$anonfun$receive$1.applyOrElse(FSM.scala:695)
        at akka.actor.Actor.aroundReceive(Actor.scala:539)
        at akka.actor.Actor.aroundReceive$(Actor.scala:537)
        at cromwell.engine.workflow.lifecycle.execution.job.preparation.JobPreparationActor.aroundReceive(JobPreparationActor.scala:46)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:614)
        at akka.actor.ActorCell.invoke(ActorCell.scala:583)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:268)
        at akka.dispatch.Mailbox.run(Mailbox.scala:229)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:241)
        at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

    The most relevant portion seems to be this:

    FileNotFoundException: gs://fc-eb97544d-4f02-4356-8ed0-45493163bbf6/CMG_Broad_VCGS_OrphanDisease_WES_Closed_Nov2019/RP-1307/Exome/VCGS_FAM149_464_D1/v2/VCGS_FAM149_464_D1.cram File not found gs://fc-eb97544d-4f02-4356-8ed0-45493163bbf6/CMG_Broad_VCGS_OrphanDisease_WES_Closed_Nov2019/RP-1307/Exome/VCGS_FAM149_464_D1/v2/VCGS_FAM149_464_D1.cram

    The messages you see in Job Manager mean that one shard of the scatter failed while Cromwell was still submitting the other shards. Because one of the shards had failed and there was no longer any reason to submit the remaining ones, Cromwell showed this message instead of submitting them.
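
    On the question in the title: Cromwell's documented workflow option "workflow_failure_mode" accepts "NoNewCalls" (the default, which matches the behavior described above) and "ContinueWhilePossible". A minimal sketch of building such an options file follows. It assumes you submit to a Cromwell server where you can supply workflow options yourself; whether your submission interface exposes this setting is not confirmed in this thread, and the helper name render_options is hypothetical.

```python
import json

# Cromwell workflow option controlling what happens after a call fails.
# "NoNewCalls" (default): stop starting new calls once any call fails.
# "ContinueWhilePossible": keep starting calls that do not depend on the failure.
workflow_options = {"workflow_failure_mode": "ContinueWhilePossible"}


def render_options(options=workflow_options):
    """Return the options as a JSON string (hypothetical helper for illustration)."""
    return json.dumps(options, indent=2)


if __name__ == "__main__":
    # Print the JSON so it can be saved as a workflow options file.
    print(render_options())
```

    With "ContinueWhilePossible", the remaining shards would still be attempted, so only the sample with the missing file would be reported as failed.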

    We recommend checking on that missing file and re-running if things look good.
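
    To find the real failure amid thousands of "Will not start job" messages, one option is to filter them out and pull the missing path from whatever remains. The sketch below assumes you have already collected the failure messages as plain strings (for example, copied from the workflow metadata); the function names and the way the messages are obtained are hypothetical and not part of Cromwell or Job Manager.

```python
import re

# Messages that only say a shard was never launched; these are not root causes.
WONT_START = re.compile(r"^\s*Will not start job ")
# Capture the first gs:// path that follows "FileNotFoundException: ".
MISSING_GCS_FILE = re.compile(r"FileNotFoundException: (gs://\S+)")


def real_failures(messages):
    """Drop the 'Will not start job' noise and keep everything else."""
    return [m for m in messages if not WONT_START.match(m)]


def missing_files(messages):
    """Return the distinct gs:// paths reported as not found, in first-seen order."""
    paths = []
    for message in messages:
        for path in MISSING_GCS_FILE.findall(message):
            if path not in paths:
                paths.append(path)
    return paths
```

    Passing the full list of messages to real_failures should leave only the entries worth reading, and missing_files gives the paths to check before re-running.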

    Kind regards,

    Jason
