Error message: PAPI error code 10

Jason Cerrato

The Error

PAPI error code 10. 
The assigned worker has failed to complete the operation.

What it means 

In general, if your workflow fails with the message "PAPI error code 10. The assigned worker has failed to complete the operation", one of two things happened:

  1. Your machine actually failed.
  2. The machine was preempted in a way that looks like a machine failure.

This makes error code 10 something of a catch-all error message. Tasks that fail with it are not retried automatically, so these failures are more likely to surface as failed workflows that you notice.

Workarounds

  1. Turn preemptibles off. This removes the possibility of silent preemption failures.
  2. Set the maxRetries attribute so that a task is retried when its command fails with a non-zero return code. Transient failures then get more attempts without manual intervention.
  3. PAPI error code 10 can also be caused by insufficient memory or disk space. If the problem does not appear to be related to preemptibles or retries, we recommend increasing the allocated memory to see whether the task gets further or the issue is resolved. If increasing the memory does not help, try increasing the disk size.
    • Insufficient disk space is a common cause of this error when the disk is too small to localize the BAMs that the workflow processes.
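
As a rough illustration of the workarounds above, the sketch below shows where these settings could live in a WDL task's runtime section. The attribute names preemptible, maxRetries, memory, and disks are Cromwell runtime attributes. Everything else is an assumption made for the example: the task name, the samtools command, the Docker image placeholder, the default values, and the disk multiplier of 3. Adjust all of them to your own workflow.

```wdl
version 1.0

# Hypothetical task for illustration only; it is not taken from any specific workflow.
task SortBamSketch {
  input {
    File input_bam
    Int memory_gb = 16       # assumed default; raise it if the task keeps failing
    Int extra_disk_gb = 20   # assumed headroom on top of the localized input
  }

  # Size the disk from the input so the BAM, the sorted copy, and temporary
  # files all fit. The multiplier of 3 is an assumption; tune it for your tool.
  Int disk_gb = ceil(size(input_bam, "GB") * 3) + extra_disk_gb

  command <<<
    samtools sort -o sorted.bam ~{input_bam}
  >>>

  runtime {
    docker: "your-registry/your-image:tag"   # placeholder image name
    memory: "~{memory_gb} GB"
    disks: "local-disk ~{disk_gb} HDD"
    preemptible: 0   # workaround 1: no preemptible attempts
    maxRetries: 2    # workaround 2: retry after a non-zero return code
  }

  output {
    File sorted_bam = "sorted.bam"
  }
}
```

Exposing memory_gb and extra_disk_gb as task inputs means they can be changed when the workflow is configured, without editing the WDL itself.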

What to look for

Sometimes the machine has truly crashed and preemption has nothing to do with it. In such cases, it is worth looking at your logs to see whether the job always crashes a certain number of minutes into the process, or always while copying inputs or outputs, or while pulling Docker images. Patterns like these can help indicate why your tasks are failing with error code 10.

 

Comments

  • Yiming Yang

    Hi Jason,

    Thank you for your clarification!

    I did come across this error. But I finally figured out that it was due to insufficient disk space. So I would like to report here for your reference.

     

    Sincerely,
    Yiming

  • Yiming Yang

    According to Cromwell's documentation at https://cromwell.readthedocs.io/en/stable/RuntimeAttributes/, it seems that the defaults for "preemptible" and "maxRetries" are both 0.

    So, to work around this PAPI error code 10, can I just leave both of them at their defaults, or do I have to set them to 0 manually in my workflow's inputs?

  • Jason Cerrato

    Hi Yiming,

    To follow the workarounds, you can leave preemptible at its default of 0, but you would need to set maxRetries to a value above 0 if you want the task to retry.

    I will be editing the article in the near future to include a section that explains that insufficient memory is also a possible cause of PAPI error code 10. If you are facing this error now, you can try the preemptible/maxRetries route and/or try increasing the overall memory for the task that's failing.

    If you would like us to take a closer look at your issue, please feel free to email us at support@terra.bio and we'll be happy to take a look.

    Kind regards,

    Jason

  • Jason Cerrato

    Hi Yiming,

    Thank you for that information. I will add it to the article as well once I add the note about memory!

    Kind regards,

    Jason

  • Jason Cerrato

    Hi Roger Dettloff,

    Thank you for letting me know. If you cannot change the memory or disk values in the method configuration, it may be the case that they are hardcoded into the WDL. If so, a member of our team can examine the workflow to see if the values need adjusting. For Terra featured workspace workflows (which this one is), you can post your concern to this forum board and we'll be happy to investigate: https://support.terra.bio/hc/en-us/community/topics/360001603491-Featured-Workspaces

    Be sure to read the What to do before you post! information before posting.

    Kind regards,

    Jason

  • Roger Dettloff

    While running the https://github.com/gatk-workflows/gatk4-exome-analysis-pipeline workflow, I keep getting this error message: "Job Failed with Error Code 10 for a machine where Preemptible is set to false". The error happens at the same place in the UnmappedBamToAlignedBam.SortSampleBam script each time I re-run it, so I suspect that it is a low disk space or low memory issue. I do not know how to increase the memory allocation or disk size. I'll search for documentation on how to do that, but it might be a worthwhile addition to this article to provide some detail, or a link to more information, on how to set the memory allocation. Thanks for your help.
