It's back: PAPI error code 10.
Hi,
Just noticed one of my jobs failing with PAPI error code 10. Wasn't the move towards v2 API supposed to fix this? Or is it a different problem?
See the following workspace / method: rjxmicrobiome/RGNAVUS_Map_Gene_Abundance
Damian
Comments
Hi Damian,
I see in the workspace that you got a successful run after two failures. The error in the failed submission defines PAPI error code 10 as "The assigned worker has failed to complete the operation". This is a known error in PAPIv2 and is different from the one we were previously experiencing with v1. We are following the issue with Google and hope to have a resolution. Retrying the submission tends to overcome the original failure.
Error code 10 sometimes occurs because PAPI is unable to detect a preemption, and sometimes it occurs for reasons that are still unknown. Because our internal system, Cromwell, has no way to know which of these two causes is behind a given failure, it does not retry automatically. As a workaround, you can set maxRetries in your runtime attribute block to retry failures in cases such as this.
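For anyone unsure where that setting goes, here is a minimal sketch of a WDL task with maxRetries in its runtime block. The task name, command, and docker image are placeholders chosen for illustration and are not taken from the method above; only the maxRetries line matters.

```wdl
version 1.0

# Hypothetical task; the name, command, and image are placeholders.
task example_task {
  command <<<
    echo "replace with the real command"
  >>>

  runtime {
    docker: "ubuntu:20.04"   # placeholder image
    maxRetries: 3            # ask Cromwell to retry this task on failure
  }
}
```

With a value of 3, a failed task should be retried up to three more times before the workflow is marked as failed. This only helps when the failure is transient.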
Hope this helps - let us know if you have further questions and we will keep you updated on the status of these bugs.
Hi Sushma,
I'm having this same error, though mine seems pretty persistent. I don't think I've ever had it succeed after retrying, but it sometimes works on other datasets, and it works if I process the data outside of Terra. I tried the maxRetries setting you suggested. It just makes it run more times unsuccessfully. As you mention, this is a workaround to just try more times, but since it seems like I get this error no matter how many times I run it, this isn't a solution. It also seems to be the case, at least for my workflow, that it runs to completion (based on stdout and print statements we've included) every time, so it seems to be something in particular that's triggering it.
I'm happy to try to dig into this further with Terra Support. At the moment it's completely halting my processing of the particular pipeline it's affecting.
Hi Brian,
Thanks for the information. I will talk to the team and see if we can get you more information/help. Would you be able to share your workspace as well? You can share it with GROUP_FireCloud-Support@firecloud.org (as a Writer, which can be removed later) and then send us just the name of your workspace. If you would like to do this in a non-public forum, you can email the information to terra-support@broadinstitute.zendesk.com (but still share the workspace with the above FireCloud-Support email).
Hi Sushma,
We are experiencing that issue continuously. Is there any update on what causes PAPI error code 10?
See another workflow:
Hi Sushma,
I have also been experiencing persistent PAPI error code 10. I tried both running the workflow multiple times (2 tries) and setting maxRetries to 3, without luck.
I have been trying to run gatk/mutect2-gatk4 on WGS BAMs. The workflow fails in a scatter step, where most of the scatter tasks run correctly but one or two fail.
Has any solution been found?
Thanks,
- Aina
Hi All,
Thank you for posting all your information! I am talking with members of our team to try and diagnose what is happening and what suggestions we can offer. I am going to update this thread with more information as I receive it.
Thank you for your patience!
Hello Everyone,
Here is a post from the team describing the PAPI error code 10 issue. The post also offers some short-term workarounds!