Workflow auto-retry based on error message
Could we get the ability to automatically retry a failed workflow when it fails with specific error messages in Terra? For example, if I see the string "Execution failed: generic::unknown: installing drivers: installing GPU drivers" as the error message in Terra, I would like the job to be retried automatically.
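To make the request concrete, here is a rough sketch of the behavior I have in mind. It is not an existing Terra feature or API: `RETRYABLE_PATTERNS`, `is_retryable`, `run_with_retry`, and the `submit` callable are all hypothetical names used for illustration, and I am assuming that a submission reports either success or an error-message string.

```python
import re
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical list of error-message patterns that should trigger a retry.
RETRYABLE_PATTERNS = [
    re.compile(r"installing drivers: installing GPU drivers"),
]


def is_retryable(error_message: str,
                 patterns: Sequence[re.Pattern] = RETRYABLE_PATTERNS) -> bool:
    """Return True if the error message matches any retryable pattern."""
    return any(p.search(error_message) for p in patterns)


def run_with_retry(submit: Callable[[], Optional[str]],
                   max_attempts: int = 3) -> Tuple[int, Optional[str]]:
    """Call submit() until it succeeds, fails with a non-retryable
    message, or max_attempts is reached.

    submit() is assumed to return None on success or the error-message
    string on failure. Returns (attempts_used, last_error).
    """
    attempt = 0
    last_error: Optional[str] = None
    while attempt < max_attempts:
        attempt += 1
        last_error = submit()
        if last_error is None:
            break  # success
        if not is_retryable(last_error):
            break  # unrelated failure: do not retry
    return attempt, last_error
```

The attempt cap matters here: if the GPU driver failure is persistent rather than transient, an uncapped retry would keep resubmitting and accumulating cost.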
Comments
Hi Stephen,
Thank you for writing in! I've sent this request to our development team for consideration, and I'll be happy to follow up with you if this feature gets built.
Best,
Anthony
I am still seeing a lot of this:
Task ... failed. The job was stopped before the command finished. PAPI error code 2. Execution failed: generic::unknown: installing drivers: installing GPU drivers and /proc/driver/nvidia does not exist: signal: terminated
It seems to be becoming more and more common.