Filter pre-emptions out of job errors in Job Manager
Hi,
I've noticed that the list of errors in Job Manager includes pre-emption events. For example, if I have a job where only 1 shard fails, I might see 11 errors, 10 of which are messages indicating that a job was pre-empted and would be retried. Pre-emption seems like expected behavior, and those messages are obscuring my 1 real error message.
I have other jobs that are entirely successful, for which I still see a long list of errors that are entirely pre-emption messages.
Can we handle these differently? I imagine I would want to know how many pre-emption events I saw for each task, but as a count rather than an error.
Thanks!
Brad
Comments
This is totally a bug and not expected behavior.
1. Let's filter jobs that were pre-empted and retried out of the "errors" list.
2. Find a separate way to display how many times a job has been retried, so that retries aren't conflated with "errors".
We're slated to focus on this in June/July.
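For illustration, the two steps above could look roughly like the sketch below. It is not Job Manager's actual code or data model: the `Event` class, its `preempted` flag, and the `split_events` helper are hypothetical names, and a real backend might have to infer pre-emption from an error code or message rather than a ready-made flag.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical event shape; the real field names may differ.
    task: str
    message: str
    preempted: bool  # assumed flag marking a pre-emption that was retried


def split_events(events):
    """Separate real errors from pre-emptions and count pre-emptions per task."""
    errors = [e for e in events if not e.preempted]
    preemption_counts = Counter(e.task for e in events if e.preempted)
    return errors, preemption_counts


# Made-up sample data: three pre-emptions and one real failure.
events = [
    Event("shard-0", "pre-empted; retrying", True),
    Event("shard-0", "pre-empted; retrying", True),
    Event("shard-1", "pre-empted; retrying", True),
    Event("shard-1", "exit code 1", False),
]
errors, preemption_counts = split_events(events)
```

With this split, the "errors" list would hold only the real failure, and the per-task counts could be shown separately as a retry count.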