Filter pre-emptions out of job errors in Job Manager

Post author
Brad Taylor

Hi,

I notice that my list of errors in Job Manager includes pre-emption events. For example, if I have a job where only 1 shard fails, I might see 11 errors, 10 of which are messages indicating that a job was pre-empted and would be retried. That seems like expected behavior, and it's obscuring my 1 real error message.

I have other jobs that are entirely successful, for which I still see a long list of errors that are entirely pre-emption messages.

Can we handle these differently? I imagine I would want to know how many pre-emption events I saw for each task, but as a count rather than an error.

Thanks!
Brad

Comments

1 comment

  • Comment author
    Ruchi Munshi
    • Edited

    This is totally a bug and not expected behavior.

    1. Let's filter jobs that have been retried out of the "errors" list.

    2. Find a separate way to display how many times a job has been retried, so that retries aren't conflated with "errors".
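
As a rough illustration only, the two steps above could be sketched like this. It is not Job Manager's actual code: the `Failure` record, the `split_failures` helper, and the idea that a pre-emption message can be recognized by the substring "preempted" are all assumptions made for the example, and the real message text may differ.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple

# Assumption: pre-emption messages contain this marker once hyphens are
# removed and the text is lowercased. The real wording may differ.
PREEMPTION_MARKER = "preempted"


class Failure(NamedTuple):
    """Hypothetical record for one failure message attached to a task."""
    task: str
    message: str


def split_failures(failures: Iterable[Failure]) -> Tuple[List[Failure], Counter]:
    """Separate real errors from pre-emption retries.

    Returns the failures that are not pre-emptions, plus a per-task
    count of pre-emption events.
    """
    errors: List[Failure] = []
    retries: Counter = Counter()
    for failure in failures:
        normalized = failure.message.lower().replace("-", "")
        if PREEMPTION_MARKER in normalized:
            retries[failure.task] += 1  # shown as a count, not as an error
        else:
            errors.append(failure)
    return errors, retries
```

With a split like this, the "errors" list would hold only the failures that need attention, and the per-task retry counts could be displayed separately.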

     

    We're slated to focus on this in June/July.
