Cromwell Retry with More Memory feature false failures

Post author
Chris Whelan

I’m working on a WDL task that runs something with very verbose logging. The job appears to run successfully (the script return code is 0 and the outputs are delocalized), but Cromwell then marks it as failed, apparently because the stderr contains the string OutOfMemoryError. I think that happens because the tool logs a java command with an -XX:OnOutOfMemoryError parameter. This shows up in the metadata:

 
"failures": [
  {
    "causedBy": [],
    "message": "stderr for job HailMerge.HailMerge:NA:1 contained one of the `memory-retry` error-keys specified in the config. Job might have run out of memory."
  }
]

Is there any per-workflow or per-task way to disable this log parsing for the memory retry feature, which I’m not using?
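
To make the suspected false positive concrete, here is a minimal Python sketch of a plain substring check over stderr. It is an illustration only, not Cromwell's actual implementation: the function name, the sample log lines, and the key list are assumptions (the keys are modeled on the kind of values set under system.memory-retry-error-keys).

```python
# Illustrative sketch only: this is NOT Cromwell's actual implementation.
# The key list is an assumption, modeled on typical memory-retry error keys.
MEMORY_RETRY_ERROR_KEYS = ["OutOfMemory", "Killed"]


def find_memory_retry_keys(stderr_text, error_keys=MEMORY_RETRY_ERROR_KEYS):
    """Return the error keys that occur anywhere in the stderr text."""
    return [key for key in error_keys if key in stderr_text]


# Hypothetical benign log line that merely echoes a JVM flag.
benign_stderr = "Running: java -XX:OnOutOfMemoryError='kill -9 %p' -jar tool.jar\n"
# Hypothetical log line from a genuine out-of-memory failure.
real_oom_stderr = 'Exception in thread "main" java.lang.OutOfMemoryError: Java heap space\n'

# Both lines match the same key, so a substring check cannot tell them apart.
print(find_memory_retry_keys(benign_stderr))
print(find_memory_retry_keys(real_oom_stderr))
```

Under these assumptions the flag -XX:OnOutOfMemoryError contains the key OutOfMemory as a substring, so a successful job that only logs its own command line looks the same as one that really ran out of memory.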

Comments

  • Comment author
    Jason Cerrato

    Hey Chris,

    Thanks for writing in. Can you confirm whether this is a general Cromwell question or if you are looking for a way to configure this in Terra?

    Many thanks,

    Jason

  • Comment author
    Chris Whelan

    Hi Jason,

    This is a general Cromwell question, although I guess it would also apply to Terra now that the retry-with-more-memory feature is available for workflows there.

    Chris

  • Comment author
    Jason Cerrato

    Hi Chris,

    Thank you for clarifying. Our Batch team has filed a ticket to make sure that memory retry only triggers when the task was otherwise about to fail, so that it doesn't autoretry a job that actually succeeded. If you're interested, you can keep track of that ticket here: https://broadworkbench.atlassian.net/browse/BW-760

    We believe this should solve your issue if the main concern is Cromwell retrying on a false failure. If you would rather disable the memory retry feature altogether, you can remove the error keys found in the Cromwell configuration at system.memory-retry-error-keys. Unfortunately, there isn't a way to disable this at the workflow or task level.

    I hope this helps! If you have any other questions, please let us know.

    Kind regards,

    Jason

  • Comment author
    Chris Whelan

    Hi Jason,

    Thanks! Just to clarify, in my case I didn't have `maxRetries` set in my task runtime definition, so Cromwell didn't attempt an autoretry. However, Cromwell did mark both the task and the workflow as failed, and that's the behavior I found perplexing.

    In this case I figured out how to lower the logging level of the tool I was using in my task command, so I was able to work around the problem.

    Chris

  • Comment author
    Jason Cerrato

    Hi Chris,

    Glad to hear! Thank you for letting us know. If we can help with anything else, please don't hesitate to reach out.

    Kind regards,

    Jason
