Automatically retrying workflows with more memory

Beth Sheets

What we're solving 

When workflow authors develop their WDLs, they write logic to calculate each task's memory requirements, and that logic covers most of their use cases. However, edge cases occur, and it is convenient for the workflow executor (in this case, Cromwell) to detect out-of-memory errors and automatically retry the workflow with more memory. This saves users the time of looking through failed submissions and manually restarting workflows.

Through user feedback, we learned that the original implementation of this feature succeeded only in certain scenarios and cost some users money unnecessarily. We have therefore moved the feature back into a preview state. This lets users test whether the feature works for them before relying on it to scale their analysis. It also helps us learn in which scenarios this implementation does not work.

What's changing for you

To use retry with more memory in your workflows, go to https://app.terra.bio/#feature-preview and select the "Retry with more memory" checkbox.

This adds the "Retry with more memory" option to the user interface when you submit workflows. Learn more about how to use this workflow submission option.

Share feedback

If this preview feature doesn't work for you, please share feedback in the comments of this article or via support@terra.bio.

More details about how this implementation works

The current implementation works for Java-based tools that hit the JVM memory allocation pool limit controlled by -Xmx and exit with a java.lang.OutOfMemoryError.

We have learned that the current implementation does not work for tasks that are killed by the kernel's out-of-memory (OOM) killer.
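
To make the difference between these two failure modes concrete, here is a minimal Python sketch. It is not Cromwell's actual detection code. The function names, the category labels, and the idea of inspecting a task's stderr and exit code are assumptions made for illustration. The sketch relies on two general facts: a JVM that exhausts its heap reports java.lang.OutOfMemoryError, and a process terminated by SIGKILL (the signal the kernel OOM killer sends) is conventionally reported by the shell with exit code 137 (128 + 9).

```python
# Illustrative sketch only: hypothetical names, not Cromwell's implementation.

JVM_OOM_MARKER = "java.lang.OutOfMemoryError"

# Shell convention: 128 + signal number; SIGKILL is signal 9.
SIGKILL_EXIT_CODE = 128 + 9


def classify_failure(stderr_text: str, exit_code: int) -> str:
    """Sort a failed task into one of three illustrative categories."""
    if JVM_OOM_MARKER in stderr_text:
        # The JVM hit its -Xmx limit and reported the error itself.
        return "jvm_oom"
    if exit_code == SIGKILL_EXIT_CODE:
        # The process was killed from outside. The kernel OOM killer is one
        # possible cause, but SIGKILL alone does not prove it.
        return "possible_kernel_oom_kill"
    return "other"


def would_retry_with_more_memory(stderr_text: str, exit_code: int) -> bool:
    """Mirror the behavior described above: only JVM OOM errors are caught."""
    return classify_failure(stderr_text, exit_code) == "jvm_oom"


if __name__ == "__main__":
    java_stderr = 'Exception in thread "main" java.lang.OutOfMemoryError: Java heap space'
    print(would_retry_with_more_memory(java_stderr, 1))
    print(would_retry_with_more_memory("Killed", 137))
```

In this sketch, a task killed by the kernel leaves no Java error message in its output, so a check that looks only for that message never sees it. That is one plausible reason such tasks are not retried.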

 

 
