Out of Memory Retry

Allie Hajian

When a file read by one of the WDL read_X functions exceeds Cromwell's size limit for that function, the workflow fails and you receive an error message. Read on to find a way around it.

What this looks like

You’ve likely used one of the read_X functions in your WDL on a file that is larger than the default limit set in Terra’s Cromwell instance. When Cromwell starts reading a file and exceeds the size limit for that function, it immediately stops downloading and fails the workflow with an error message.

Limits
* read_lines = 10MB
* read_json = 10MB
* read_tsv = 10MB
* read_object = 10MB
* read_boolean = 7 bytes
* read_int = 19 bytes
* read_float = 50 bytes
* read_string = 128KB
* read_map = 128KB
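
One way to avoid hitting a limit is to check the file size before reading it. The sketch below is an illustration only, written in the same draft-2 style as the examples in this article. The workflow name, the variable names, and the 10.0 threshold (taken from the read_lines entry above) are assumptions, and because the list does not say whether the limit is measured in decimal or binary megabytes, treat the comparison as approximate.

```wdl
workflow guarded_read {
  File listFile

  # Hypothetical threshold, copied from the read_lines limit listed above
  Float readLimitMB = 10.0

  # size() returns the file size as a Float in the requested unit
  Float listSizeMB = size(listFile, "MB")

  # Only call read_lines() when the file is below the threshold
  if (listSizeMB < readLimitMB) {
    Array[String] lines = read_lines(listFile)
  }

  output {
    # Optional because the value is only set when the condition is true
    Array[String]? maybeLines = lines
  }
}
```

When the file is too large, maybeLines is left undefined and the workflow can fall back to one of the workarounds described in this article.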

Workarounds

If read_lines() fails on a large file of filenames, the best workaround is to split the large file by line count into multiple small files, scatter over the array of small files, and read the filenames from each small file. The same approach applies to errors from the other read_X functions.

Here are two example WDLs for inspiration:

Option 1

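workflow w {
  File fileOfFilenames # 1GB in size

  # Split the large file into small individual files, one line (filename) each
  call splitFile { input: largeFile = fileOfFilenames }

  # Each tiny file holds a single line, so read_string stays far below its limit
  scatter (f in splitFile.tiny_files) {
    String fileName = read_string(f)
  }

  # A declaration made inside a scatter is gathered into an array outside it
  Array[String] filenames = fileName
}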

task splitFile {
    File largeFile

    command {
        mkdir sandbox
        split -l 1 ${largeFile} sandbox/
    }

    output {
        Array[File] tiny_files = glob("sandbox/*")
    }
    runtime {
        docker: "ubuntu:latest"
    }
}
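
Splitting one line per file can create a very large number of tiny files. A possible variant, sketched here under assumptions, splits the list into chunks of several lines and calls read_lines() on each chunk. The names read_in_chunks, splitIntoChunks, and linesPerChunk, and the value 10000, are hypothetical. Choose a chunk size that keeps every chunk below the read_lines limit for your filenames.

```wdl
workflow read_in_chunks {
  File fileOfFilenames

  # Hypothetical chunk size; each chunk must stay under the read_lines limit
  Int linesPerChunk = 10000

  call splitIntoChunks {
    input: largeFile = fileOfFilenames, linesPerChunk = linesPerChunk
  }

  # Read every chunk separately so no single read exceeds the limit
  scatter (chunk in splitIntoChunks.chunks) {
    Array[String] chunkFilenames = read_lines(chunk)
  }

  output {
    # One inner array of filenames per chunk
    Array[Array[String]] filenamesByChunk = chunkFilenames
  }
}

task splitIntoChunks {
  File largeFile
  Int linesPerChunk

  command {
    mkdir chunks
    # Write chunks/part_aa, chunks/part_ab, ... with linesPerChunk lines each
    split -l ${linesPerChunk} ${largeFile} chunks/part_
  }

  output {
    Array[File] chunks = glob("chunks/part_*")
  }
  runtime {
    docker: "ubuntu:latest"
  }
}
```

The result is a nested array, so downstream steps would scatter over the chunks and then over the filenames in each chunk.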

Option 2

workflow use_file_of_filenames {
  File file_of_filenames
  call count_filenames_in_file { input: file_of_filenames = file_of_filenames }
  scatter (index in range(count_filenames_in_file.count)) {
    call operate_on_file { input: file_of_filenames = file_of_filenames, file_index = index }
  }
}

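task count_filenames_in_file {
  File file_of_filenames
  command {
    # wc -l counts newline characters, so the list must end with a newline
    wc -l < ${file_of_filenames}
  }
  output {
    Int count = read_int(stdout())
  }
  runtime {
    docker: "ubuntu:latest"
  }
}

task operate_on_file {
  File file_of_filenames
  Int file_index
  command {
    # 1: Get the appropriate file name from the list.
    #    range() indices start at 0, but sed line numbers start at 1.
    file_name=$(sed -n "$((${file_index} + 1))p" ${file_of_filenames})
    # 2: Operate on that file as a URL (here the name is only printed)
    echo "$file_name"
  }
  runtime {
    docker: "ubuntu:latest"
  }
}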
 
Alternatively, you can pass these files in as workflow inputs individually or collected in a tar archive.
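
The tar option could look something like the following sketch. It assumes the archive is flat (no subdirectories), because the glob only matches the top level of the extraction directory. The names use_tar, untarInputs, and inputsTar are hypothetical.

```wdl
workflow use_tar {
  File inputsTar

  # Unpack the archive once, then work with the extracted files
  call untarInputs { input: inputsTar = inputsTar }

  output {
    Array[File] inputFiles = untarInputs.files
  }
}

task untarInputs {
  File inputsTar

  command {
    mkdir extracted
    # Extract into a dedicated directory so the glob only sees archive members
    tar -xf ${inputsTar} -C extracted
  }

  output {
    # Assumes a flat archive; nested directories would need a different pattern
    Array[File] files = glob("extracted/*")
  }
  runtime {
    docker: "ubuntu:latest"
  }
}
```

Because the files arrive as a single File input and are listed with glob(), no read_X function is involved and the read limits do not apply.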


Comments

2 comments

  • Yossi Farjoun

    thanks for this guide! I was unable to find the various options that could be present in the variable MEM_UNIT. For example, if the units are GB, would the value be: 

    - "g"
    - "GB"
    - "Gb"
    - 1000000000

    ?

    I see that in your example you use ${MEM_UNIT} as input to java's -mem argument, from which I deduce that it's `g`, (is that supposed to be -Xmx?) but it would be comforting to see the actual list somewhere.

    Thanks!

  • Allie Cliffe

    Yossi Farjoun - Thanks for the feedback! I updated the docs (after consulting with Geraldine) to hopefully address your questions. 

