What this looks like
You have likely used one of the read_X functions in your WDL on a file that exceeds the default limit set in Terra’s Cromwell instance. When Cromwell starts reading a file and hits the size limit, it immediately stops downloading and fails the workflow with this error message.
Cromwell limits
- read_lines = 10MB
- read_json = 10MB
- read_tsv = 10MB
- read_object = 10MB
- read_boolean = 7 bytes
- read_int = 19 bytes
- read_float = 50 bytes
- read_string = 128KB
- read_map = 128KB
Workarounds
If you are using read_lines() on a large file of filenames and getting this error, the best workaround is to split the large file by line count into multiple small files, scatter over the array of small files, and get each filename by reading the contents of its small file. The same approach can be applied to other read_X errors.
Alternatively, you can pass these files in as workflow inputs individually or collected in a tar.
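As a rough illustration of the tar alternative, the sketch below unpacks an archive inside a task and exposes its contents as an Array[File], so no read_X function is called at all. The workflow, task, and variable names (useTar, unpackTar, inputsTar, archive, unpacked) are hypothetical, and the sketch assumes a flat archive with no subdirectories, since glob("unpacked/*") only matches top-level entries.

```wdl
# Sketch only: all names here are hypothetical, not taken from a real workflow.
workflow useTar {
  File inputsTar  # tar archive bundling the individual input files

  # Unpack the archive; the files become task outputs instead of
  # being listed in a text file that needs read_lines()
  call unpackTar { input: archive = inputsTar }

  # unpackTar.unpacked is an Array[File] that downstream calls can scatter over
}

task unpackTar {
  File archive

  command {
    mkdir unpacked
    tar -xf ${archive} -C unpacked
  }
  output {
    # Assumes a flat archive: only top-level files are matched
    Array[File] unpacked = glob("unpacked/*")
  }
  runtime {
    docker: "ubuntu:latest"
  }
}
```

Because the files travel as File values rather than as text read by Cromwell, the read_X size limits listed above should not come into play with this approach.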
Here are two example WDLs for inspiration.
Option 1
workflow w {
  File fileOfFilenames  # 1GB in size

  # Split large file into small individual files
  call splitFile { input: largeFile = fileOfFilenames }

  scatter (f in splitFile.tiny_files) {
    String fileName = read_string(f)
  }

  Array[String] filenames = fileName
}

task splitFile {
  File largeFile

  command {
    mkdir sandbox
    split -l 1 ${largeFile} sandbox/
  }
  output {
    Array[File] tiny_files = glob("sandbox/*")
  }
  runtime {
    docker: "ubuntu:latest"
  }
}