Error message: File is larger than 10000000 Bytes
What this looks like
You’ve likely used one of the read_X functions in your WDL and exceeded the default size limits set in Terra’s Cromwell instance. When Cromwell starts reading a file and the file surpasses the size limit, Cromwell immediately stops downloading, fails the workflow, and returns this error message. If you want to know why these limits were introduced, read this blog post (link coming soon).
Limits
* read_lines = 10MB
* read_json = 10MB
* read_tsv = 10MB
* read_object = 10MB
* read_boolean = 7 bytes
* read_int = 19 bytes
* read_float = 50 bytes
* read_string = 128KB
* read_map = 128KB
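If you aren't sure whether an input will trip the limit, you can check its byte count before launching the workflow. This is a minimal sketch; the file name and contents below are illustrative stand-ins for your real input:

```shell
# Create a small stand-in file (substitute your real file of filenames).
printf 'gs://bucket/a.bam\ngs://bucket/b.bam\n' > file_of_filenames.txt

LIMIT=10000000                            # bytes; Terra's default for read_lines
size=$(wc -c < file_of_filenames.txt)     # byte count of the file
if [ "$size" -gt "$LIMIT" ]; then
  echo "over limit: read_lines will fail with this file"
else
  echo "within limit"
fi
```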
Workarounds
If you are using read_lines() with a large file of filenames and hitting this error, the best workaround is to split the large file by line count into multiple small files, scatter over the array of small files, and recover each filename by reading the contents of each small file. The same approach can be applied to other read_X errors.
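At the shell level, the split step looks like this (file names and contents are illustrative; inside a WDL task the input path would come from ${largeFile}):

```shell
# Stand-in for a large file of filenames, one per line.
printf 'gs://bucket/a.bam\ngs://bucket/b.bam\ngs://bucket/c.bam\n' > large_file.txt

mkdir -p sandbox
split -l 1 large_file.txt sandbox/   # one line per output file: sandbox/aa, sandbox/ab, ...

ls sandbox                           # each small file holds a single filename
cat sandbox/aa                       # prints gs://bucket/a.bam
```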
Here are two example WDLs for inspiration:
Option 1
workflow w {
  File fileOfFilenames # 1GB in size

  # Split large file into small individual files
  call splitFile { input: largeFile = fileOfFilenames }

  scatter (f in splitFile.tiny_files) {
    String fileName = read_string(f)
  }

  Array[String] filenames = fileName
}

task splitFile {
  File largeFile
  command {
    mkdir sandbox
    split -l 1 ${largeFile} sandbox/
  }
  output {
    Array[File] tiny_files = glob("sandbox/*")
  }
  runtime {
    docker: "ubuntu:latest"
  }
}
Option 2
workflow use_file_of_filenames {
  File file_of_filenames

  call count_filenames_in_file { input: file_of_filenames = file_of_filenames }

  scatter (index in range(count_filenames_in_file.count)) {
    call operate_on_file { input: file_of_filenames = file_of_filenames, file_index = index }
  }
}

task count_filenames_in_file {
  File file_of_filenames
  command {
    wc -l < ${file_of_filenames}
  }
  output {
    Int count = read_int(stdout())
  }
  runtime {
    docker: "ubuntu:latest"
  }
}

task operate_on_file {
  File file_of_filenames
  Int file_index
  command {
    # 1: Get the appropriate file name from the list
    # 2: Operate on that file as a URL
  }
  runtime {
    docker: "ubuntu:latest"
  }
}
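The command block in operate_on_file is deliberately left as a sketch. One way to fill in step 1 is shown below, outside of WDL with illustrative stand-ins; inside the task, the file path and index would be ${file_of_filenames} and ${file_index}. Step 2 depends on the tool you run against the file, so it is left open here:

```shell
# Illustrative stand-ins for the task's WDL inputs:
printf 'gs://bucket/a.bam\ngs://bucket/b.bam\n' > filenames_example.txt
FILE_INDEX=1                                # range() indices are zero-based

# 1: Get the appropriate file name from the list.
# sed line numbers start at 1, so add 1 to the zero-based index.
line=$(( FILE_INDEX + 1 ))
filename=$(sed -n "${line}p" filenames_example.txt)
echo "$filename"                            # prints gs://bucket/b.bam
```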