Size limits on data tables?
We are trying to create very large data tables in a workspace to track simulated genotype data. We are using the Bioconductor::AnVIL R functions avtable_import and avtable_import_set to write to the workspace data tables. Initially we got errors that importing >100,000 rows at a time was not allowed, so I changed our code to iterate in batches of 100,000 when trying to import to a sample table with >100,000 rows. That helped, but we are still getting errors.
This is what we've tried:
(1) 150,000 samples in total (30,000 per group): everything worked. All tables were imported successfully and all 5 jobs reported success.
(2) 300,000 samples in total (60,000 per group): all 5 jobs reported failure, but all the tables were still imported successfully.
(3) 600,000 samples in total (120,000 per group): all 5 jobs reported failure. The tables were still imported successfully, except for the sample_set table, which was imported for the first 2 jobs but not for the last 3.
In the 5 jobs of (2) and the first 2 jobs of (3), the error messages recorded in results.log were:
<html><head><meta http-equiv="content-type" content="text/html;charset=utf-8"><title>502 Server Error</title></head><body text=#000000 bgcolor=#ffffff><h1>Error: Server Error</h1><h2>The server encountered a temporary error and could not complete your request.<p>Please try again in 30 seconds.</h2><h2></h2></body></html>
Error: 'avtable_import_sample_set' failed: Bad Gateway (HTTP 502).
Execution halted
In the last 3 jobs of (3), the error messages were:
Error in curl::curl_fetch_memory(url, handle = handle) :
HTTP/2 stream 0 was not closed cleanly: INTERNAL_ERROR (err 2)
Calls: anvil_import_tables ... request_fetch -> request_fetch.write_memory -> <Anonymous>
Execution halted
Comments
Hi Stephanie,
Unfortunately, the entity service does have limitations when it comes to larger data sets. I double-checked with our engineers whether the issue you are seeing is a bug that can be resolved, but they confirmed that it is one of these system limitations rather than a bug. There is some work in their backlog to make improvements in the future, and our engineers are working with an owner of the AnVIL package you are using to help optimize the R package functions. The best solution, for now, is to work in smaller batches of samples.
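As a rough illustration of the smaller-batch approach, here is a minimal, language-neutral sketch in Python. It is not the AnVIL package's API: `import_in_batches`, `import_batch`, and the default `batch_size` of 10,000 are hypothetical names and values chosen for the example, and `import_batch` stands in for whatever call actually uploads one chunk of rows (for instance a wrapper around avtable_import). The 30-second wait mirrors the "Please try again in 30 seconds" hint in the 502 response. The sketch also assumes that re-sending a batch is harmless, which matters because in (2) the tables were imported even though the jobs reported failure.

```python
import time
from typing import Callable, List, Sequence


def import_in_batches(
    rows: Sequence[dict],
    import_batch: Callable[[Sequence[dict]], None],  # hypothetical uploader for one chunk
    batch_size: int = 10_000,    # assumed value, well under the 100,000-row cap
    max_retries: int = 3,        # attempts per batch before giving up
    wait_seconds: float = 30.0,  # the 502 page suggests retrying after 30 seconds
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Upload rows in consecutive batches.

    Returns the start offsets of batches that still failed after
    max_retries attempts, so they can be inspected or re-run later.
    """
    failed: List[int] = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        for attempt in range(1, max_retries + 1):
            try:
                import_batch(batch)
                break  # this batch succeeded, move on to the next one
            except Exception:
                if attempt == max_retries:
                    failed.append(start)  # give up on this batch, keep going
                else:
                    sleep(wait_seconds)  # wait before retrying a transient error
    return failed
```

Returning the offsets of failed batches, instead of stopping at the first error, lets a long import finish as much as possible and makes it easy to re-run only the pieces that did not go through.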
Please let me know if you have any questions!
Best,
Emily