I am attempting to run PCA on a large dataset using Hail, in the Terra Hail environment. I have an "Increased Computing Power" master node and roughly 24 worker nodes with 15 GB of memory each.
The error I am getting is:
Hail version: 0.2.39-ef87446bd1c7 Error summary: SparkException: Job aborted due to stage failure: Task 13 in stage 32.0 failed 4 times, most recent failure: Lost task 13.3 in stage 32.0 (TID 1081, saturn-62c1ecef-284b-4593-96ed-63bb90133cc2-w-4.c.ariel-research-and-development.internal, executor 70): ExecutorLostFailure (executor 70 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 6.0 GB of 6 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead or disabling yarn.nodemanager.vmem-check-enabled because of YARN-4714. Driver stacktrace:
On the Hail forums (https://discuss.hail.is/t/pca-failed-due-to-not-enough-executor-memory/285) I found the following advice:
"The YARN defaults aren’t super great for a lot of the memory-intensive linear algebra routines we use. One way to make things a bit better is to increase the number of cores per executor from 1 to 4, so that all the overhead per JVM drops by a factor of 4. You can do this by setting
spark.executor.cores=4 in the spark configuration."
Is there a way to change the Spark configuration via the Notebook runtime setup? I am about to try increasing my worker node memory to 265GB, but I am not sure that will accomplish the same thing as increasing the number of executor cores per the advice above.
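In the meantime, this is roughly what I was planning to try directly from the notebook, assuming hl.init() accepts a spark_conf dict and that Terra's managed Spark cluster actually honors it (the memoryOverhead value is just a guess on my part):

import hail as hl

# Assumption: hl.init() has not already been called in this notebook session,
# and the spark_conf settings are picked up by Terra's preconfigured cluster.
hl.init(spark_conf={
    'spark.executor.cores': '4',                 # per the Hail forum advice above
    'spark.yarn.executor.memoryOverhead': '2g',  # guessed value to address the YARN error
})

I have no idea whether this actually takes effect on Terra, which is part of why I'm asking.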
I also saw someone with a similar issue in the Terra forums, and it was suggested that a custom container could be built in Terra. Is there any documentation that explains how to do that?
Thanks for reading!