Error in GATK-SV Terra pipeline, 07-FilterBatchSites step
Hello, I am currently doing a pilot run of the GATK-SV pipeline in cohort mode on Terra. I have experience with other GATK pipelines, but this is my first time using Terra. I'm advancing through the pipeline using the preconfigured settings and inputs, but I'm encountering an error at step 07-FilterBatchSites.
The error message is:
Adjudicating BAF (1)...
Traceback (most recent call last):
  File "/opt/conda/envs/gatk-sv/bin/svtk", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/opt/svtk/scripts/svtk", line 65, in <module>
    main()
  File "/opt/svtk/scripts/svtk", line 62, in main
    getattr(cli, command)(sys.argv[2:])
  File "/opt/svtk/svtk/cli/adjudicate.py", line 33, in main
    scores, cutoffs = adjudicate_SV(metrics)
  File "/opt/svtk/svtk/adjudicate/adjudicate_sv.py", line 342, in adjudicate_SV
    cutoffs[0] = adjudicate_BAF1(metrics)
  File "/opt/svtk/svtk/adjudicate/adjudicate_sv.py", line 67, in adjudicate_BAF1
    cutoffs = adjudicate_BAF(
  File "/opt/svtk/svtk/adjudicate/adjudicate_sv.py", line 34, in adjudicate_BAF
    del_cutoffs = rf_classify(metrics, trainable, testable, features,
  File "/opt/svtk/svtk/adjudicate/random_forest.py", line 19, in rf_classify
    rf = RandomForest(trainable, testable, features, cutoffs, labeler, name,
  File "/opt/svtk/svtk/adjudicate/random_forest.py", line 44, in __init__
    raise Exception('No clean variants found')
Exception: No clean variants found
Digging a little deeper, I found that the error is caused by the batch metrics file generated in the previous steps missing two columns: `BAF_snp_ratio` and `BAF_del_loglik`. Other columns may be missing as well, but these two directly cause this error. I'm not sure whether this is a bug or whether I'm providing the wrong inputs. I'm still getting used to Terra and learning the pipeline, so I appreciate any help. Thanks!
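One way to confirm which columns the metrics file actually contains is to inspect its header. The following is a minimal Python sketch, assuming the batch metrics file is tab-separated with a single header row (optionally gzip-compressed). `read_header` and `missing_columns` are hypothetical helpers, not part of svtk or GATK-SV.

```python
import gzip
import sys
from typing import Iterable, List

# The two columns whose absence triggered "No clean variants found" in this thread.
REQUIRED_BAF_COLUMNS = ["BAF_snp_ratio", "BAF_del_loglik"]


def read_header(path: str) -> List[str]:
    """Return column names from the first line of a tab-separated file.

    Assumption: the file has a single header row; files ending in .gz
    are opened with gzip in text mode.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        first_line = handle.readline()
    # Drop the line ending and a leading '#' in case the header is a comment line.
    return first_line.rstrip("\r\n").lstrip("#").split("\t")


def missing_columns(
    header: Iterable[str], required: Iterable[str] = REQUIRED_BAF_COLUMNS
) -> List[str]:
    """Return the required column names that are absent from the header."""
    present = set(header)
    return [col for col in required if col not in present]


if __name__ == "__main__" and len(sys.argv) > 1:
    missing = missing_columns(read_header(sys.argv[1]))
    if missing:
        print("Missing columns:", ", ".join(missing))
    else:
        print("All required BAF columns are present.")
```

For example, after downloading the metrics file from the workspace bucket, save the script under any name (say, `check_metrics.py`, an illustrative name) and run `python check_metrics.py <metrics_file>`.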
Hi Dong,
Thank you for writing in about this issue. Could you share the workspace where you are seeing this issue with Terra Support? The Share option is in the three-dots menu at the top right of your workspace.
Please provide us with
We’ll be happy to take a closer look as soon as we can!
Kind regards,
Josh
For future users who encounter this issue: this error was caused by running FilterBatchSites on fewer than 100 samples. The cohort mode of GATK-SV is designed to process at least 100 samples at a time, so to avoid this error, make sure each of your batches contains at least 100 samples. If you have fewer samples, you might be interested in the single-sample version of GATK-SV.
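To catch this before launching cohort mode, batch sizes can be checked up front. This is a minimal Python sketch, assuming each batch's sample IDs are available as a list (for example, read from a one-ID-per-line file). `read_sample_ids` and `undersized_batches` are hypothetical helpers, not GATK-SV or Terra APIs, and the 100-sample threshold comes from the reply in this thread.

```python
from typing import Dict, List, Sequence

# Minimum samples per batch for GATK-SV cohort mode, per the reply in this thread.
MIN_COHORT_BATCH_SIZE = 100


def read_sample_ids(path: str) -> List[str]:
    """Read sample IDs from a hypothetical file with one ID per line."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def undersized_batches(
    batches: Dict[str, Sequence[str]], minimum: int = MIN_COHORT_BATCH_SIZE
) -> Dict[str, int]:
    """Return {batch_name: unique_sample_count} for batches below `minimum`."""
    result = {}
    for name, sample_ids in batches.items():
        # Count unique IDs so accidental duplicates don't inflate the batch size.
        count = len(set(sample_ids))
        if count < minimum:
            result[name] = count
    return result


batches = {
    "batch_a": [f"sample_{i}" for i in range(120)],
    "batch_b": [f"sample_{i}" for i in range(40)],
}
print(undersized_batches(batches))  # {'batch_b': 40}
```

Any batch reported here would need more samples before running cohort mode, or could be processed with the single-sample version instead.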