Error in GATK-SV Terra pipeline at the 07-FilterBatchSites step

Dong Wang

Hello, I am currently trying to do a pilot run of the GATK-SV pipeline in cohort mode on Terra. I have experience with other GATK pipelines, but this is my first time using Terra. I'm advancing through the pipeline using the pre-configured settings and inputs, but I'm encountering an error at step 07-FilterBatchSites.

The error message is: 

Adjudicating BAF (1)...
Traceback (most recent call last):
  File "/opt/conda/envs/gatk-sv/bin/svtk", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/opt/svtk/scripts/svtk", line 65, in <module>
    main()
  File "/opt/svtk/scripts/svtk", line 62, in main
    getattr(cli, command)(sys.argv[2:])
  File "/opt/svtk/svtk/cli/adjudicate.py", line 33, in main
    scores, cutoffs = adjudicate_SV(metrics)
  File "/opt/svtk/svtk/adjudicate/adjudicate_sv.py", line 342, in adjudicate_SV
    cutoffs[0] = adjudicate_BAF1(metrics)
  File "/opt/svtk/svtk/adjudicate/adjudicate_sv.py", line 67, in adjudicate_BAF1
    cutoffs = adjudicate_BAF(
  File "/opt/svtk/svtk/adjudicate/adjudicate_sv.py", line 34, in adjudicate_BAF
    del_cutoffs = rf_classify(metrics, trainable, testable, features,
  File "/opt/svtk/svtk/adjudicate/random_forest.py", line 19, in rf_classify
    rf = RandomForest(trainable, testable, features, cutoffs, labeler, name,
  File "/opt/svtk/svtk/adjudicate/random_forest.py", line 44, in __init__
    raise Exception('No clean variants found')
Exception: No clean variants found

Digging a little deeper, I found that the error is caused by the batch metrics file generated in the previous steps, which is missing two columns: `BAF_snp_ratio` and `BAF_del_loglik`. Other columns may be missing as well, but these two directly cause this error. I'm not sure whether this is a bug or whether I'm providing the wrong inputs. I'm still getting used to Terra and learning the pipeline, so I appreciate any help. Thanks!
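
A quick way to confirm this diagnosis before rerunning the step is to inspect the header of the batch metrics file. The Python sketch below is not part of GATK-SV or svtk. It assumes the metrics file is uncompressed and tab-delimited, with a header on its first line (possibly prefixed with `#`), and the function name `missing_columns` is hypothetical. It only checks the two columns named above.

```python
import csv
from typing import Iterable, List

# Columns reported missing in this thread; the post identifies them as the
# direct cause of the BAF adjudication failure in the traceback.
REQUIRED_BAF_COLUMNS = ("BAF_snp_ratio", "BAF_del_loglik")


def missing_columns(lines: Iterable[str],
                    required: Iterable[str] = REQUIRED_BAF_COLUMNS) -> List[str]:
    """Return the required column names absent from a tab-separated header.

    Assumes (unverified) that the first line is a tab-delimited header row,
    possibly prefixed with '#'. `lines` can be an open text file or any
    iterable of strings.
    """
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader, [])  # empty input yields an empty header
    if header:
        # Tolerate a leading '#' on the first header field.
        header[0] = header[0].lstrip("#")
    present = set(header)
    return [col for col in required if col not in present]
```

For example, opening the file with `open("batch.metrics.tsv", newline="")` (placeholder filename) and passing the handle to `missing_columns` returns an empty list when both columns are present. If the file is gzipped, open it with `gzip.open(path, "rt")` instead.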

Comments

  • Josh Evans

    Hi Dong,

    Thank you for writing in about this issue. Could you share the workspace where you are seeing it with Terra Support? To do so, open the three-dots menu at the top right of your workspace and click Share, then:

    1. Toggle the "Share with support" button to "Yes"
    2. Click Save

    Please also provide us with:

    1. A link to your workspace
    2. The relevant submission ID
    3. The relevant workflow ID

    We’ll be happy to take a closer look as soon as we can!

    Kind regards,

    Josh

  • Emma Pierce-Hoffman

    For future users who encounter this issue, this error was the result of running FilterBatchSites on fewer than 100 samples. The cohort mode of GATK-SV is designed for processing at least 100 samples at a time, so to avoid this error, make sure each of your batches contains at least 100 samples. If you have fewer samples, you might be interested in the single-sample version of GATK-SV.
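
Since the fix comes down to batch size, a pre-flight check on each batch's sample list can catch this before FilterBatchSites runs. The sketch below assumes each batch's samples are available as a plain-text list with one sample ID per line (for example, exported from a Terra sample set); that export format and the helper names `read_sample_ids` and `batch_is_large_enough` are hypothetical. The 100-sample threshold comes from the answer above.

```python
from typing import Iterable, List

# Minimum batch size for GATK-SV cohort mode, per the answer in this thread.
MIN_COHORT_BATCH_SIZE = 100


def read_sample_ids(lines: Iterable[str]) -> List[str]:
    """Collect non-empty, de-duplicated sample IDs in their original order.

    Assumes one sample ID per line; blank lines are skipped.
    """
    seen = set()
    ids = []
    for line in lines:
        sample = line.strip()
        if sample and sample not in seen:
            seen.add(sample)
            ids.append(sample)
    return ids


def batch_is_large_enough(sample_ids: Iterable[str],
                          minimum: int = MIN_COHORT_BATCH_SIZE) -> bool:
    """Return True if the batch contains at least `minimum` distinct samples."""
    return len(set(sample_ids)) >= minimum
```

Counting distinct IDs guards against a duplicated sample inflating the batch size; if a batch falls short, the options from the answer are to merge it into a larger batch or to use the single-sample version of GATK-SV.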