GATK $5 genome pipeline fails because no suitable codecs found in dbSNP VCF index
I am trying to run and benchmark the GATK five-dollar-genome pipeline.
I ran the pipeline with the same parameters as this JSON file in the GitHub repo, but it failed on the BaseRecalibrator tasks. The log file for these tasks contained the following:
A USER ERROR has occurred: Cannot read /cromwell_root/broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf.idx because no suitable codecs found
The index file being used is gs://broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf.idx, so it shouldn't be causing any issues. Also, I have shared this workspace with GROUP_FireCloud-Support@firecloud.org.
Any ideas what the issue is and how it can be resolved?
Thanks in advance
Hello Cuca Dogo,
What is the name of the workspace and the submission ID?
There may be an issue with your input VCF file that BQSR doesn't like, as discussed here.
Update: This Broad bucket path will soon no longer be available for free. Can you update the config to use these paths and run again?
Note that reference data files are no longer located in the "gs://broad-references/" bucket but have been moved to gs://gcp-public-data--broad-references, hosted by Google.
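If the inputs JSON has many references to the old bucket, the prefix swap can be scripted. The following is only a rough sketch: it assumes the directory layout under the new bucket matches the old one (e.g. hg38/v0/...), which should be checked before relying on it. The function name and the input key in the example are hypothetical and not taken from the pipeline.

```python
import json

OLD_PREFIX = "gs://broad-references/"
NEW_PREFIX = "gs://gcp-public-data--broad-references/"


def rewrite_bucket_paths(value):
    """Recursively swap the old bucket prefix for the new one in string values.

    Assumption: the path below the bucket name is unchanged in the new bucket.
    """
    if isinstance(value, str):
        if value.startswith(OLD_PREFIX):
            return NEW_PREFIX + value[len(OLD_PREFIX):]
        return value
    if isinstance(value, list):
        return [rewrite_bucket_paths(item) for item in value]
    if isinstance(value, dict):
        return {key: rewrite_bucket_paths(item) for key, item in value.items()}
    # Numbers, booleans, and null are returned untouched.
    return value


if __name__ == "__main__":
    # Hypothetical inputs fragment; the key name is illustrative only.
    inputs = {
        "example_workflow.dbsnp_vcf_index": (
            "gs://broad-references/hg38/v0/"
            "Homo_sapiens_assembly38.dbsnp138.vcf.idx"
        )
    }
    print(json.dumps(rewrite_bucket_paths(inputs), indent=2))
```

To apply it to a real inputs file, load the file with json.load, pass the result through rewrite_bucket_paths, and write it back out with json.dump. Paths that do not start with the old bucket prefix are left as they are.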