GATK $5 genome pipeline fails with "no suitable codecs found" error on dbSNP VCF index

Post author
Cuca Dogo



I am trying to run and benchmark the GATK five-dollar-genome pipeline.


I ran the pipeline with the same parameters as this JSON file within the GitHub repo. However, the pipeline failed on the BaseRecalibrator tasks. I checked the log file for these tasks and it contained the following:

A USER ERROR has occurred: Cannot read /cromwell_root/broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf.idx because no suitable codecs found


The index file being used is gs://broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf.idx, so it shouldn't be causing any issues. Also, I have shared this workspace with

Any ideas what the issue is and how it can be resolved?


Thanks in advance



  • Comment author
    Tiffany Miller

    Hello Cuca Dogo

    What is the name of the workspace and the submission Id?

  • Comment author
    Tiffany Miller

    There may be an issue with your input VCF file that BQSR doesn't like, as discussed here.

    Update: This Broad bucket path will soon no longer be available for free. Can you update the config to use these paths and run again?

    "dbsnp_vcf": "gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf"

    "dbsnp_vcf_index": "gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf.idx"
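    If the inputs are kept in a local JSON file (for example, a copy of the one used to configure the workflow), the path change can be scripted instead of edited by hand. The sketch below is a minimal, hypothetical helper, not part of GATK or Terra: it assumes every reference path starts with gs://broad-references/ and simply swaps that prefix for gs://gcp-public-data--broad-references/, leaving the rest of each path unchanged. The key names in the demo are illustrative; the real inputs JSON usually prefixes them with the workflow name. Note that this only applies the suggested path change; the thread does not confirm it resolves the codec error.

    ```python
    import json

    # Old and new reference-bucket prefixes, as given in this thread.
    OLD_PREFIX = "gs://broad-references/"
    NEW_PREFIX = "gs://gcp-public-data--broad-references/"


    def migrate_paths(value):
        """Recursively rewrite gs://broad-references/ paths to the new bucket.

        Handles the nested dicts, lists and strings produced by json.load;
        anything that is not a string starting with OLD_PREFIX is returned as is.
        """
        if isinstance(value, str) and value.startswith(OLD_PREFIX):
            return NEW_PREFIX + value[len(OLD_PREFIX):]
        if isinstance(value, dict):
            return {key: migrate_paths(item) for key, item in value.items()}
        if isinstance(value, list):
            return [migrate_paths(item) for item in value]
        return value


    def migrate_inputs_file(src_path, dst_path):
        """Read an inputs JSON file, rewrite bucket paths, and save a new copy."""
        with open(src_path) as src:
            inputs = json.load(src)
        with open(dst_path, "w") as dst:
            json.dump(migrate_paths(inputs), dst, indent=2)


    if __name__ == "__main__":
        # Hypothetical key names; real inputs carry a workflow-name prefix.
        example = {
            "dbsnp_vcf": OLD_PREFIX + "hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf",
            "dbsnp_vcf_index": OLD_PREFIX + "hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf.idx",
        }
        print(json.dumps(migrate_paths(example), indent=2))
    ```

    Writing to a separate output file keeps the original configuration intact, so the two can be diffed before resubmitting.
    
    
    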

  • Comment author
    Allie Hajian

    Note that reference data files are no longer located in the "gs://broad-references/" bucket but have been moved to gs://gcp-public-data--broad-references, hosted by Google. 
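    Because every reference file moved, not just dbSNP, it may be worth checking the whole inputs configuration for leftover paths in the old bucket before resubmitting. The following is a minimal, hypothetical audit helper (not a GATK or Terra tool). It assumes the inputs have already been loaded from JSON, and it reports the location of each string that still starts with gs://broad-references/. The demo keys are illustrative only.

    ```python
    import json

    # Old reference-bucket prefix mentioned in this thread.
    OLD_PREFIX = "gs://broad-references/"


    def find_old_bucket_refs(value, path="$"):
        """Return (location, value) pairs for strings still using OLD_PREFIX.

        `location` is a JSONPath-like string such as "$.dbsnp_vcf" or "$.items[0]".
        """
        hits = []
        if isinstance(value, str):
            if value.startswith(OLD_PREFIX):
                hits.append((path, value))
        elif isinstance(value, dict):
            for key, item in value.items():
                hits.extend(find_old_bucket_refs(item, f"{path}.{key}"))
        elif isinstance(value, list):
            for index, item in enumerate(value):
                hits.extend(find_old_bucket_refs(item, f"{path}[{index}]"))
        return hits


    if __name__ == "__main__":
        # Hypothetical inputs: one path already migrated, one not.
        inputs = json.loads(
            '{"dbsnp_vcf": "gs://gcp-public-data--broad-references/hg38/v0/'
            'Homo_sapiens_assembly38.dbsnp138.vcf",'
            ' "dbsnp_vcf_index": "gs://broad-references/hg38/v0/'
            'Homo_sapiens_assembly38.dbsnp138.vcf.idx"}'
        )
        for location, old_path in find_old_bucket_refs(inputs):
            print(f"{location}: {old_path}")
    ```

    An empty result means no input still points at the old bucket.
    
    
    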


