GATK $5 genome pipeline fails because no suitable codecs found in dbSNP VCF index

Post author
Cuca Dogo

Hi,

I am trying to run and benchmark the GATK five-dollar-genome pipeline.

I ran the pipeline with the same parameters as this JSON file within the GitHub repo. However, the pipeline failed on the BaseRecalibrator tasks. I checked the log file for these tasks and it contained the following:

A USER ERROR has occurred: Cannot read /cromwell_root/broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf.idx because no suitable codecs found

The index file being used is gs://broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf.idx, so it shouldn't be causing any issues. I have also shared this workspace with GROUP_FireCloud-Support@firecloud.org.

Any ideas what the issue is and how it can be resolved?

Thanks in advance

Comments


  • Comment author
    Tiffany Miller

    Hello Cuca Dogo

    What is the name of the workspace and the submission ID?

  • Comment author
    Tiffany Miller
    • Edited

    There may be an issue with your input VCF file that BQSR doesn't like, as discussed here.

    Update: This Broad bucket path will soon no longer be available for free. Can you update the config to use these paths and run again?

    "dbsnp_vcf": "gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf",

    "dbsnp_vcf_index": "gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf.idx"

  • Comment author
    Allie Hajian

    Note that reference data files are no longer located in the gs://broad-references/ bucket; they have been moved to gs://gcp-public-data--broad-references, which is hosted by Google.
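
For anyone updating an inputs JSON by hand, the bucket move described above can be applied mechanically. The following is a minimal Python sketch, assuming the inputs file is plain JSON and that only the bucket prefix changed while the rest of each path stayed the same. The function names and file names are hypothetical and are not part of the pipeline.

```python
import json

# Old and new bucket prefixes, taken from the comments in this thread.
OLD_PREFIX = "gs://broad-references/"
NEW_PREFIX = "gs://gcp-public-data--broad-references/"


def rewrite_paths(value):
    """Recursively swap the old bucket prefix for the new one.

    Walks any parsed JSON value (dict, list, string, number, None) and
    rewrites only strings that start with OLD_PREFIX.
    """
    if isinstance(value, str):
        if value.startswith(OLD_PREFIX):
            return NEW_PREFIX + value[len(OLD_PREFIX):]
        return value
    if isinstance(value, list):
        return [rewrite_paths(item) for item in value]
    if isinstance(value, dict):
        return {key: rewrite_paths(item) for key, item in value.items()}
    return value


def rewrite_inputs_file(src_path, dst_path):
    """Read an inputs JSON file, rewrite bucket paths, write a new file."""
    with open(src_path) as handle:
        inputs = json.load(handle)
    with open(dst_path, "w") as handle:
        json.dump(rewrite_paths(inputs), handle, indent=2)
```

A call such as `rewrite_inputs_file("inputs.json", "inputs.updated.json")` (both file names are placeholders) would leave the original file untouched, so the two versions can be compared before resubmitting.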

