GATK $5 genome pipeline fails because no suitable codecs found in dbSNP VCF index

December 18, 2019 09:57
3 comments

Hi,

I am trying to run and benchmark the GATK five-dollar-genome pipeline.

I ran the pipeline with the same parameters as this JSON file within the GitHub repo. However, the pipeline failed on the BaseRecalibrator tasks. I checked the log file for these tasks and it contained the following:

A USER ERROR has occurred: Cannot read /cromwell_root/broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf.idx because no suitable codecs found

The index file being used is gs://broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf.idx and so shouldn't be causing any issues. Also, I have shared this workspace with GROUP_FireCloud-Support@firecloud.org.

Any ideas what the issue is and how it can be resolved?

Thanks in advance

Comments


Tiffany Miller
- December 20, 2019 16:13
Hello Cuca Dogo,

What is the name of the workspace and the submission ID?

Tiffany Miller
- Edited December 20, 2019 16:48
There may be an issue with your input VCF file that BQSR doesn't accept, as discussed here.

Updated: The gs://broad-references bucket path will soon no longer be available for free. Can you update the config to use the following paths and run again?

"dbsnp_vcf": "gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf"

"dbsnp_vcf_index": "gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf.idx"

Allie Hajian
- May 18, 2020 15:31
Note that reference data files are no longer located in the "gs://broad-references/" bucket but have been moved to gs://gcp-public-data--broad-references, hosted by Google.
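
For anyone with many inputs still pointing at the old bucket, the move described in this thread could be applied to a whole inputs JSON in one pass. The following is only a minimal sketch, not an official tool: the helper names `relocate` and `relocate_inputs_file` and the file paths passed to them are hypothetical, and it assumes the objects keep the same path under the new bucket, as the two dbSNP paths above do.

```python
import json

# Old and new bucket prefixes, taken from the comments in this thread.
OLD_PREFIX = "gs://broad-references/"
NEW_PREFIX = "gs://gcp-public-data--broad-references/"


def relocate(value):
    """Recursively swap the old bucket prefix for the new one.

    Handles strings, lists, and dicts; any other JSON value is returned as-is.
    """
    if isinstance(value, str):
        if value.startswith(OLD_PREFIX):
            return NEW_PREFIX + value[len(OLD_PREFIX):]
        return value
    if isinstance(value, list):
        return [relocate(item) for item in value]
    if isinstance(value, dict):
        return {key: relocate(item) for key, item in value.items()}
    return value


def relocate_inputs_file(src_path, dst_path):
    """Read an inputs JSON (hypothetical path), rewrite bucket paths, write a copy."""
    with open(src_path) as src:
        inputs = json.load(src)
    with open(dst_path, "w") as dst:
        json.dump(relocate(inputs), dst, indent=2)
```

Writing to a separate output file keeps the original inputs untouched, so the two can be compared before resubmitting the workflow.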
