Full UCSC HG19 FASTA and databases built on that reference
Hello Terra team,
For people bringing in data already aligned using the UCSC HG19 reference, it would be useful to have a publicly available copy of the corresponding FASTA (including index and dictionary) as it does not exactly match the GRCh37 reference FASTA.
Furthermore, all the databases needed for a step such as recalibration should be available with UCSC HG19 reference names and coordinates.
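To make the mismatch described above concrete, here is a minimal sketch that compares two sequence dictionaries by contig name and length. It assumes the usual tab-separated `@SQ` lines with `SN` and `LN` tags found in a `.dict` file or SAM header. The two header excerpts are short illustrative stand-ins, not complete dictionaries, and the function names are hypothetical.

```python
def parse_sequence_dictionary(text):
    """Return {contig_name: length} from the @SQ lines of a .dict or SAM header."""
    contigs = {}
    for line in text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each tag looks like KEY:VALUE; split only on the first colon
        # so values that contain colons (e.g. UR:file:/...) stay intact.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        contigs[fields["SN"]] = int(fields["LN"])
    return contigs


def compare_dictionaries(a, b):
    """Return (names only in a, names only in b, shared names with different lengths)."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    length_mismatch = sorted(n for n in set(a) & set(b) if a[n] != b[n])
    return only_a, only_b, length_mismatch


# Illustrative excerpts only (assumed for this sketch, not full dictionaries).
HG19_EXAMPLE = "@HD\tVN:1.5\n@SQ\tSN:chr1\tLN:249250621\n@SQ\tSN:chrM\tLN:16571\n"
GRCH37_EXAMPLE = "@HD\tVN:1.5\n@SQ\tSN:1\tLN:249250621\n@SQ\tSN:MT\tLN:16569\n"

hg19 = parse_sequence_dictionary(HG19_EXAMPLE)
grch37 = parse_sequence_dictionary(GRCH37_EXAMPLE)
print(compare_dictionaries(hg19, grch37))
```

In this excerpt no contig name is shared between the two dictionaries, which is why data aligned to one reference cannot simply be paired with resources built on the other.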
Here is a link to the HG19 references: https://console.cloud.google.com/storage/browser/gatk-legacy-bundles/hg19?project=broad-dsde-outreach&organizationId=548622027621
Excellent! This helps tremendously!
Will databases compatible with the UCSC hg19 reference also be made available soon? I'm thinking in particular of those necessary for the base quality score recalibration step:
dbSNP_vcf & known_indels_sites_VCFs (with corresponding indices)
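Before passing a known-sites VCF to recalibration, it can help to confirm that its contig names match the reference. The following is a rough sketch under the assumption that the VCF header carries `##contig=<ID=...>` lines. The helper names and the example header are hypothetical and only illustrate the check.

```python
import re

# Matches header lines such as: ##contig=<ID=chr1,length=249250621>
CONTIG_RE = re.compile(r"^##contig=<.*?\bID=([^,>]+)")


def vcf_header_contigs(header_text):
    """Return the contig IDs declared in a VCF header, in order of appearance."""
    names = []
    for line in header_text.splitlines():
        if not line.startswith("##"):
            break  # meta-information lines end at the #CHROM column line
        match = CONTIG_RE.match(line)
        if match:
            names.append(match.group(1))
    return names


def missing_from_reference(vcf_contigs, reference_contigs):
    """Return VCF contigs that the reference does not define."""
    reference = set(reference_contigs)
    return [c for c in vcf_contigs if c not in reference]


# Hypothetical header mixing UCSC-style and GRCh37-style names.
EXAMPLE_HEADER = (
    "##fileformat=VCFv4.1\n"
    "##contig=<ID=chr1,length=249250621>\n"
    "##contig=<ID=MT,length=16569>\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
)

declared = vcf_header_contigs(EXAMPLE_HEADER)
print(missing_from_reference(declared, ["chr1", "chrM"]))
```

An empty result would suggest the names line up; any listed contig points to a VCF built on a different naming scheme.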
I don't believe they are in a Google bucket at this point, but this FTP server seems to contain what you are looking for: ftp://firstname.lastname@example.org/bundle/hg19/
This has all been tremendously helpful, thank you!
We recently were emailed about references being migrated Google Cloud Genomics for cost reasons. Will that migration include the UCSC HG19 references you linked me to in your first reply?
I can make a note to the team working on the migration to include the hg19 resources as well. I can update you on the status of that decision once I hear back!