To save time localizing large reference inputs, Terra can automatically attach a disk containing reference genomes to your Google Cloud virtual machine (VM). If the checkbox labeled ‘Use Reference Disks’ is selected, the execution engine examines the job inputs to see whether any of them correspond to reference files available on a reference disk image.
What happens when you select the reference disk option
If any inputs match, the engine mounts an additional disk containing those files onto the VM, so it can skip the localization process (copying the files onto the VM) for the reference inputs. See the reference image manifests below.
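The matching step described above amounts to checking each job input path against the list of files on a reference disk image. The sketch below illustrates that idea only. It assumes a simple exact-path comparison, and the names `HG19_MANIFEST_EXCERPT` and `split_inputs` and the user input `gs://my-bucket/sample.bam` are hypothetical. The actual engine's matching logic may involve additional checks and is not shown here. The three manifest paths are taken from the HG 19 manifest in this article.

```python
# Hypothetical excerpt of a reference disk manifest: three paths from the
# HG 19 public image manifest listed in this article.
HG19_MANIFEST_EXCERPT = frozenset({
    "gs://gcp-public-data--broad-references/hg19/v0/Homo_sapiens_assembly19.fasta",
    "gs://gcp-public-data--broad-references/hg19/v0/Homo_sapiens_assembly19.fasta.fai",
    "gs://gcp-public-data--broad-references/hg19/v0/Homo_sapiens_assembly19.dict",
})


def split_inputs(
    inputs: dict[str, str], manifest: frozenset[str]
) -> tuple[dict[str, str], dict[str, str]]:
    """Partition job inputs (name -> gs:// path) into files found on the
    reference disk and files that would still need to be localized.

    Simplifying assumption: a match is an exact path comparison.
    """
    on_disk = {name: path for name, path in inputs.items() if path in manifest}
    to_localize = {name: path for name, path in inputs.items() if path not in manifest}
    return on_disk, to_localize


# Example job inputs; "input_bam" is a hypothetical user file.
job_inputs = {
    "ref_fasta": "gs://gcp-public-data--broad-references/hg19/v0/Homo_sapiens_assembly19.fasta",
    "ref_dict": "gs://gcp-public-data--broad-references/hg19/v0/Homo_sapiens_assembly19.dict",
    "input_bam": "gs://my-bucket/sample.bam",
}

on_disk, to_localize = split_inputs(job_inputs, HG19_MANIFEST_EXCERPT)
print("Served from reference disk:", sorted(on_disk))
print("Still localized:", sorted(to_localize))
```

In this example, the two reference files would be read from the mounted disk, while the user's BAM file would still be copied onto the VM as usual.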
Reference Image Manifests
Human References
HG 19 public image
- gs://gcp-public-data--broad-references/hg19/v0/Homo_sapiens_assembly19.fasta.fai
- gs://gcp-public-data--broad-references/hg19/v0/Homo_sapiens_assembly19.dict
- gs://gcp-public-data--broad-references/hg19/v0/Homo_sapiens_assembly19.haplotype_database.txt
- gs://gcp-public-data--broad-references/hg19/v0/dbsnp_138.b37.vcf.gz.tbi
- gs://gcp-public-data--broad-references/hg19/v0/dbsnp_138.b37.vcf.gz
- gs://gcp-public-data--broad-references/hg19/v0/Homo_sapiens_assembly19.fasta
HG 38 public images
We now have reference files for human genome assembly 38 with GENCODE release 27 annotations for single-cell pipelines, including hg38-sc-v27-mod, a reference with both intron and exon annotations. You can learn more about the tool that marks intronic regions here.
- gs://gcp-public-data--broad-references/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
- gs://gcp-public-data--broad-references/hg38/v0/exome_evaluation_regions.v1.interval_list
- gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.ann
- gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf.idx
- gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.fai
- gs://gcp-public-data--broad-references/hg38/v0/exome_calling_regions.v1.interval_list
- gs://gcp-public-data--broad-references/hg38/v0/contamination-resources/1000g/1000g.phase3.100k.b38.vcf.gz.dat.UD
- gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.known_indels.vcf.gz
- gs://gcp-public-data--broad-references/hg38/v0/wgs_evaluation_regions.hg38.interval_list
- gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.known_indels.vcf.gz.tbi
- gs://gcp-public-data--broad-references/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi
- gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.haplotype_database.txt
- gs://gcp-public-data--broad-references/hg38/v0/wgs_calling_regions.hg38.interval_list
- gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.pac
- gs://gcp-public-data--broad-references/hg38/v0/wgs_coverage_regions.hg38.interval_list
- gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.amb
- gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.alt
- gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.sa
- gs://gcp-public-data--broad-references/hg38/v0/contamination-resources/1000g/1000g.phase3.100k.b38.vcf.gz.dat.mu
- gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta
- gs://gcp-public-data--broad-references/hg38/v0/contamination-resources/1000g/1000g.phase3.100k.b38.vcf.gz.dat.bed
- gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.bwt
- gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dict
- gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf
HG38-sc-v27 public image
- gs://gcp-public-data--broad-references/hg38/v0/gencode.v27.primary_assembly.annotation.gtf
- gs://gcp-public-data--broad-references/hg38/v0/GRCh38.primary_assembly.genome.fa
- gs://gcp-public-data--broad-references/hg38/v0/star/star_2.7.9a_primary_gencode_human_v27.tar
HG38-sc-v27-mod public image
- gs://gcp-public-data--broad-references/hg38/v0/single_nucleus/modified_gencode.v27.primary_assembly.annotation.gtf
- gs://gcp-public-data--broad-references/hg38/v0/single_nucleus/modified_GRCh38.primary_assembly.genome.fa
- gs://gcp-public-data--broad-references/hg38/v0/single_nucleus/star/modified_star_2.7.9a_primary_gencode_human_v27.tar
Mouse References
The following mouse reference files, including the M21 and M23 annotations, are used as inputs to WARP single-cell pipelines. See this document for more information. For details on how the references are used, see the Optimus and Smart-seq2 pipeline overviews.
MM10 public image
- gs://gcp-public-data--broad-references/mm10/v0/single_nucleus/modified_mm10.primary_assembly.genome.fa
- gs://gcp-public-data--broad-references/mm10/v0/GRCm38.primary_assembly.genome.fa
M21 public image
- gs://gcp-public-data--broad-references/mm10/v0/gencode.vM21.primary_assembly.annotation.gtf
- gs://gcp-public-data--broad-references/mm10/v0/star/star_2.7.9a_primary_gencode_mouse_vM21.tar
M23 public image
- gs://gcp-public-data--broad-references/mm10/v0/gencode.vM23.primary_assembly.annotation.gtf
- gs://gcp-public-data--broad-references/mm10/v0/star/star_2.7.9a_primary_gencode_mouse_vM23.tar
- gs://gcp-public-data--broad-references/mm10/v0/single_nucleus/modified_gencode.vM23.primary_assembly.annotation.gtf
- gs://gcp-public-data--broad-references/mm10/v0/single_nucleus/star/modified_star_2.7.9a_primary_gencode_mouse_vM23.tar
Cost tradeoffs when using reference disks
While the reference disk feature should reduce the total time the VM is running, it increases the VM's hourly rate, because an additional disk is mounted to hold the reference image. For this reason, ‘Use Reference Disks’ is turned off by default.
Users should decide for themselves whether the shorter VM run time is worth the higher hourly rate. If a job would spend a large proportion of its time localizing reference files relative to the time spent computing, using reference disks should be worthwhile.
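The trade-off above can be made concrete with a simplified cost model. Assume the VM costs `vm_rate` per hour, the reference disk adds `disk_rate` per hour, and the reference disk removes the reference localization time entirely. Under those assumptions, the disk saves money when `disk_rate * compute_hours < vm_rate * localization_hours`, that is, when the ratio of localization time to compute time exceeds `disk_rate / vm_rate`. The function name and all rates below are hypothetical, and the model ignores other factors, such as differences in the size of the job's own working disk.

```python
def reference_disk_saves_money(
    vm_rate: float,
    disk_rate: float,
    localization_hours: float,
    compute_hours: float,
) -> bool:
    """Compare job cost with and without a reference disk.

    Simplified, hypothetical model:
    - without the disk, the VM runs for localization + compute time;
    - with the disk, reference localization is skipped entirely, but the
      hourly rate rises by disk_rate for the whole compute time.
    """
    cost_without = vm_rate * (localization_hours + compute_hours)
    cost_with = (vm_rate + disk_rate) * compute_hours
    return cost_with < cost_without


# Hypothetical rates: $0.10/h for the VM, $0.02/h extra for the reference disk.
# Localization-heavy job: 0.5 h localizing, 1 h computing -> disk pays off.
print(reference_disk_saves_money(0.10, 0.02, 0.5, 1.0))
# Compute-heavy job: 0.1 h localizing, 10 h computing -> disk costs more.
print(reference_disk_saves_money(0.10, 0.02, 0.1, 10.0))
```

With these example numbers, the break-even ratio is 0.02 / 0.10 = 0.2. The first job spends 0.5 hours localizing per hour of compute and benefits, while the second spends only 0.01 and does not.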