Reference Disks in Terra

Anton Kovalsky

Terra can now automatically attach a disk containing HG 19/HG 38 references to your Google virtual machine (VM), saving time otherwise spent on potentially slow localization of large reference inputs. If the checkbox labeled ‘Use Reference Disks’ is selected, the execution engine examines the inputs to a job to see whether any of them correspond to reference inputs available on a reference disk image.


If this is the case, the engine will arrange for an additional disk to be mounted onto the VM using the reference disk image containing those inputs, allowing it to skip the localization process for the reference inputs. The following reference images are currently supported in Terra:

Reference Image Manifests

HG 19 public image

2020-10-26 (10 GiB)

gs://gcp-public-data--broad-references/hg19/v0/Homo_sapiens_assembly19.fasta.fai

gs://gcp-public-data--broad-references/hg19/v0/Homo_sapiens_assembly19.dict

gs://gcp-public-data--broad-references/hg19/v0/Homo_sapiens_assembly19.haplotype_database.txt

gs://gcp-public-data--broad-references/hg19/v0/dbsnp_138.b37.vcf.gz.tbi

gs://gcp-public-data--broad-references/hg19/v0/dbsnp_138.b37.vcf.gz

gs://gcp-public-data--broad-references/hg19/v0/Homo_sapiens_assembly19.fasta

HG 38 public images

We now have reference files for human genome assembly 38, release 27, for single-cell pipelines, including hg38-sc-v27-mod, a reference that has both intron and exon annotations. You can learn more about the tool that marks intronic regions here.

2020-10-26 (10 GiB)

gs://gcp-public-data--broad-references/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz

gs://gcp-public-data--broad-references/hg38/v0/exome_evaluation_regions.v1.interval_list

gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.ann

gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf.idx

gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.fai

gs://gcp-public-data--broad-references/hg38/v0/exome_calling_regions.v1.interval_list

gs://gcp-public-data--broad-references/hg38/v0/contamination-resources/1000g/1000g.phase3.100k.b38.vcf.gz.dat.UD

gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.known_indels.vcf.gz

gs://gcp-public-data--broad-references/hg38/v0/wgs_evaluation_regions.hg38.interval_list

gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.known_indels.vcf.gz.tbi

gs://gcp-public-data--broad-references/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi

gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.haplotype_database.txt

gs://gcp-public-data--broad-references/hg38/v0/wgs_calling_regions.hg38.interval_list

gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.pac

gs://gcp-public-data--broad-references/hg38/v0/wgs_coverage_regions.hg38.interval_list

gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.amb

gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.alt

gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.sa

gs://gcp-public-data--broad-references/hg38/v0/contamination-resources/1000g/1000g.phase3.100k.b38.vcf.gz.dat.mu

gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta

gs://gcp-public-data--broad-references/hg38/v0/contamination-resources/1000g/1000g.phase3.100k.b38.vcf.gz.dat.bed

gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.bwt

gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dict

gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf

hg38-sc-v27

gs://gcp-public-data--broad-references/hg38/v0/gencode.v27.primary_assembly.annotation.gtf

gs://gcp-public-data--broad-references/hg38/v0/GRCh38.primary_assembly.genome.fa

gs://gcp-public-data--broad-references/hg38/v0/star/star_2.7.9a_primary_gencode_human_v27.tar

hg38-sc-v27-mod

gs://gcp-public-data--broad-references/hg38/v0/single_nucleus/modified_gencode.v27.primary_assembly.annotation.gtf

gs://gcp-public-data--broad-references/hg38/v0/single_nucleus/modified_GRCh38.primary_assembly.genome.fa

gs://gcp-public-data--broad-references/hg38/v0/single_nucleus/star/modified_star_2.7.9a_primary_gencode_human_v27.tar

 

M21 and M23 public images

The following M21 and M23 mouse reference files are used as inputs to WARP single-cell pipelines. See this document for more information. For details on how the references are used, please see the overviews of Optimus and Smartseq2.

MM-10-sc

gs://gcp-public-data--broad-references/mm10/v0/GRCm38.primary_assembly.genome.fa

mm10-sc-vm21

gs://gcp-public-data--broad-references/mm10/v0/gencode.vM21.primary_assembly.annotation.gtf

gs://gcp-public-data--broad-references/mm10/v0/star/star_2.7.9a_primary_gencode_mouse_vM21.tar

gs://gcp-public-data--broad-references/mm10/v0/gencode.vM23.primary_assembly.annotation.gtf

gs://gcp-public-data--broad-references/mm10/v0/star/star_2.7.9a_primary_gencode_mouse_vM23.tar

mm10-sc-vm23-mod

gs://gcp-public-data--broad-references/mm10/v0/single_nucleus/modified_gencode.vM23.primary_assembly.annotation.gtf

gs://gcp-public-data--broad-references/mm10/v0/single_nucleus/modified_mm10.primary_assembly.genome.fa

gs://gcp-public-data--broad-references/mm10/v0/single_nucleus/star/modified_star_2.7.9a_primary_gencode_mouse_vM23.tar
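The matching step described at the top of this article (checking whether a job's inputs appear on a reference disk image) can be illustrated with a minimal Python sketch. This is not the execution engine's actual code. The function name `plan_reference_disks`, the image labels `hg19` and `hg38`, and the abbreviated manifests are assumptions made for illustration; only the `gs://gcp-public-data--broad-references/...` paths come from the manifests listed above, and the real engine may apply further checks before mounting a disk.

```python
# Abbreviated, hypothetical manifests: the image labels are illustrative,
# and only a few of the paths listed in this article are included.
REFERENCE_MANIFESTS = {
    "hg19": {
        "gs://gcp-public-data--broad-references/hg19/v0/Homo_sapiens_assembly19.fasta",
        "gs://gcp-public-data--broad-references/hg19/v0/Homo_sapiens_assembly19.fasta.fai",
        "gs://gcp-public-data--broad-references/hg19/v0/Homo_sapiens_assembly19.dict",
    },
    "hg38": {
        "gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta",
        "gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.fai",
        "gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dict",
    },
}


def plan_reference_disks(job_inputs, manifests=REFERENCE_MANIFESTS):
    """Split job inputs into reference images to mount and files to localize.

    Returns a tuple (images, remaining):
      images    -- sorted labels of images holding at least one input
      remaining -- inputs found on no image, which still need localization
    """
    images = set()
    remaining = []
    for path in job_inputs:
        # Find every image whose manifest lists this exact path.
        matches = [name for name, files in manifests.items() if path in files]
        if matches:
            images.add(matches[0])
        else:
            remaining.append(path)
    return sorted(images), remaining
```

For example, calling `plan_reference_disks` with the HG 38 FASTA path and a sample file in a workspace bucket would select the `hg38` image and leave only the sample file to be localized.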

While the reference disk feature should reduce the total time that the VM is running, it does increase the hourly rate of the VM since an additional disk is mounted to hold the reference image. For this reason ‘Use Reference Disks’ is turned off by default. Users should decide for themselves if it makes sense to trade off the decrease in time the VM is running for the increase in the VM’s hourly rate. If a job would spend a large proportion of its time localizing reference files relative to the time spent computing, the use of reference disks should be worthwhile.
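The trade-off above can be made concrete with a small Python sketch. The function names and all rates are hypothetical, not Terra or Google Cloud prices, and the model makes a simplifying assumption: with a reference disk attached, the time spent localizing reference files drops to zero while the extra disk adds a flat hourly charge.

```python
def job_cost(vm_rate_per_hour, compute_hours, localization_hours,
             disk_rate_per_hour=0.0):
    """Total cost: the VM (plus any extra disk) is billed for the whole run."""
    hourly_rate = vm_rate_per_hour + disk_rate_per_hour
    return hourly_rate * (compute_hours + localization_hours)


def reference_disk_is_cheaper(vm_rate_per_hour, compute_hours,
                              localization_hours, disk_rate_per_hour):
    """Compare a run that localizes references with one that mounts a disk.

    Assumes (simplification) that mounting the reference disk removes all
    reference localization time.
    """
    without_disk = job_cost(vm_rate_per_hour, compute_hours, localization_hours)
    with_disk = job_cost(vm_rate_per_hour, compute_hours, 0.0,
                         disk_rate_per_hour)
    return with_disk < without_disk


# Hypothetical short job: 1 hour of compute, 30 minutes of localization.
print(reference_disk_is_cheaper(1.0, 1.0, 0.5, 0.1))   # True
# Hypothetical long job: 10 hours of compute, 6 minutes of localization.
print(reference_disk_is_cheaper(1.0, 10.0, 0.1, 0.1))  # False
```

In this simplified model the reference disk pays off for the short job, where localization is a third of the run time, but not for the long job, where the extra hourly charge accumulates over many compute hours.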
