Localization of reference files

Post author
Sehyun Oh

I'm trying to use different reference genomes (not provided in Terra's reference data) and am wondering whether they need to be saved in the same folder.

I was using .dict and .fai files saved in different buckets for my tool. I thought the 'localizing input' step of Cromwell would put all reference files in the VM's memory, so I wouldn't need to save them in the same bucket. But this didn't work. Could someone confirm how this actually works? Thanks!

Comments

6 comments

  • Comment author
    Sushma Chaluvadi
    • Edited

    Hello,

    You can use files that exist in an external Google bucket - they do not have to be copied into the same Workspace Google bucket. You need to make sure that the external bucket has your Terra proxy group added to its permissions. 

    1. Go to the storage browser for your Google project.
    2. Select Permissions from the menu on the top of the bucket page. It should be a tab between Bucket Lock and Overview.
    3. Click Add members and add the Proxy group. This step allows Terra to access an external bucket. 

    You can find your Proxy group from the Profile tab in the hamburger menu of Terra. If you click on Profile, you should see the proxy group email, which you can copy and paste into the Add members box in step 3. This should allow you to use reference files from this bucket in your Tools.

    You may also need to make the reference files public.

    If these are steps you have already taken, please let us know and we can help troubleshoot further.

  • Comment author
    Sehyun Oh
    • Edited

    I double-checked whether permissions were the issue, but that doesn't seem to be the case. I tested a tool that worked when the related files (e.g. the index file) were stored in the same bucket. I copied the COSMIC index file to a different bucket that I have access to. Localizing all the inputs to /cromwell_root was successful, but the COSMIC file and its index file were localized under different sub-folders, and the tool couldn't find the index.

    Here is the log file. 

    2019/06/05 06:11:03 Starting container setup.
    2019/06/05 06:11:10 Done container setup.
    2019/06/05 06:11:15 Starting localization.
    2019/06/05 06:11:21 Localizing input gs://broad-references/hg19/v0/Homo_sapiens_assembly19.fasta.fai -> /cromwell_root/broad-references/hg19/v0/Homo_sapiens_assembly19.fasta.fai
    2019/06/05 06:11:28 Localizing input gs://5aa919de-0aa0-43ec-9ec3-288481102b6d/tcga/OV/WGA_RepliG/WXS/BI/ILLUMINA/TCGA_MC3.TCGA-61-2613-11A-01W-1092-09.bam.bai -> /cromwell_root/5aa919de-0aa0-43ec-9ec3-288481102b6d/tcga/OV/WGA_RepliG/WXS/BI/ILLUMINA/TCGA_MC3.TCGA-61-2613-11A-01W-1092-09.bam.bai
    2019/06/05 06:11:36 Localizing input gs://broad-references/hg19/v0/Homo_sapiens_assembly19.dict -> /cromwell_root/broad-references/hg19/v0/Homo_sapiens_assembly19.dict
    2019/06/05 06:11:44 Localizing input gs://broad-references/hg19/v0/Homo_sapiens_assembly19.fasta -> /cromwell_root/broad-references/hg19/v0/Homo_sapiens_assembly19.fasta
    2019/06/05 06:12:45 Localizing input gs://broad-references/hg19/v0/dbsnp_138.b37.vcf.gz -> /cromwell_root/broad-references/hg19/v0/dbsnp_138.b37.vcf.gz
    2019/06/05 06:13:16 Localizing input gs://terra-cnvworkflow/CosmicCodingMuts.vcf.gz.tbi -> /cromwell_root/terra-cnvworkflow/CosmicCodingMuts.vcf.gz.tbi
    2019/06/05 06:13:24 Localizing input gs://5aa919de-0aa0-43ec-9ec3-288481102b6d/tcga/OV/WGA_RepliG/WXS/BI/ILLUMINA/TCGA_MC3.TCGA-61-2613-11A-01W-1092-09.bam -> /cromwell_root/5aa919de-0aa0-43ec-9ec3-288481102b6d/tcga/OV/WGA_RepliG/WXS/BI/ILLUMINA/TCGA_MC3.TCGA-61-2613-11A-01W-1092-09.bam
    2019/06/05 06:18:51 Localizing input gs://cnvworkflow_files_hg19/CosmicCodingMuts.vcf.gz -> /cromwell_root/cnvworkflow_files_hg19/CosmicCodingMuts.vcf.gz
    2019/06/05 06:18:59 Localizing input gs://broad-references/hg19/v0/dbsnp_138.b37.vcf.gz.tbi -> /cromwell_root/broad-references/hg19/v0/dbsnp_138.b37.vcf.gz.tbi
    2019/06/05 06:19:07 Localizing input gs://fc-secure-10ae8bff-1728-48ec-b886-bd30bad080fb/9f69ca71-0bd4-4e4a-8fe7-1798aaaac9e6/M1_only/a9947657-4aa7-4ceb-8ff1-f4c5f263ff55/call-M1/script -> /cromwell_root/script
    2019/06/05 06:19:15 Done localization.
    2019/06/05 06:19:20 Running user action: docker run -v /mnt/local-disk:/cromwell_root --entrypoint= jvivian/mutect@sha256:a0420e9f76d2614742ec285587757cb70ccdcfd8717cdbb40356ff8bf96e501d /bin/bash /cromwell_root/script
    Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell_root/tmp.b4d755b3
    INFO  06:19:23,824 HelpFormatter - -------------------------------------------------------------------------------- 
    INFO  06:19:23,827 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.1-0-g72492bb, Compiled 2015/05/09 22:37:34 
    INFO  06:19:23,827 HelpFormatter - Copyright (c) 2010 The Broad Institute 
    INFO  06:19:23,827 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk 
    INFO  06:19:23,831 HelpFormatter - Program Args: --analysis_type MuTect -R /cromwell_root/broad-references/hg19/v0/Homo_sapiens_assembly19.fasta --artifact_detection_mode --dbsnp /cromwell_root/broad-references/hg19/v0/dbsnp_138.b37.vcf.gz --cosmic /cromwell_root/cnvworkflow_files_hg19/CosmicCodingMuts.vcf.gz -dt None -I:tumor /cromwell_root/5aa919de-0aa0-43ec-9ec3-288481102b6d/tcga/OV/WGA_RepliG/WXS/BI/ILLUMINA/TCGA_MC3.TCGA-61-2613-11A-01W-1092-09.bam -o TCGA_MC3.TCGA-61-2613-11A-01W-1092-09_pon_stats.txt -vcf TCGA_MC3.TCGA-61-2613-11A-01W-1092-09_pon.vcf 
    INFO  06:19:23,836 HelpFormatter - Executing as root@9c02ff15ccf1 on Linux 4.14.119+ amd64; OpenJDK 64-Bit Server VM 1.7.0_75-b13. 
    INFO  06:19:23,837 HelpFormatter - Date/Time: 2019/06/05 06:19:23 
    INFO  06:19:23,837 HelpFormatter - -------------------------------------------------------------------------------- 
    INFO  06:19:23,837 HelpFormatter - -------------------------------------------------------------------------------- 
    INFO  06:19:23,954 GenomeAnalysisEngine - Strictness is SILENT 
    INFO  06:19:24,111 GenomeAnalysisEngine - Downsampling Settings: No downsampling 
    INFO  06:19:24,119 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
    WARNING: BAM index file /cromwell_root/5aa919de-0aa0-43ec-9ec3-288481102b6d/tcga/OV/WGA_RepliG/WXS/BI/ILLUMINA/TCGA_MC3.TCGA-61-2613-11A-01W-1092-09.bam.bai is older than BAM /cromwell_root/5aa919de-0aa0-43ec-9ec3-288481102b6d/tcga/OV/WGA_RepliG/WXS/BI/ILLUMINA/TCGA_MC3.TCGA-61-2613-11A-01W-1092-09.bam
    INFO  06:19:24,146 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.03 
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A USER ERROR has occurred (version 3.1-0-g72492bb): 
    ##### ERROR
    ##### ERROR This means that one or more arguments or inputs in your command are incorrect.
    ##### ERROR The error message below tells you what is the problem.
    ##### ERROR
    ##### ERROR If the problem is an invalid argument, please check the online documentation guide
    ##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
    ##### ERROR
    ##### ERROR Visit our website and forum for extensive documentation and answers to 
    ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ##### ERROR
    ##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
    ##### ERROR
    ##### ERROR MESSAGE: An index is required, but none found., for input source: /cromwell_root/cnvworkflow_files_hg19/CosmicCodingMuts.vcf.gz
    ##### ERROR ------------------------------------------------------------------------------------------
    2019/06/05 06:19:30 Starting delocalization.
    2019/06/05 06:19:36 Delocalizing output /cromwell_root/TCGA_MC3.TCGA-61-2613-11A-01W-1092-09_pon.vcf -> gs://fc-secure-10ae8bff-1728-48ec-b886-bd30bad080fb/9f69ca71-0bd4-4e4a-8fe7-1798aaaac9e6/M1_only/a9947657-4aa7-4ceb-8ff1-f4c5f263ff55/call-M1/TCGA_MC3.TCGA-61-2613-11A-01W-1092-09_pon.vcf
    2019/06/05 06:19:37 rm -f $HOME/.config/gcloud/gce && gsutil   cp /cromwell_root/TCGA_MC3.TCGA-61-2613-11A-01W-1092-09_pon.vcf gs://fc-secure-10ae8bff-1728-48ec-b886-bd30bad080fb/9f69ca71-0bd4-4e4a-8fe7-1798aaaac9e6/M1_only/a9947657-4aa7-4ceb-8ff1-f4c5f263ff55/call-M1/ failed
    CommandException: No URLs matched: /cromwell_root/TCGA_MC3.TCGA-61-2613-11A-01W-1092-09_pon.vcf
    2019/06/05 06:19:37 Waiting 5 seconds and retrying
    2019/06/05 06:19:43 rm -f $HOME/.config/gcloud/gce && gsutil   cp /cromwell_root/TCGA_MC3.TCGA-61-2613-11A-01W-1092-09_pon.vcf gs://fc-secure-10ae8bff-1728-48ec-b886-bd30bad080fb/9f69ca71-0bd4-4e4a-8fe7-1798aaaac9e6/M1_only/a9947657-4aa7-4ceb-8ff1-f4c5f263ff55/call-M1/ failed
    CommandException: No URLs matched: /cromwell_root/TCGA_MC3.TCGA-61-2613-11A-01W-1092-09_pon.vcf
    2019/06/05 06:19:43 Waiting 5 seconds and retrying
    2019/06/05 06:19:49 rm -f $HOME/.config/gcloud/gce && gsutil   cp /cromwell_root/TCGA_MC3.TCGA-61-2613-11A-01W-1092-09_pon.vcf gs://fc-secure-10ae8bff-1728-48ec-b886-bd30bad080fb/9f69ca71-0bd4-4e4a-8fe7-1798aaaac9e6/M1_only/a9947657-4aa7-4ceb-8ff1-f4c5f263ff55/call-M1/ failed
    CommandException: No URLs matched: /cromwell_root/TCGA_MC3.TCGA-61-2613-11A-01W-1092-09_pon.vcf
    2019/06/05 06:19:55 Delocalizing output /cromwell_root/TCGA_MC3.TCGA-61-2613-11A-01W-1092-09_pon.vcf.idx -> gs://fc-secure-10ae8bff-1728-48ec-b886-bd30bad080fb/9f69ca71-0bd4-4e4a-8fe7-1798aaaac9e6/M1_only/a9947657-4aa7-4ceb-8ff1-f4c5f263ff55/call-M1/TCGA_MC3.TCGA-61-2613-11A-01W-1092-09_pon.vcf.idx
    2019/06/05 06:20:02 rm -f $HOME/.config/gcloud/gce && gsutil   cp /cromwell_root/TCGA_MC3.TCGA-61-2613-11A-01W-1092-09_pon.vcf.idx gs://fc-secure-10ae8bff-1728-48ec-b886-bd30bad080fb/9f69ca71-0bd4-4e4a-8fe7-1798aaaac9e6/M1_only/a9947657-4aa7-4ceb-8ff1-f4c5f263ff55/call-M1/ failed
    CommandException: No URLs matched: /cromwell_root/TCGA_MC3.TCGA-61-2613-11A-01W-1092-09_pon.vcf.idx
    2019/06/05 06:20:02 Waiting 5 seconds and retrying
    2019/06/05 06:20:08 rm -f $HOME/.config/gcloud/gce && gsutil   cp /cromwell_root/TCGA_MC3.TCGA-61-2613-11A-01W-1092-09_pon.vcf.idx gs://fc-secure-10ae8bff-1728-48ec-b886-bd30bad080fb/9f69ca71-0bd4-4e4a-8fe7-1798aaaac9e6/M1_only/a9947657-4aa7-4ceb-8ff1-f4c5f263ff55/call-M1/ failed
    CommandException: No URLs matched: /cromwell_root/TCGA_MC3.TCGA-61-2613-11A-01W-1092-09_pon.vcf.idx
    2019/06/05 06:20:08 Waiting 5 seconds and retrying
    2019/06/05 06:20:14 rm -f $HOME/.config/gcloud/gce && gsutil   cp /cromwell_root/TCGA_MC3.TCGA-61-2613-11A-01W-1092-09_pon.vcf.idx gs://fc-secure-10ae8bff-1728-48ec-b886-bd30bad080fb/9f69ca71-0bd4-4e4a-8fe7-1798aaaac9e6/M1_only/a9947657-4aa7-4ceb-8ff1-f4c5f263ff55/call-M1/ failed
    CommandException: No URLs matched: /cromwell_root/TCGA_MC3.TCGA-61-2613-11A-01W-1092-09_pon.vcf.idx
    2019/06/05 06:20:20 Delocalizing output /cromwell_root/TCGA_MC3.TCGA-61-2613-11A-01W-1092-09_pon_stats.txt -> gs://fc-secure-10ae8bff-1728-48ec-b886-bd30bad080fb/9f69ca71-0bd4-4e4a-8fe7-1798aaaac9e6/M1_only/a9947657-4aa7-4ceb-8ff1-f4c5f263ff55/call-M1/TCGA_MC3.TCGA-61-2613-11A-01W-1092-09_pon_stats.txt
    2019/06/05 06:20:21 rm -f $HOME/.config/gcloud/gce && gsutil   cp /cromwell_root/TCGA_MC3.TCGA-61-2613-11A-01W-1092-09_pon_stats.txt gs://fc-secure-10ae8bff-1728-48ec-b886-bd30bad080fb/9f69ca71-0bd4-4e4a-8fe7-1798aaaac9e6/M1_only/a9947657-4aa7-4ceb-8ff1-f4c5f263ff55/call-M1/ failed
    CommandException: No URLs matched: /cromwell_root/TCGA_MC3.TCGA-61-2613-11A-01W-1092-09_pon_stats.txt
    2019/06/05 06:20:21 Waiting 5 seconds and retrying
    2019/06/05 06:20:27 rm -f $HOME/.config/gcloud/gce && gsutil   cp /cromwell_root/TCGA_MC3.TCGA-61-2613-11A-01W-1092-09_pon_stats.txt gs://fc-secure-10ae8bff-1728-48ec-b886-bd30bad080fb/9f69ca71-0bd4-4e4a-8fe7-1798aaaac9e6/M1_only/a9947657-4aa7-4ceb-8ff1-f4c5f263ff55/call-M1/ failed
    CommandException: No URLs matched: /cromwell_root/TCGA_MC3.TCGA-61-2613-11A-01W-1092-09_pon_stats.txt
    2019/06/05 06:20:27 Waiting 5 seconds and retrying
    2019/06/05 06:20:33 rm -f $HOME/.config/gcloud/gce && gsutil   cp /cromwell_root/TCGA_MC3.TCGA-61-2613-11A-01W-1092-09_pon_stats.txt gs://fc-secure-10ae8bff-1728-48ec-b886-bd30bad080fb/9f69ca71-0bd4-4e4a-8fe7-1798aaaac9e6/M1_only/a9947657-4aa7-4ceb-8ff1-f4c5f263ff55/call-M1/ failed
    CommandException: No URLs matched: /cromwell_root/TCGA_MC3.TCGA-61-2613-11A-01W-1092-09_pon_stats.txt
    2019/06/05 06:20:39 Delocalizing output /cromwell_root/stdout -> gs://fc-secure-10ae8bff-1728-48ec-b886-bd30bad080fb/9f69ca71-0bd4-4e4a-8fe7-1798aaaac9e6/M1_only/a9947657-4aa7-4ceb-8ff1-f4c5f263ff55/call-M1/stdout
    2019/06/05 06:20:47 Delocalizing output /cromwell_root/stderr -> gs://fc-secure-10ae8bff-1728-48ec-b886-bd30bad080fb/9f69ca71-0bd4-4e4a-8fe7-1798aaaac9e6/M1_only/a9947657-4aa7-4ceb-8ff1-f4c5f263ff55/call-M1/stderr
    2019/06/05 06:20:54 Delocalizing output /cromwell_root/rc -> gs://fc-secure-10ae8bff-1728-48ec-b886-bd30bad080fb/9f69ca71-0bd4-4e4a-8fe7-1798aaaac9e6/M1_only/a9947657-4aa7-4ceb-8ff1-f4c5f263ff55/call-M1/rc
  • Comment author
    Sushma Chaluvadi

    Sehyun,

    The index file: gs://terra-cnvworkflow/CosmicCodingMuts.vcf.gz.tbi needs to be in the same folder as the actual VCF file: gs://cnvworkflow_files_hg19/CosmicCodingMuts.vcf.gz.

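    They are localized to different Cromwell directories because they live in different gs:// buckets: each input is copied to /cromwell_root/<bucket name>/<path in bucket>, and the tool looks for the index next to the VCF. If you were to move the .tbi index file from the terra-cnvworkflow bucket to the cnvworkflow_files_hg19 bucket, or move the VCF the other way, this should run properly.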
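
    As a rough illustration of the path mapping visible in the log above, here is a minimal Python sketch. It is not Cromwell's actual code; the function names localized_path and same_localized_dir are made up for this example, and the mapping rule is inferred only from the "Localizing input" lines in the log.

```python
from pathlib import PurePosixPath

# Mount point seen in the log; treated here as a fixed assumption.
CROMWELL_ROOT = PurePosixPath("/cromwell_root")


def localized_path(gs_uri: str) -> PurePosixPath:
    """Map gs://<bucket>/<path> to /cromwell_root/<bucket>/<path>."""
    prefix = "gs://"
    if not gs_uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {gs_uri}")
    return CROMWELL_ROOT / gs_uri[len(prefix):]


def same_localized_dir(data_uri: str, index_uri: str) -> bool:
    """Return True if both files would land in the same local directory."""
    return localized_path(data_uri).parent == localized_path(index_uri).parent


vcf = "gs://cnvworkflow_files_hg19/CosmicCodingMuts.vcf.gz"
tbi = "gs://terra-cnvworkflow/CosmicCodingMuts.vcf.gz.tbi"
print(localized_path(vcf))
print(localized_path(tbi))
print(same_localized_dir(vcf, tbi))
```

    Running a check like this over a tool's inputs before launching could flag a data file and its index that would end up in different directories, which is the situation in the failed run.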

  • Comment author
    Sehyun Oh

    Hi Sushma,

    I got this tool to work by putting the index file in the same bucket, and that was actually my initial question: do the related reference files (.dict, .idx etc.) need to be stored in the same bucket even though they are all localized to /cromwell_root? It seems like the answer is 'yes'. :)

    - Sehyun 

  • Comment author
    Sushma Chaluvadi

    Yes, sorry for missing that question! I think that moving all index files to the same bucket as the files they index *should* work!

  • Comment author
    Allie Hajian

    Note that reference data files are no longer located in the "gs://broad-references/" bucket but have been moved to gs://gcp-public-data--broad-references, hosted by Google. 
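
    For anyone updating old input paths, a small Python sketch of the prefix swap follows. It assumes the path below the bucket name is unchanged after the move, which this thread does not confirm, so it is worth checking each rewritten path (for example with gsutil ls) before using it. The function name update_reference_uri is hypothetical.

```python
OLD_PREFIX = "gs://broad-references/"
NEW_PREFIX = "gs://gcp-public-data--broad-references/"


def update_reference_uri(uri: str) -> str:
    """Swap the old bucket prefix for the new one; leave other URIs untouched.

    Assumption: the object path inside the bucket did not change.
    """
    if uri.startswith(OLD_PREFIX):
        return NEW_PREFIX + uri[len(OLD_PREFIX):]
    return uri


print(update_reference_uri("gs://broad-references/hg19/v0/Homo_sapiens_assembly19.fasta"))
```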

