Localization of reference files
I'm trying to use different reference genomes (not provided in Terra's reference data) and am wondering whether their files need to be saved in the same folder.
I was using .dict and .fai files saved in different buckets for my tool. I thought the 'localizing input' step of Cromwell would put all reference files on the VM, so I wouldn't need to save them in the same bucket. But this didn't work. Could someone confirm how this actually works? Thanks!
Comments
Hello,
You can use files that exist in an external Google bucket - they do not have to be copied into the same Workspace Google bucket. You need to make sure that the external bucket has your Terra proxy group added to its permissions.
1. Go to the storage browser for your Google project.
2. Select Permissions from the menu on the top of the bucket page. It should be a tab between Bucket Lock and Overview.
3. Click Add members and add the Proxy group. This step allows Terra to access an external bucket.
You can find your Proxy group in the Profile tab of the hamburger menu in Terra. If you click on Profile, you should see the proxy group email, which you can copy and paste into the Add members box in step 3. This should allow you to use reference files from this bucket in your Tools.
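For anyone who prefers the command line, the same grant can probably be scripted. The sketch below only builds a `gsutil iam ch` command and prints it rather than running it. It assumes the proxy group is a Google group (hence the `group:` prefix) and that read-only `objectViewer` access is sufficient. The email address and bucket name are placeholders, not real values.

```python
import shlex
from typing import List


def build_grant_command(proxy_group_email: str, bucket: str) -> List[str]:
    """Build a gsutil command that grants read-only object access on a bucket.

    Assumption: the Terra proxy group is a Google group, so the IAM member
    is written as "group:<email>". objectViewer allows reading and listing
    objects but not writing them.
    """
    if not bucket.startswith("gs://"):
        bucket = "gs://" + bucket
    member = f"group:{proxy_group_email}:objectViewer"
    return ["gsutil", "iam", "ch", member, bucket]


if __name__ == "__main__":
    # Placeholder values; substitute your own proxy group email and bucket.
    cmd = build_grant_command("PROXY_0000@example.org", "my-reference-bucket")
    print(shlex.join(cmd))
```

Printing the command first makes it easy to review before running it in a shell where gsutil is installed and authenticated.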
You may also need to make the reference files public.
If these are steps you have already taken, please let us know and we can help troubleshoot further.
I double-checked whether permissions were the issue, but that doesn't seem to be the case. I tested with a tool that worked when its related files (e.g. the index file) were stored in the same bucket. I copied the COSMIC index file to a different bucket that I have access to. Localizing all the inputs to /cromwell_root succeeded, but the COSMIC file and its index file were localized under different sub-folders, and the tool couldn't find the index file.
Here is the log file.
Sehyun,
The index file: gs://terra-cnvworkflow/CosmicCodingMuts.vcf.gz.tbi needs to be in the same folder as the actual VCF file: gs://cnvworkflow_files_hg19/CosmicCodingMuts.vcf.gz.
They are localized to different Cromwell directories because they live in different gs:// buckets. If you move the .tbi index file from the terra-cnvworkflow bucket to the cnvworkflow_files_hg19 bucket, or move the VCF the other way, this should run properly.
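To illustrate why the two files end up apart, here is a rough sketch of the path mapping. It assumes Cromwell localizes gs://BUCKET/PATH to /cromwell_root/BUCKET/PATH, which matches the behavior described in this thread. The function name is made up for illustration and is not part of Cromwell.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed localization root on the VM, as mentioned in this thread.
CROMWELL_ROOT = PurePosixPath("/cromwell_root")


def localized_path(gs_uri: str) -> PurePosixPath:
    """Return the assumed local path for a gs:// input (hypothetical helper)."""
    parsed = urlparse(gs_uri)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"not a gs:// URI: {gs_uri}")
    # The bucket name becomes the first directory under the root.
    return CROMWELL_ROOT / parsed.netloc / parsed.path.lstrip("/")


if __name__ == "__main__":
    vcf = localized_path("gs://cnvworkflow_files_hg19/CosmicCodingMuts.vcf.gz")
    tbi = localized_path("gs://terra-cnvworkflow/CosmicCodingMuts.vcf.gz.tbi")
    print(vcf)
    print(tbi)
    # The index only sits next to the VCF when both parents match.
    print("same directory:", vcf.parent == tbi.parent)
```

Tools that expect the index at the VCF path plus ".tbi" will look only in the VCF's directory, so an index localized under another bucket's directory goes unnoticed.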
Hi Sushma,
I got this tool to work by putting the index file in the same bucket, and that was actually my initial question: do the related reference files (.dict, .idx, etc.) need to be stored in the same bucket even though they are all localized to /cromwell_root? It seems like the answer is 'yes'. :)
- Sehyun
Yes, sorry for missing that question! I think that moving all index files to the same bucket, in the same folder as the files they index, *should* work!
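A quick pre-flight check along these lines might catch the problem before a workflow is launched. This is only a sketch: it compares the bucket and folder portion of each URI as plain strings and never contacts Google Cloud Storage. The function name is hypothetical.

```python
import posixpath
from typing import List


def misplaced_companions(primary_uri: str, companion_uris: List[str]) -> List[str]:
    """Return companion files (indexes, dictionaries) that are not stored in
    the same bucket and folder as the primary file."""
    expected_dir = posixpath.dirname(primary_uri)
    return [u for u in companion_uris if posixpath.dirname(u) != expected_dir]


if __name__ == "__main__":
    problems = misplaced_companions(
        "gs://cnvworkflow_files_hg19/CosmicCodingMuts.vcf.gz",
        ["gs://terra-cnvworkflow/CosmicCodingMuts.vcf.gz.tbi"],
    )
    for uri in problems:
        print("not alongside the primary file:", uri)
```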
Note that reference data files are no longer located in the "gs://broad-references/" bucket but have been moved to gs://gcp-public-data--broad-references, hosted by Google.