Accessing GTEx data when running workflow
I have access to GTEx version 8 data on Google Cloud. When I run the haplotypecaller-gvcf-gatk4 workflow, I get an error that says: "Bucket is requester pays bucket but no user project provided." How do I specify this?
In my Tables, I included the following:
gs://bucket_name/GTEx_Analysis_2017-06-05_v8_WES_BAM_files/bam_file.
I know that I'm missing the part to specify the user project but I'm not sure how to do this.
Thank you for the help.
Comments
I am having the exact same problem. It appears that I can download directly from the Google Cloud Platform console, but using the command line utility gsutil, I get the following:
I set up my credentials with gcloud auth list, and I have my account set to the free AnVIL project. I also had pass_credentials_to_gsutil = True.
Thanks for your assistance
Hello Tanya and Q,
When using the command line, can you try passing in --gcs-project-for-requester-pays with the billing project to be billed to see if that works?
I will get back to you about options for when using the Workflow in Terra.
Sushma
Hi Sushma,
Thanks for your response. I think the option --gcs-project-for-requester-pays is only an option for gatk? I am using the gsutil command directly as specified on the "File Details" of a bam file on the AnVIL_GTEx_V8_hg38 data page.
Not sure if this is related, but I see this outstanding issue on the gatk github page that references the same error? https://github.com/broadinstitute/gatk/issues/6179
Thanks for your help.
Hi Sushma,
Thank you so much for looking into this. Any updates on how I can access GTEx data when using the workflow in Terra?
Thank you!
Tanya
Q DI,
When you run your gsutil cp command, can you try adding the -u parameter, followed by your Google billing project, right after gsutil?
Passing in your billing project with the -u parameter should allow you to download files with gsutil when the bucket has requester-pays enabled.
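For reference, a minimal sketch of what such a command could look like. The billing project name my-billing-project is a hypothetical placeholder, and the source path is the placeholder from the original question, so substitute your own values. The script only prints the command unless RUN=1 is set, so it can be reviewed before anything is billed.

```shell
# Hypothetical placeholders: replace with your own billing project and object path.
BILLING_PROJECT="my-billing-project"
SRC="gs://bucket_name/GTEx_Analysis_2017-06-05_v8_WES_BAM_files/bam_file"

# -u is a top-level gsutil option, so it goes before the cp subcommand.
CMD="gsutil -u $BILLING_PROJECT cp $SRC ."

# Print the command for review.
echo "$CMD"

# Set RUN=1 in the environment to actually perform the download.
if [ "${RUN:-0}" = "1" ]; then
  gsutil -u "$BILLING_PROJECT" cp "$SRC" .
fi
```

The project passed with -u is the one charged for the download, so it should be a project you are allowed to bill.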
Tanya,
Apologies for the delay! I spoke with some of our WDL developers and they said that the way to circumvent this is to manually modify the tasks in the WDL that are localizing files from the requester-pays bucket. You would need to pass in the --gcs-project-for-requester-pays parameter so that the command knows to accept the billing-project you want to bill for accessing files in the requester-pays bucket.
I am currently working to modify one of the tasks in the haplotypecaller WDL for you to test out! I will be in touch shortly!
Hi Sushma,
Thank you so much for the help.
Best,
Tanya
Hi Sushma,
Thanks much, the -u parameter was exactly what I needed!
Just a suggestion, it may be good to indicate the existence of that parameter here: https://support.terra.bio/hc/en-us/articles/360029251091-Broad-Genomics-Downloading-data-from-a-Terra-workspace#gsutildownload
Unfortunately, none of the Google gsutil documentation mentions this parameter (or at least if they do, I have not been able to find it :) )
Thanks again!
Q DI,
I will let our team know to add this information! For reference on this thread, here is the Google documentation on the -u parameter: https://cloud.google.com/storage/docs/using-requester-pays#using
Sushma
Thanks for the Google documentation link!
Hi Sushma,
I just wanted to check in to see if there is a test version of the Haplotype caller WDL to allow me to specify the billing project. Thank you so much for your help.
Best,
Tanya
Hi Sushma,
I encountered the exact same problem trying to access GTEx data from a Terra workflow. You suggested "modify the tasks in the WDL that are localizing files from the requester-pays bucket. You would need to pass in the --gcs-project-for-requester-pays parameter so that the command knows to accept the billing-project you want to bill for accessing files in the requester-pays bucket." Could you please give an example of a WDL fragment that uses a billing project?
Here is the fragment of my WDL that is failing (the input_bam and input_bai parameters refer to files in the Requester Pays bucket):
Thanks!
Sarah
Hi Sarah,
Here is the relevant information for the gatk PrintReads tool: https://gatk.broadinstitute.org/hc/en-us/articles/360037592891-PrintReads
So your command block should look something like:
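A minimal sketch of the shell portion of such a command block, not necessarily the exact snippet from this reply. It assumes the BAM is passed to GATK as a gs:// path rather than a localized file; the billing project, interval, and output name are hypothetical placeholders. In a WDL task these values would come from task inputs instead of shell variables. The script prints the command and only runs it when RUN=1 is set.

```shell
# Hypothetical placeholders: in a WDL task these would be task inputs.
BILLING_PROJECT="my-billing-project"
INPUT_BAM="gs://bucket_name/GTEx_Analysis_2017-06-05_v8_WES_BAM_files/bam_file"
INTERVALS="chr20"          # hypothetical interval to subset
OUTPUT_BAM="subset.bam"    # hypothetical output file name

# --gcs-project-for-requester-pays tells GATK which project to bill
# when it reads directly from a requester-pays bucket.
CMD="gatk PrintReads -I $INPUT_BAM -L $INTERVALS -O $OUTPUT_BAM --gcs-project-for-requester-pays $BILLING_PROJECT"

# Print the command for review.
echo "$CMD"

# Set RUN=1 in the environment to actually run GATK.
if [ "${RUN:-0}" = "1" ]; then
  gatk PrintReads \
    -I "$INPUT_BAM" \
    -L "$INTERVALS" \
    -O "$OUTPUT_BAM" \
    --gcs-project-for-requester-pays "$BILLING_PROJECT"
fi
```

The flag only matters when GATK itself opens the gs:// path. If the file is copied into the task's working directory before the command runs, GATK reads a local file and does not need it.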
Here is another example:
I hope this helps!
Kind regards,
Jason
What does one do when not using GATK and needs to access these GTEx data in a requester pays bucket?
Hi Joe Brown,
If you are accessing data from a requester pays bucket using a workflow in Terra, Cromwell (the workflow management system) will automatically bill your billing project for the access to the resources. You can read a little more about this here: https://cromwell.readthedocs.io/en/stable/filesystems/GoogleCloudStorage/#requester-pays
If an input is marked as localization-optional so that GATK reads it directly from the bucket, you would need to specify your billing project using the --gcs-project-for-requester-pays flag mentioned above.
Kind regards,
Jason