Joint Genotyping on Terra: issue accessing external buckets in the ImportGVCFs task in GATK [error code 400]
Hi! I'm trying to use a small set of samples (~30) from the 1000 Genomes High Coverage data collection to pad joint genotyping for a small cohort. I have imported the gs:// paths of their GVCFs.
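For reference, the sample-name map I'm feeding to the workflow looks roughly like this (sample names and bucket paths below are illustrative placeholders; columns are tab-separated):

    HG00096	gs://example-1000g-bucket/HG00096.haplotypeCalls.g.vcf.gz
    HG00097	gs://example-1000g-bucket/HG00097.haplotypeCalls.g.vcf.gz

When I run the JointGenotyping workflow, the ImportGVCFs task fails with the following error: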
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell_root/tmp.dae86bf1
05:15:35.461 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.8.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
05:15:35.872 INFO GenomicsDBImport - ------------------------------------------------------------
05:15:35.873 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.1.8.0
05:15:35.873 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
05:15:35.873 INFO GenomicsDBImport - Executing as root@7f7d5677e221 on Linux v5.15.65+ amd64
05:15:35.873 INFO GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_242-8u242-b08-0ubuntu3~18.04-b08
05:15:35.874 INFO GenomicsDBImport - Start Date/Time: October 11, 2022 5:15:35 AM GMT
05:15:35.874 INFO GenomicsDBImport - ------------------------------------------------------------
05:15:35.874 INFO GenomicsDBImport - ------------------------------------------------------------
05:15:35.875 INFO GenomicsDBImport - HTSJDK Version: 2.22.0
05:15:35.875 INFO GenomicsDBImport - Picard Version: 2.22.8
05:15:35.875 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
05:15:35.875 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
05:15:35.875 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
05:15:35.876 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
05:15:35.876 INFO GenomicsDBImport - Deflater: IntelDeflater
05:15:35.876 INFO GenomicsDBImport - Inflater: IntelInflater
05:15:35.876 INFO GenomicsDBImport - GCS max retries/reopens: 20
05:15:35.876 INFO GenomicsDBImport - Requester pays: disabled
05:15:35.876 INFO GenomicsDBImport - Initializing engine
05:15:37.561 INFO FeatureManager - Using codec IntervalListCodec to read file file:///cromwell_root/fc-secure-4329a05b-7ebf-4931-86a6-1e3b2a0a9f0e/submissions/bae11f4f-a837-4299-9454-9b3ae6972b67/JointGenotyping/3154593f-8309-4f4c-aa1f-dacf92ce5479/call-SplitIntervalList/glob-d928cd0f5fb17b6bd5e635f48c18ccfb/0000-scattered.interval_list
05:15:37.788 INFO IntervalArgumentCollection - Processing 385747762 bp from intervals
05:15:37.799 INFO GenomicsDBImport - Done initializing engine
05:15:38.250 INFO GenomicsDBLibLoader - GenomicsDB native library version : 1.3.0-e701905
05:15:38.251 INFO GenomicsDBImport - Vid Map JSON file will be written to /cromwell_root/genomicsdb/vidmap.json
05:15:38.251 INFO GenomicsDBImport - Callset Map JSON file will be written to /cromwell_root/genomicsdb/callset.json
05:15:38.251 INFO GenomicsDBImport - Complete VCF Header will be written to /cromwell_root/genomicsdb/vcfheader.vcf
05:15:38.251 INFO GenomicsDBImport - Importing to workspace - /cromwell_root/genomicsdb
05:15:38.251 WARN GenomicsDBImport - GenomicsDBImport cannot use multiple VCF reader threads for initialization when the number of intervals is greater than 1. Falling back to serial VCF reader initialization.
05:15:38.251 INFO ProgressMeter - Starting traversal
05:15:38.252 INFO ProgressMeter - Current Locus Elapsed Minutes Batches Processed Batches/Minute
05:15:51.646 INFO GenomicsDBImport - Shutting down engine
[October 11, 2022 5:15:51 AM GMT] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.27 minutes.
Runtime.totalMemory()=8232370176
code: 400
message: Bucket is a requester pays bucket but no user project provided.
reason: required
location: null
retryable: false
com.google.cloud.storage.StorageException: Bucket is a requester pays bucket but no user project provided.
    at com.google.cloud.storage.spi.v1.HttpStorageRpc.translate(HttpStorageRpc.java:229)
    at com.google.cloud.storage.spi.v1.HttpStorageRpc.get(HttpStorageRpc.java:439)
    at com.google.cloud.storage.StorageImpl$5.call(StorageImpl.java:242)
    at com.google.cloud.storage.StorageImpl$5.call(StorageImpl.java:239)
    at shaded.cloud_nio.com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:105)
    at shaded.cloud_nio.com.google.cloud.RetryHelper.run(RetryHelper.java:76)
    at shaded.cloud_nio.com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
    at com.google.cloud.storage.StorageImpl.get(StorageImpl.java:238)
    at com.google.cloud.storage.contrib.nio.CloudStorageFileSystemProvider.checkAccess(CloudStorageFileSystemProvider.java:736)
    at java.nio.file.Files.exists(Files.java:2385)
    at htsjdk.tribble.util.ParsingUtils.resourceExists(ParsingUtils.java:418)
    at htsjdk.tribble.AbstractFeatureReader.isTabix(AbstractFeatureReader.java:230)
    at htsjdk.tribble.AbstractFeatureReader$ComponentMethods.isTabix(AbstractFeatureReader.java:236)
    at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:114)
    at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.getReaderFromPath(GenomicsDBImport.java:833)
    at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.getFeatureReadersSerially(GenomicsDBImport.java:817)
    at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.createSampleToReaderMap(GenomicsDBImport.java:659)
    at org.genomicsdb.importer.GenomicsDBImporter.lambda$null$2(GenomicsDBImporter.java:699)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: shaded.cloud_nio.com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
{
  "code" : 400,
  "errors" : [ {
    "domain" : "global",
    "message" : "Bucket is a requester pays bucket but no user project provided.",
    "reason" : "required"
  } ],
  "message" : "Bucket is a requester pays bucket but no user project provided."
}
    at shaded.cloud_nio.com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:150)
    at shaded.cloud_nio.com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
    at shaded.cloud_nio.com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
    at shaded.cloud_nio.com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:451)
    at shaded.cloud_nio.com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1089)
    at shaded.cloud_nio.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:549)
    at shaded.cloud_nio.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:482)
    at shaded.cloud_nio.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:599)
    at com.google.cloud.storage.spi.v1.HttpStorageRpc.get(HttpStorageRpc.java:436)
    ... 20 more
Using GATK jar /gatk/gatk-package-4.1.8.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xms8g -jar /gatk/gatk-package-4.1.8.0-local.jar GenomicsDBImport --genomicsdb-workspace-path genomicsdb --batch-size 50 -L /cromwell_root/fc-secure-4329a05b-7ebf-4931-86a6-1e3b2a0a9f0e/submissions/bae11f4f-a837-4299-9454-9b3ae6972b67/JointGenotyping/3154593f-8309-4f4c-aa1f-dacf92ce5479/call-SplitIntervalList/glob-d928cd0f5fb17b6bd5e635f48c18ccfb/0000-scattered.interval_list --sample-name-map /cromwell_root/fc-secure-4329a05b-7ebf-4931-86a6-1e3b2a0a9f0e/submissions/17a51933-9a4b-4405-b9f3-e8fcf4cc256c/GenerateSampleMap/f77d25e7-d0cd-4a9c-97e2-733512f8b33f/call-GenerateSampleMapFile/generate-sample-map_2022-10-11T04-48-34.sample_map --reader-threads 5 --merge-input-intervals --consolidate
I also tried accessing the data in the GCS bucket manually, but got:
Bucket is a requester pays bucket but no user project provided.
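For context, any request against a Requester Pays bucket has to name a Google project to bill, which for gsutil is the -u flag. A plain gsutil call fails with the error above, while something like the following works (project ID and bucket path below are placeholders):

    # -u names the Google project billed for the request;
    # "my-billing-project" and the bucket path are placeholders.
    gsutil -u my-billing-project ls gs://requester-pays-bucket/path/
    gsutil -u my-billing-project cp gs://requester-pays-bucket/path/sample.g.vcf.gz .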
Comments
Update:
Trying the Requester Pays version of the workflow: https://portal.firecloud.org/?return=anvil#methods/jcerrato/JointGenotyping_v1.5.1_RPbuckets/2
Error changed to:
Since the gsutil -u method works for transfers, I don't understand why this would not work. Is there a way to work this into the RPbuckets version of the code?
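For what it's worth, GATK has its own counterpart to gsutil -u: the --gcs-project-for-payment argument (the log above even prints "Requester pays: disabled", suggesting it wasn't set on the failing run). A sketch of the failing GenomicsDBImport call with it added, assuming the workflow's WDL exposes the argument and with a placeholder project ID:

    gatk GenomicsDBImport \
        --genomicsdb-workspace-path genomicsdb \
        --batch-size 50 \
        -L 0000-scattered.interval_list \
        --sample-name-map generated.sample_map \
        --reader-threads 5 \
        --merge-input-intervals \
        --consolidate \
        --gcs-project-for-payment my-billing-project  # placeholder: project billed for GCS reads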
Hi Jason,
Thanks for writing in! When running a workflow against a bucket with Requester Pays applied to it, Terra will automatically try to pass the Billing Project of the Workspace that the workflow is run from. My suggestion would be to check whether your account has been added to the same Billing Project that the Workspace uses. We've seen these errors in the past when the account running the workflow isn't part of that Billing Project.
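One way to sanity-check this (all identifiers below are placeholders) is to ask Cloud IAM whether your account holds any role on the project backing the Billing Project; billing a Requester Pays request requires the serviceusage.services.use permission on that project, and listing the policy itself requires viewer-level access:

    # List the roles your account holds on the project (placeholder IDs).
    gcloud projects get-iam-policy my-terra-billing-project \
        --flatten="bindings[].members" \
        --filter="bindings.members:user:someone@example.org" \
        --format="table(bindings.role)"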
Please let me know if that information was helpful or if you have any questions.
Best,
Josh