Joint Genotyping on Terra: issue accessing external buckets in the ImportGVCFs task in GATK [error code 400]
Hi! I'm trying to use a small set of samples (~30) from the 1000 Genomes High Coverage data collection to pad my joint genotyping for a small cohort. I have imported the gs:// paths, but I got the following error:
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell_root/tmp.dae86bf1
05:15:35.461 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.8.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
05:15:35.872 INFO GenomicsDBImport - ------------------------------------------------------------
05:15:35.873 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.1.8.0
05:15:35.873 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
05:15:35.873 INFO GenomicsDBImport - Executing as root@7f7d5677e221 on Linux v5.15.65+ amd64
05:15:35.873 INFO GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_242-8u242-b08-0ubuntu3~18.04-b08
05:15:35.874 INFO GenomicsDBImport - Start Date/Time: October 11, 2022 5:15:35 AM GMT
05:15:35.874 INFO GenomicsDBImport - ------------------------------------------------------------
05:15:35.874 INFO GenomicsDBImport - ------------------------------------------------------------
05:15:35.875 INFO GenomicsDBImport - HTSJDK Version: 2.22.0
05:15:35.875 INFO GenomicsDBImport - Picard Version: 2.22.8
05:15:35.875 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
05:15:35.875 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
05:15:35.875 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
05:15:35.876 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
05:15:35.876 INFO GenomicsDBImport - Deflater: IntelDeflater
05:15:35.876 INFO GenomicsDBImport - Inflater: IntelInflater
05:15:35.876 INFO GenomicsDBImport - GCS max retries/reopens: 20
05:15:35.876 INFO GenomicsDBImport - Requester pays: disabled
05:15:35.876 INFO GenomicsDBImport - Initializing engine
05:15:37.561 INFO FeatureManager - Using codec IntervalListCodec to read file file:///cromwell_root/fc-secure-4329a05b-7ebf-4931-86a6-1e3b2a0a9f0e/submissions/bae11f4f-a837-4299-9454-9b3ae6972b67/JointGenotyping/3154593f-8309-4f4c-aa1f-dacf92ce5479/call-SplitIntervalList/glob-d928cd0f5fb17b6bd5e635f48c18ccfb/0000-scattered.interval_list
05:15:37.788 INFO IntervalArgumentCollection - Processing 385747762 bp from intervals
05:15:37.799 INFO GenomicsDBImport - Done initializing engine
05:15:38.250 INFO GenomicsDBLibLoader - GenomicsDB native library version : 1.3.0-e701905
05:15:38.251 INFO GenomicsDBImport - Vid Map JSON file will be written to /cromwell_root/genomicsdb/vidmap.json
05:15:38.251 INFO GenomicsDBImport - Callset Map JSON file will be written to /cromwell_root/genomicsdb/callset.json
05:15:38.251 INFO GenomicsDBImport - Complete VCF Header will be written to /cromwell_root/genomicsdb/vcfheader.vcf
05:15:38.251 INFO GenomicsDBImport - Importing to workspace - /cromwell_root/genomicsdb
05:15:38.251 WARN GenomicsDBImport - GenomicsDBImport cannot use multiple VCF reader threads for initialization when the number of intervals is greater than 1. Falling back to serial VCF reader initialization.
05:15:38.251 INFO ProgressMeter - Starting traversal
05:15:38.252 INFO ProgressMeter - Current Locus Elapsed Minutes Batches Processed Batches/Minute
05:15:51.646 INFO GenomicsDBImport - Shutting down engine
[October 11, 2022 5:15:51 AM GMT] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.27 minutes.
Runtime.totalMemory()=8232370176
code: 400
message: Bucket is a requester pays bucket but no user project provided.
reason: required
location: null
retryable: false
com.google.cloud.storage.StorageException: Bucket is a requester pays bucket but no user project provided.
at com.google.cloud.storage.spi.v1.HttpStorageRpc.translate(HttpStorageRpc.java:229)
at com.google.cloud.storage.spi.v1.HttpStorageRpc.get(HttpStorageRpc.java:439)
at com.google.cloud.storage.StorageImpl$5.call(StorageImpl.java:242)
at com.google.cloud.storage.StorageImpl$5.call(StorageImpl.java:239)
at shaded.cloud_nio.com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:105)
at shaded.cloud_nio.com.google.cloud.RetryHelper.run(RetryHelper.java:76)
at shaded.cloud_nio.com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
at com.google.cloud.storage.StorageImpl.get(StorageImpl.java:238)
at com.google.cloud.storage.contrib.nio.CloudStorageFileSystemProvider.checkAccess(CloudStorageFileSystemProvider.java:736)
at java.nio.file.Files.exists(Files.java:2385)
at htsjdk.tribble.util.ParsingUtils.resourceExists(ParsingUtils.java:418)
at htsjdk.tribble.AbstractFeatureReader.isTabix(AbstractFeatureReader.java:230)
at htsjdk.tribble.AbstractFeatureReader$ComponentMethods.isTabix(AbstractFeatureReader.java:236)
at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:114)
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.getReaderFromPath(GenomicsDBImport.java:833)
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.getFeatureReadersSerially(GenomicsDBImport.java:817)
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.createSampleToReaderMap(GenomicsDBImport.java:659)
at org.genomicsdb.importer.GenomicsDBImporter.lambda$null$2(GenomicsDBImporter.java:699)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: shaded.cloud_nio.com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
{
"code" : 400,
"errors" : [ {
"domain" : "global",
"message" : "Bucket is a requester pays bucket but no user project provided.",
"reason" : "required"
} ],
"message" : "Bucket is a requester pays bucket but no user project provided."
}
at shaded.cloud_nio.com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:150)
at shaded.cloud_nio.com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
at shaded.cloud_nio.com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
at shaded.cloud_nio.com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:451)
at shaded.cloud_nio.com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1089)
at shaded.cloud_nio.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:549)
at shaded.cloud_nio.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:482)
at shaded.cloud_nio.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:599)
at com.google.cloud.storage.spi.v1.HttpStorageRpc.get(HttpStorageRpc.java:436)
... 20 more
Using GATK jar /gatk/gatk-package-4.1.8.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xms8g -jar /gatk/gatk-package-4.1.8.0-local.jar GenomicsDBImport --genomicsdb-workspace-path genomicsdb --batch-size 50 -L /cromwell_root/fc-secure-4329a05b-7ebf-4931-86a6-1e3b2a0a9f0e/submissions/bae11f4f-a837-4299-9454-9b3ae6972b67/JointGenotyping/3154593f-8309-4f4c-aa1f-dacf92ce5479/call-SplitIntervalList/glob-d928cd0f5fb17b6bd5e635f48c18ccfb/0000-scattered.interval_list --sample-name-map /cromwell_root/fc-secure-4329a05b-7ebf-4931-86a6-1e3b2a0a9f0e/submissions/17a51933-9a4b-4405-b9f3-e8fcf4cc256c/GenerateSampleMap/f77d25e7-d0cd-4a9c-97e2-733512f8b33f/call-GenerateSampleMapFile/generate-sample-map_2022-10-11T04-48-34.sample_map --reader-threads 5 --merge-input-intervals --consolidate
I also tried accessing the data in the GCS bucket manually, but got:
Bucket is a requester pays bucket but no user project provided.
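Two lines in the log above point at the same cause: the GATK startup banner reports "Requester pays: disabled", and GCS then rejects the request with the 400 "no user project provided" message. As a rough sketch only, a saved stderr file could be checked for that combination like this. The function name and the returned keys are made up for illustration, and the sample text is just three lines copied from the log above.

```python
import re


def diagnose_requester_pays(log_text: str) -> dict:
    """Report whether a GATK log shows the requester-pays failure pattern."""
    # GATK prints this banner line when no requester-pays project is configured.
    rp_disabled = re.search(r"Requester pays:\s*disabled", log_text) is not None
    # GCS returns this message when the bucket bills the requester
    # but the request carried no user project.
    rp_error = (
        "Bucket is a requester pays bucket but no user project provided"
        in log_text
    )
    return {
        "requester_pays_disabled": rp_disabled,
        "requester_pays_error": rp_error,
        "likely_missing_user_project": rp_disabled and rp_error,
    }


# Three lines copied from the log in this post.
sample_log = """\
05:15:35.876 INFO GenomicsDBImport - Requester pays: disabled
code: 400
message: Bucket is a requester pays bucket but no user project provided.
"""

result = diagnose_requester_pays(sample_log)
print(result)
```

If both flags come back true, the tool was most likely started without any project to bill, which matches the manual access error above.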
Comments
Update:
Trying https://portal.firecloud.org/?return=anvil#methods/jcerrato/JointGenotyping_v1.5.1_RPbuckets/2
Error changed to:
Since the gsutil -u method works for transfers, I don't understand why this would not work. Is there a way to work this into the RPbuckets version of the code?
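For reference, GATK 4 has an engine argument, --gcs-project-for-requester-pays, that plays the same role as gsutil -u. The "Requester pays: disabled" line in the log is what GATK prints when that argument is not set. The sketch below only shows where the argument would go on a GenomicsDBImport command line. Whether the RPbuckets workflow already passes it is not known from this thread, and the project ID and the shortened file names are placeholders, not values from the failing run.

```python
import shlex

REQUESTER_PAYS_FLAG = "--gcs-project-for-requester-pays"


def add_requester_pays_project(gatk_args, user_project):
    """Return a copy of gatk_args with the requester-pays project appended."""
    if REQUESTER_PAYS_FLAG in gatk_args:
        # Already configured; leave the arguments unchanged.
        return list(gatk_args)
    return list(gatk_args) + [REQUESTER_PAYS_FLAG, user_project]


# Same options as the failing command, with placeholder (hypothetical) paths
# standing in for the long /cromwell_root/... interval list and sample map.
base_args = [
    "GenomicsDBImport",
    "--genomicsdb-workspace-path", "genomicsdb",
    "--batch-size", "50",
    "-L", "0000-scattered.interval_list",
    "--sample-name-map", "sample_map.txt",
    "--reader-threads", "5",
    "--merge-input-intervals",
    "--consolidate",
]

# "my-billing-project" is a hypothetical Google project ID that would be billed.
args = add_requester_pays_project(base_args, "my-billing-project")
print("gatk " + " ".join(shlex.quote(a) for a in args))
```

The account running the job still needs permission to bill the named project, so this only helps once the billing setup itself is correct.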
Hi Jason,
Thanks for writing in! When running a workflow against a workspace bucket with Requester Pays applied to it, Terra will automatically try to pass the Billing Project of the Workspace that the Workflow is run from. My suggestion would be to check whether your account has been added to the same Billing Project that the Workspace uses. We've seen these errors in the past when the account running the workflow isn't part of the same Billing Project.
Please let me know if that information was helpful or if you have any questions.
Best,
Josh