Joint Genotyping on Terra: issue accessing external buckets in the ImportGVCFs task in GATK [error code 400]

Post author
Jason Ni

Hi! I'm trying to use a small set of samples (~30) from the 1000 Genomes high-coverage data collection to pad joint genotyping for a small cohort. I imported the gs:// paths, but got the following error:

Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell_root/tmp.dae86bf1
05:15:35.461 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.8.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
05:15:35.872 INFO  GenomicsDBImport - ------------------------------------------------------------
05:15:35.873 INFO  GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.1.8.0
05:15:35.873 INFO  GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
05:15:35.873 INFO  GenomicsDBImport - Executing as root@7f7d5677e221 on Linux v5.15.65+ amd64
05:15:35.873 INFO  GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_242-8u242-b08-0ubuntu3~18.04-b08
05:15:35.874 INFO  GenomicsDBImport - Start Date/Time: October 11, 2022 5:15:35 AM GMT
05:15:35.874 INFO  GenomicsDBImport - ------------------------------------------------------------
05:15:35.874 INFO  GenomicsDBImport - ------------------------------------------------------------
05:15:35.875 INFO  GenomicsDBImport - HTSJDK Version: 2.22.0
05:15:35.875 INFO  GenomicsDBImport - Picard Version: 2.22.8
05:15:35.875 INFO  GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
05:15:35.875 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
05:15:35.875 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
05:15:35.876 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
05:15:35.876 INFO  GenomicsDBImport - Deflater: IntelDeflater
05:15:35.876 INFO  GenomicsDBImport - Inflater: IntelInflater
05:15:35.876 INFO  GenomicsDBImport - GCS max retries/reopens: 20
05:15:35.876 INFO  GenomicsDBImport - Requester pays: disabled
05:15:35.876 INFO  GenomicsDBImport - Initializing engine
05:15:37.561 INFO  FeatureManager - Using codec IntervalListCodec to read file file:///cromwell_root/fc-secure-4329a05b-7ebf-4931-86a6-1e3b2a0a9f0e/submissions/bae11f4f-a837-4299-9454-9b3ae6972b67/JointGenotyping/3154593f-8309-4f4c-aa1f-dacf92ce5479/call-SplitIntervalList/glob-d928cd0f5fb17b6bd5e635f48c18ccfb/0000-scattered.interval_list
05:15:37.788 INFO  IntervalArgumentCollection - Processing 385747762 bp from intervals
05:15:37.799 INFO  GenomicsDBImport - Done initializing engine
05:15:38.250 INFO  GenomicsDBLibLoader - GenomicsDB native library version : 1.3.0-e701905
05:15:38.251 INFO  GenomicsDBImport - Vid Map JSON file will be written to /cromwell_root/genomicsdb/vidmap.json
05:15:38.251 INFO  GenomicsDBImport - Callset Map JSON file will be written to /cromwell_root/genomicsdb/callset.json
05:15:38.251 INFO  GenomicsDBImport - Complete VCF Header will be written to /cromwell_root/genomicsdb/vcfheader.vcf
05:15:38.251 INFO  GenomicsDBImport - Importing to workspace - /cromwell_root/genomicsdb
05:15:38.251 WARN  GenomicsDBImport - GenomicsDBImport cannot use multiple VCF reader threads for initialization when the number of intervals is greater than 1. Falling back to serial VCF reader initialization.
05:15:38.251 INFO  ProgressMeter - Starting traversal
05:15:38.252 INFO  ProgressMeter -        Current Locus  Elapsed Minutes     Batches Processed   Batches/Minute
05:15:51.646 INFO  GenomicsDBImport - Shutting down engine
[October 11, 2022 5:15:51 AM GMT] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.27 minutes.
Runtime.totalMemory()=8232370176
code:      400
message:   Bucket is a requester pays bucket but no user project provided.
reason:    required
location:  null
retryable: false
com.google.cloud.storage.StorageException: Bucket is a requester pays bucket but no user project provided.
	at com.google.cloud.storage.spi.v1.HttpStorageRpc.translate(HttpStorageRpc.java:229)
	at com.google.cloud.storage.spi.v1.HttpStorageRpc.get(HttpStorageRpc.java:439)
	at com.google.cloud.storage.StorageImpl$5.call(StorageImpl.java:242)
	at com.google.cloud.storage.StorageImpl$5.call(StorageImpl.java:239)
	at shaded.cloud_nio.com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:105)
	at shaded.cloud_nio.com.google.cloud.RetryHelper.run(RetryHelper.java:76)
	at shaded.cloud_nio.com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
	at com.google.cloud.storage.StorageImpl.get(StorageImpl.java:238)
	at com.google.cloud.storage.contrib.nio.CloudStorageFileSystemProvider.checkAccess(CloudStorageFileSystemProvider.java:736)
	at java.nio.file.Files.exists(Files.java:2385)
	at htsjdk.tribble.util.ParsingUtils.resourceExists(ParsingUtils.java:418)
	at htsjdk.tribble.AbstractFeatureReader.isTabix(AbstractFeatureReader.java:230)
	at htsjdk.tribble.AbstractFeatureReader$ComponentMethods.isTabix(AbstractFeatureReader.java:236)
	at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:114)
	at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.getReaderFromPath(GenomicsDBImport.java:833)
	at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.getFeatureReadersSerially(GenomicsDBImport.java:817)
	at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.createSampleToReaderMap(GenomicsDBImport.java:659)
	at org.genomicsdb.importer.GenomicsDBImporter.lambda$null$2(GenomicsDBImporter.java:699)
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: shaded.cloud_nio.com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
{
  "code" : 400,
  "errors" : [ {
    "domain" : "global",
    "message" : "Bucket is a requester pays bucket but no user project provided.",
    "reason" : "required"
  } ],
  "message" : "Bucket is a requester pays bucket but no user project provided."
}
	at shaded.cloud_nio.com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:150)
	at shaded.cloud_nio.com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
	at shaded.cloud_nio.com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
	at shaded.cloud_nio.com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:451)
	at shaded.cloud_nio.com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1089)
	at shaded.cloud_nio.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:549)
	at shaded.cloud_nio.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:482)
	at shaded.cloud_nio.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:599)
	at com.google.cloud.storage.spi.v1.HttpStorageRpc.get(HttpStorageRpc.java:436)
	... 20 more
Using GATK jar /gatk/gatk-package-4.1.8.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xms8g -jar /gatk/gatk-package-4.1.8.0-local.jar GenomicsDBImport --genomicsdb-workspace-path genomicsdb --batch-size 50 -L /cromwell_root/fc-secure-4329a05b-7ebf-4931-86a6-1e3b2a0a9f0e/submissions/bae11f4f-a837-4299-9454-9b3ae6972b67/JointGenotyping/3154593f-8309-4f4c-aa1f-dacf92ce5479/call-SplitIntervalList/glob-d928cd0f5fb17b6bd5e635f48c18ccfb/0000-scattered.interval_list --sample-name-map /cromwell_root/fc-secure-4329a05b-7ebf-4931-86a6-1e3b2a0a9f0e/submissions/17a51933-9a4b-4405-b9f3-e8fcf4cc256c/GenerateSampleMap/f77d25e7-d0cd-4a9c-97e2-733512f8b33f/call-GenerateSampleMapFile/generate-sample-map_2022-10-11T04-48-34.sample_map --reader-threads 5 --merge-input-intervals --consolidate
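For context, the --sample-name-map file referenced in the command above is a plain tab-separated list of sample names and gVCF paths; a minimal sketch with placeholder bucket paths (not the actual 1000 Genomes locations) looks like:

```text
NA12878	gs://example-bucket/NA12878.g.vcf.gz
NA12891	gs://example-bucket/NA12891.g.vcf.gz
NA12892	gs://example-bucket/NA12892.g.vcf.gz
```

When those gs:// paths live in a requester-pays bucket, every read GATK performs against them needs a billing project, which is where this run fails (note the "Requester pays: disabled" line in the log above).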

I tried manually accessing the data in the GCS bucket directly, but got:

Bucket is a requester pays bucket but no user project provided.
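For reference, gsutil can read requester-pays buckets when given a billing project via its top-level -u flag; a minimal sketch, where PROJECT_ID and the bucket path are placeholders rather than values from this run:

```shell
# -u names the Google project billed for requester-pays access.
# PROJECT_ID and the bucket path are placeholders.
gsutil -u PROJECT_ID ls gs://example-requester-pays-bucket/
gsutil -u PROJECT_ID cp gs://example-requester-pays-bucket/sample.g.vcf.gz .
```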

 

Comments

3 comments

  • Comment author
    Jason Ni

    Update:

    Trying https://portal.firecloud.org/?return=anvil#methods/jcerrato/JointGenotyping_v1.5.1_RPbuckets/2

     

  • Comment author
    Jason Ni

    Error changed to: 

    code:      400
    message:   Bucket is a requester pays bucket but no user project provided.
    reason:    required
    location:  null
    retryable: false

    Since the gsutil -u method works for transfers, I don't understand why this would not work. Is there a way to work this into the RPbuckets version of the code?
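
    One possible route (hedged; I haven't verified how the RPbuckets WDL wires this in): GATK tools accept a --gcs-project-for-requester-pays argument that plays the same role as gsutil's -u flag. A sketch of the failing GenomicsDBImport call with it added, using a placeholder project ID and shortened file paths:

    ```shell
    # Placeholder project ID and shortened paths; --gcs-project-for-requester-pays
    # tells GATK which project to bill when reading from requester-pays buckets.
    gatk GenomicsDBImport \
        --genomicsdb-workspace-path genomicsdb \
        --batch-size 50 \
        -L 0000-scattered.interval_list \
        --sample-name-map cohort.sample_map \
        --reader-threads 5 \
        --merge-input-intervals \
        --consolidate \
        --gcs-project-for-requester-pays PROJECT_ID
    ```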

  • Comment author
    Josh Evans

    Hi Jason,

    Thanks for writing in! When running a workflow against a workspace bucket with Requester Pays enabled, Terra automatically tries to pass the Billing Project of the Workspace the workflow is run from. My suggestion would be to check whether your account has been added to the same Billing Project that the Workspace uses. We've seen these errors in the past when the account running the workflow isn't part of the same Billing Project.

    Please let me know if that information was helpful or if you have any questions.

    Best,

    Josh

