Intermittent StorageExceptions (unknown host www.googleapis.com)

Post author
Bhandsaker

I'm trying to get a new workflow working that accesses multiple CRAM files from Google Cloud Storage. The workflow doesn't localize the CRAMs, but does some seeking into them using the indexes.

I'm getting intermittent (but frequent) failures with storage exceptions caused by failure to resolve the host www.googleapis.com. The failures are not deterministic: the job fails on different CRAM files at different times, although all of the failures occur when trying to create an htsjdk.samtools.seekablestream.SeekablePathStream to use with a SamFileReader.

I can run the same code (against the exact same gs:// URLs) outside of GCE and Terra (on-prem at Broad) and I never see these exceptions. I have seen previous reports of this symptom on long-running jobs, but my jobs are not terribly long: they fail within about 20 minutes. On-prem, they run to completion within about 15 minutes.

Here is a typical stack trace:

##### ERROR --
##### ERROR stack trace
Error: Exception processing cnp: Error reading BAM file /CCDG_13607/Project_CCDG_13607_B01_GRM_WGS.cram.2019-02-06/Sample_HG02009/analysis/HG02009.final.cram: www.googleapis.com
CNP: CNV_HERV_M2 chr6:31984684-31991052
INFO 17:39:39,289 16-Dec-2019 ProgressMeter - Starting 0.0 11.2 m 1111.6 w 100.0% 11.2 m 0.0 s
java.lang.RuntimeException: Error reading BAM file /CCDG_13607/Project_CCDG_13607_B01_GRM_WGS.cram.2019-02-06/Sample_HG02009/analysis/HG02009.final.cram: www.googleapis.com
at org.broadinstitute.sv.common.ReadCountAlgorithm.getReadCount(ReadCountAlgorithm.java:248)
at org.broadinstitute.sv.common.ReadCountAlgorithm.getReadCount(ReadCountAlgorithm.java:239)
at org.broadinstitute.sv.genotyping.GenotypingDepthModule.computeReadCountsFromBamFiles(GenotypingDepthModule.java:443)
at org.broadinstitute.sv.genotyping.GenotypingDepthModule.computeRefReadCounts(GenotypingDepthModule.java:298)
at org.broadinstitute.sv.genotyping.GenotypingDepthModule.computeRefReadCounts(GenotypingDepthModule.java:266)
at org.broadinstitute.sv.genotyping.GenotypingDepthModule.getReadCounts(GenotypingDepthModule.java:231)
at org.broadinstitute.sv.genotyping.GenotypingDepthModule.getCnpReadCounts(GenotypingDepthModule.java:218)
at org.broadinstitute.sv.genotyping.GenotypingDepthModule.genotypeCnp(GenotypingDepthModule.java:142)
at org.broadinstitute.sv.genotyping.GenotypingDepthModule.genotypeCnp(GenotypingDepthModule.java:61)
at org.broadinstitute.sv.genotyping.GenotypingAlgorithm.genotypeCnpInternal(GenotypingAlgorithm.java:149)
at org.broadinstitute.sv.genotyping.GenotypingAlgorithm.genotypeCnp(GenotypingAlgorithm.java:114)
at org.broadinstitute.sv.genotyping.SVGenotyperWalker.processVCFFile(SVGenotyperWalker.java:274)
at org.broadinstitute.sv.genotyping.SVGenotyperWalker.map(SVGenotyperWalker.java:218)
at org.broadinstitute.sv.genotyping.SVGenotyperWalker.map(SVGenotyperWalker.java:58)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:106)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:98)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:316)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
at org.broadinstitute.sv.main.SVCommandLine.execute(SVCommandLine.java:145)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
at org.broadinstitute.sv.main.SVCommandLine.main(SVCommandLine.java:95)
at org.broadinstitute.sv.main.SVGenotyper.main(SVGenotyper.java:21)
Caused by: java.lang.RuntimeException: www.googleapis.com
at org.broadinstitute.sv.dataset.SAMPathLocation.createSamFileReader(SAMPathLocation.java:94)
at org.broadinstitute.sv.dataset.DataSet.openSAMFile(DataSet.java:110)
at org.broadinstitute.sv.dataset.DataSet.openSAMFile(DataSet.java:98)
at org.broadinstitute.sv.common.ReadCountAlgorithm.getReader(ReadCountAlgorithm.java:517)
at org.broadinstitute.sv.common.ReadCountAlgorithm.getSingleReadCount(ReadCountAlgorithm.java:423)
at org.broadinstitute.sv.common.ReadCountAlgorithm.getReadCount(ReadCountAlgorithm.java:246)
... 23 more
Caused by: com.google.cloud.storage.StorageException: www.googleapis.com
at com.google.cloud.storage.spi.v1.HttpStorageRpc.translate(HttpStorageRpc.java:227)
at com.google.cloud.storage.spi.v1.HttpStorageRpc.get(HttpStorageRpc.java:438)
at com.google.cloud.storage.StorageImpl$5.call(StorageImpl.java:240)
at com.google.cloud.storage.StorageImpl$5.call(StorageImpl.java:237)
at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:105)
at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
at com.google.cloud.storage.StorageImpl.get(StorageImpl.java:236)
at com.google.cloud.storage.contrib.nio.CloudStorageReadChannel.fetchSize(CloudStorageReadChannel.java:238)
at com.google.cloud.storage.contrib.nio.CloudStorageReadChannel.<init>(CloudStorageReadChannel.java:110)
at com.google.cloud.storage.contrib.nio.CloudStorageReadChannel.create(CloudStorageReadChannel.java:90)
at com.google.cloud.storage.contrib.nio.CloudStorageFileSystemProvider.newReadChannel(CloudStorageFileSystemProvider.java:347)
at com.google.cloud.storage.contrib.nio.CloudStorageFileSystemProvider.newByteChannel(CloudStorageFileSystemProvider.java:304)
at java.nio.file.Files.newByteChannel(Files.java:361)
at java.nio.file.Files.newByteChannel(Files.java:407)
at htsjdk.samtools.seekablestream.SeekablePathStream.<init>(SeekablePathStream.java:39)
at htsjdk.samtools.seekablestream.SeekablePathStream.<init>(SeekablePathStream.java:33)
at org.broadinstitute.sv.dataset.SAMPathLocation.createSamFileReader(SAMPathLocation.java:88)
... 28 more
Caused by: java.net.UnknownHostException: www.googleapis.com
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:666)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:264)
at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:367)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:191)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1162)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1056)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:177)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:167)
at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:143)
at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:79)
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:995)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:499)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:549)
at com.google.cloud.storage.spi.v1.HttpStorageRpc.get(HttpStorageRpc.java:435)
... 44 more
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 3.7.GS-r1941-0-gb493839):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Error reading BAM file /CCDG_13607/Project_CCDG_13607_B01_GRM_WGS.cram.2019-02-06/Sample_HG02009/analysis/HG02009.final.cram: www.googleapis.com
##### ERROR ------------------------------------------------------------------------------------------
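
The trace above nests the underlying java.net.UnknownHostException inside a StorageException and two RuntimeExceptions, so only the outermost message ("Error reading BAM file ...") is visible at the top. As a minimal sketch, a caller could walk the cause chain to tell DNS failures apart from other read errors. The class name DnsFailureCheck and the method isDnsFailure are hypothetical and are not part of htsjdk, GATK, or the Google Cloud libraries; only JDK classes are used.

```java
import java.net.UnknownHostException;

final class DnsFailureCheck {
    /** Returns true if any exception in the cause chain is an UnknownHostException. */
    static boolean isDnsFailure(Throwable t) {
        // Bound the walk so a cyclic cause chain cannot loop forever.
        for (int depth = 0; t != null && depth < 50; depth++) {
            if (t instanceof UnknownHostException) {
                return true;
            }
            t = t.getCause();
        }
        return false;
    }

    public static void main(String[] args) {
        // Rebuild the shape of the nesting seen in the trace using JDK types only.
        Throwable dns = new UnknownHostException("www.googleapis.com");
        Throwable wrapped = new RuntimeException("Error reading BAM file",
                new RuntimeException("www.googleapis.com", dns));
        System.out.println(isDnsFailure(wrapped));
        System.out.println(isDnsFailure(new RuntimeException("other")));
    }
}
```

A check like this only classifies the failure; it does not say why name resolution failed.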

Comments


  • Comment author
    Tiffany Miller

    This has been resolved via another ticket. 

  • Comment author
    Bhandsaker

    It seems worth recording the cause of this issue here. After quite a bit of debugging and experimentation, we determined that the root cause was memory pressure on the compute instance, which presumably caused the DNS lookups to begin failing. One hypothesis, not confirmed to my knowledge, is that a process running on the compute instance handles DNS lookups, and if the instance runs low on virtual memory this process can die, causing all subsequent DNS lookups to fail.

    The solution in this case was to increase the memory allocated to the compute instance, which eliminated the problem. We found that by increasing or decreasing the allocated memory slightly, we could control the rate of failures due to these storage exceptions (less memory -> more memory pressure -> higher rate of DNS lookup failures leading to storage exceptions).

    It is not at all obvious from the symptom (StorageException caused by no-such-host) that the root cause is lack of memory on the compute instance.
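
Because the symptom gives no hint of memory pressure, one way to make the connection visible is to log a memory snapshot whenever a storage exception is caught. The following is a minimal sketch, not what was done in this case. It assumes a Linux instance where /proc/meminfo is readable and reports a MemAvailable line; the class MemorySnapshot and its methods are hypothetical names, and only JDK classes are used.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

final class MemorySnapshot {
    /** Returns the kB value for a /proc/meminfo key such as "MemAvailable", or -1 if absent. */
    static long meminfoKb(List<String> lines, String key) {
        String prefix = key + ":";
        for (String line : lines) {
            if (line.startsWith(prefix)) {
                // Lines look like "MemAvailable:    123456 kB"; take the first token.
                String[] parts = line.substring(prefix.length()).trim().split("\\s+");
                return Long.parseLong(parts[0]);
            }
        }
        return -1;
    }

    /** Builds a one-line summary of JVM heap use and, where available, system memory. */
    static String describe() {
        Runtime rt = Runtime.getRuntime();
        long usedHeapMb = (rt.totalMemory() - rt.freeMemory()) >> 20;
        long maxHeapMb = rt.maxMemory() >> 20;
        StringBuilder sb = new StringBuilder();
        sb.append("jvmHeapUsedMB=").append(usedHeapMb)
          .append(" jvmHeapMaxMB=").append(maxHeapMb);
        // Assumption: Linux-style /proc/meminfo; skipped silently elsewhere.
        Path meminfo = Paths.get("/proc/meminfo");
        if (Files.isReadable(meminfo)) {
            try {
                List<String> lines = Files.readAllLines(meminfo);
                sb.append(" memAvailableKB=").append(meminfoKb(lines, "MemAvailable"));
                sb.append(" swapFreeKB=").append(meminfoKb(lines, "SwapFree"));
            } catch (IOException e) {
                sb.append(" meminfo=unreadable");
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // In a real job this would be logged from the catch block that sees the failure.
        System.err.println("Memory at failure: " + describe());
    }
}
```

If the logged available memory is consistently low when the unknown-host failures occur, that would point toward the memory explanation described above rather than a network problem.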

     
