Warnings from GATK MarkDuplicatesSpark

Post author

I ran GATK MarkDuplicatesSpark on our cluster and got the warnings below. The job appears to have finished, and the marked_duplicates.bam output was created. Given the warnings, I wonder whether there are any concerns about using the marked_duplicates.bam output for downstream BQSR analyses.

Thank you!



Warning 1:

Using GATK jar /home/software/gatk/gatk-
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/software/gatk/gatk- MarkDuplicatesSpark -I SRR5134749_sorted_reads.bam -O SRR5134749_marked_duplicates.bam
03:45:57.003 WARN  SparkContextFactory - Environment variables HELLBENDER_TEST_PROJECT and HELLBENDER_JSON_SERVICE_ACCOUNT_KEY must be set or the GCS hadoop connector will not be configured properly
03:45:57.132 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/software/gatk/gatk-!/com/intel/gkl/native/libgkl_compression.so
May 03, 2022 3:45:58 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
03:45:58.852 INFO  MarkDuplicatesSpark - ------------------------------------------------------------
03:45:58.853 INFO  MarkDuplicatesSpark - The Genome Analysis Toolkit (GATK) v4.1.2.0
03:45:58.853 INFO  MarkDuplicatesSpark - For support and documentation go to https://software.broadinstitute.org/gatk/
03:45:58.853 INFO  MarkDuplicatesSpark - Executing as duan@b16.private on Linux v3.10.0-1127.19.1.el7.x86_64 amd64
03:45:58.854 INFO  MarkDuplicatesSpark - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_262-b10
03:45:58.854 INFO  MarkDuplicatesSpark - Start Date/Time: May 3, 2022 3:45:57 AM EDT

Warning 2:

WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable


Warning 3:

WARN HadoopFileSystemWrapper: Concat not supported, merging serially
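
When the log is long, it can help to list only the WARN lines together with the component that emitted them. The following is a minimal sketch, not part of GATK: the function name `extract_warnings`, the pattern `WARN_PATTERN`, and the list `sample_log` are hypothetical names chosen for illustration. It assumes the log uses the two line layouts seen above, a timestamped `WARN  Component - message` form and a bare `WARN Component: message` form.

```python
import re

# Hypothetical pattern: an optional HH:MM:SS.mmm timestamp, the WARN level,
# the emitting component, then the message after either " - " or ": ".
WARN_PATTERN = re.compile(
    r"^(?:\d{2}:\d{2}:\d{2}\.\d{3}\s+)?WARN\s+(\w+):?\s+(?:-\s+)?(.*)$"
)


def extract_warnings(lines):
    """Return (component, message) pairs for every WARN line in `lines`."""
    warnings = []
    for line in lines:
        match = WARN_PATTERN.match(line.strip())
        if match:
            warnings.append((match.group(1), match.group(2)))
    return warnings


# Lines copied from the log excerpts in this post.
sample_log = [
    "03:45:57.003 WARN  SparkContextFactory - Environment variables "
    "HELLBENDER_TEST_PROJECT and HELLBENDER_JSON_SERVICE_ACCOUNT_KEY must be set "
    "or the GCS hadoop connector will not be configured properly",
    "03:45:57.132 INFO  NativeLibraryLoader - Loading libgkl_compression.so",
    "INFO: Failed to detect whether we are running on Google Compute Engine.",
    "WARN NativeCodeLoader: Unable to load native-hadoop library for your "
    "platform... using builtin-java classes where applicable",
    "WARN HadoopFileSystemWrapper: Concat not supported, merging serially",
]

if __name__ == "__main__":
    for component, message in extract_warnings(sample_log):
        print(f"{component}: {message}")
```

To run it against a real log, the lines of a saved log file could be passed in instead of `sample_log`, for example `extract_warnings(open(path))`, where `path` is wherever the job's output was written.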





1 comment

  • Comment author
    Samantha (she/her)

    Hi Duan,

    Thanks for writing in. It sounds like you are running a GATK tool on your own cluster and not on Terra, is that correct? If so, a better venue for your question would be the GATK forum since this page is focused on issues related to the Terra platform. GATK support staff or a member of the community will be able to assist you.



