Dear GATK Team and users:
I am using the Best Practices workflow (126.96.36.199) to perform germline SNP calling on chicken genomes. The genome data were either downloaded from NCBI or collected and sequenced in my study. Everything ran smoothly, but when I checked the VCF files, I found that some samples had a single-allele GT (0 or 1) rather than the usual diploid form (0/0, 1/1, etc.). Here is an example (there are too many individuals to show, so I kept only 3 normal ones, with 1 individual with an abnormal GT in the last column).
10 530316 . G A 7884.96 PASS AC=1;AF=0.632;AN=1;BaseQRankSum=0.687;DP=2709;ExcessHet=0.8963;FS=5.799;InbreedingCoeff=0.2301;MLEAC=26;MLEAF=0.684;MQ=59.67;MQRankSum=0.00;QD=27.47;ReadPosRankSum=-1.350e-01;SOR=0.911 GT:AD:DP:GQ:PGT:PID:PL:PS 1/1:0,14:14:45:.:.:522,45,0 1|1:0,13:13:42:1|1:19583_G_A:599,42,0:19583 0/0:21,0:21:60:.:.:0,60,528 1:3,5:8:99:1|1:530316_G_A:170,0:530316
The interesting thing is that only the data I downloaded showed this pattern. When I counted such positions, they accounted for around 10% of the total SNP sites for each of these individuals.
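In case it helps anyone reproduce the count, here is a minimal sketch of how such sites can be tallied per sample. It assumes a plain-text, uncompressed, multi-sample VCF whose FORMAT column contains GT; the function names (`gt_ploidy`, `haploid_samples`, `count_haploid`) are made up for illustration and are not part of GATK. The example record is the line above with the INFO column shortened to `DP=2709` for brevity.

```python
import re
from collections import Counter

def gt_ploidy(sample_field, format_keys):
    """Return the number of alleles in the GT subfield (0 if GT is a bare '.')."""
    gt = sample_field.split(":")[format_keys.index("GT")]
    if gt == ".":
        return 0  # treat a bare missing genotype as "no call", not as haploid
    return len(re.split(r"[/|]", gt))

def haploid_samples(vcf_line):
    """Return 0-based indices of sample columns whose GT has exactly one allele."""
    cols = vcf_line.rstrip("\n").split()  # VCF is tab-separated; split() also handles spaces
    format_keys = cols[8].split(":")      # column 9 is FORMAT, samples start at column 10
    return [i for i, s in enumerate(cols[9:]) if gt_ploidy(s, format_keys) == 1]

def count_haploid(lines):
    """Count, per sample index, the records with a single-allele GT."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        counts.update(haploid_samples(line))
    return counts

record = (
    "10 530316 . G A 7884.96 PASS DP=2709 GT:AD:DP:GQ:PGT:PID:PL:PS "
    "1/1:0,14:14:45:.:.:522,45,0 "
    "1|1:0,13:13:42:1|1:19583_G_A:599,42,0:19583 "
    "0/0:21,0:21:60:.:.:0,60,528 "
    "1:3,5:8:99:1|1:530316_G_A:170,0:530316"
)
print(haploid_samples(record))  # sample indices with a haploid GT in this record
```

For a whole file, `count_haploid(open("cohort.vcf"))` (the filename is a placeholder) gives the per-sample totals, which can then be divided by the number of SNP records to get the fraction.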
I then compared their BAM files in IGV (the BAM files right before they were passed to HaplotypeCaller), and they look fine to me. Apart from the fact that the data we collected and sequenced had slightly higher coverage depth, I saw no obvious pattern at those sites.
Here is an example. The three SNPs in the top window were called with only one allele in the genotype, whereas the bottom window was called normally.
I really appreciate your help.