Can you recommend a great tutorial on "processing-for-variant-discovery-gatk4" with different read groups/lanes on Terra?

Post author
Samuel Terkper Ahuno

Can someone please recommend a walkthrough or a good tutorial on "processing-for-variant-discovery-gatk4" with multiple read groups on Terra? I have seen the workflow in Terra, but I'm not sure I understand what the steps are or which inputs are required.

I have tumor/normal samples that I want to run variant calling on, but my starting materials are uBAMs split by lane, for example:

  • PID_1_tumorSample_lane1.unmapped.bam
  • PID_1_tumorSample_lane2.unmapped.bam
  • PID_1_tumorSample_lane3.unmapped.bam
  • PID_2_tumorSample_lane1.unmapped.bam
  • PID_2_tumorSample_lane2.unmapped.bam
  • PID_1_GermlineSample_lane1.unmapped.bam
  • PID_1_GermlineSample_lane2.unmapped.bam
  • PID_2_GermlineSample_lane1.unmapped.bam
  • PID_2_GermlineSample_lane2.unmapped.bam


Again, there are recommendations from GATK, but I don't know how I could implement them in Terra.


Any suggestions, help, or recommendations would be gladly welcome.







1 comment

  • Comment author
    • Edited

    The requirements in the workflow's WDL header state:

    ## Requirements/expectations :
    ## - Pair-end sequencing data in unmapped BAM (uBAM) format
    ## - One or more read groups, one per uBAM file, all belonging to a single sample (SM)
    ## - Input uBAM files must additionally comply with the following requirements:
    ## - - filenames all have the same suffix (we use ".unmapped.bam")
    ## - - files must pass validation by ValidateSamFile
    ## - - reads are provided in query-sorted order
    ## - - all reads must have an RG tag


    You should be able to give the workflow uBAMs from different read groups, as long as they all belong to the same sample. So in your case you would run the workflow once per sample: once with all the uBAMs associated with the tumor sample, and a second time with all the uBAMs associated with the normal sample, repeating this for each patient.
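
    As a rough illustration of that grouping, here is a minimal Python sketch that sorts the lane-level uBAMs from the question into one list per sample. It assumes the file names follow the `<sample>_lane<N>.unmapped.bam` pattern shown in the question; the function name `group_ubams_by_sample` and the regular expression are made up for this example and are not part of the workflow or of Terra.

```python
import re
from collections import defaultdict

# Assumed naming pattern, inferred from the example file names in the question:
#   <sample>_lane<N>.unmapped.bam
UBAM_PATTERN = re.compile(r"^(?P<sample>.+)_lane\d+\.unmapped\.bam$")


def group_ubams_by_sample(paths):
    """Return {sample_name: sorted list of uBAM paths} (hypothetical helper)."""
    groups = defaultdict(list)
    for path in paths:
        # Match on the file name only, so bucket or directory prefixes are ignored.
        name = path.rsplit("/", 1)[-1]
        match = UBAM_PATTERN.match(name)
        if match is None:
            # Also catches files that do not share the ".unmapped.bam" suffix.
            raise ValueError(f"Unexpected uBAM file name: {name}")
        groups[match.group("sample")].append(path)
    return {sample: sorted(files) for sample, files in groups.items()}


ubams = [
    "PID_1_tumorSample_lane1.unmapped.bam",
    "PID_1_tumorSample_lane2.unmapped.bam",
    "PID_1_tumorSample_lane3.unmapped.bam",
    "PID_2_tumorSample_lane1.unmapped.bam",
    "PID_2_tumorSample_lane2.unmapped.bam",
    "PID_1_GermlineSample_lane1.unmapped.bam",
    "PID_1_GermlineSample_lane2.unmapped.bam",
    "PID_2_GermlineSample_lane1.unmapped.bam",
    "PID_2_GermlineSample_lane2.unmapped.bam",
]

groups = group_ubams_by_sample(ubams)
for sample, files in sorted(groups.items()):
    # Each group corresponds to one run of the workflow.
    print(sample, len(files), "uBAM(s)")
```

    Each resulting group would be the uBAM input for one run, supplied in whatever form the workflow version in your workspace expects (check its input definitions). Note that grouping by file name only works if the SM tag in each file's read group header agrees with the name; the sketch does not inspect the headers.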

