Can you recommend a great tutorial on "processing-for-variant-discovery-gatk4" with different read groups/lanes on Terra?
Can someone please recommend a walkthrough or tutorial on running "processing-for-variant-discovery-gatk4" with multiple read groups on Terra? I have seen the workflow in Terra, but I'm not sure I understand what the steps are or which inputs are required.
I have tumor/normal samples and aim to do variant calling, but my starting materials (uBAMs) are split by lane, for example:
- PID_1_tumorSample_lane1.unmapped.bam
- PID_1_tumorSample_lane2.unmapped.bam
- PID_1_tumorSample_lane3.unmapped.bam
- PID_2_tumorSample_lane1.unmapped.bam
- PID_2_tumorSample_lane2.unmapped.bam
- PID_1_GermlineSample_lane1.unmapped.bam
- PID_1_GermlineSample_lane2.unmapped.bam
- PID_2_GermlineSample_lane1.unmapped.bam
- PID_2_GermlineSample_lane2.unmapped.bam
There are also recommendations from GATK, but I don't know how I could implement them in Terra:
https://software.broadinstitute.org/gatk/documentation/article.php?id=3060
Any suggestions, help, and recommendations would be gladly welcome.
Thanks
Sam
Comments
The requirements in the workflow's WDL header state:
## Requirements/expectations :
## - Pair-end sequencing data in unmapped BAM (uBAM) format
## - One or more read groups, one per uBAM file, all belonging to a single sample (SM)
## - Input uBAM files must additionally comply with the following requirements:
## - - filenames all have the same suffix (we use ".unmapped.bam")
## - - files must pass validation by ValidateSamFile
## - - reads are provided in query-sorted order
## - - all reads must have an RG tag
You should be able to give the workflow uBAMs from any number of read groups, as long as they all belong to the same sample. So in your case you would run the workflow separately for each sample: once providing all the uBAMs associated with the tumor sample, and a second time providing all the uBAMs associated with the normal sample.
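To make the "one run per sample" idea concrete, here is a minimal Python sketch that groups per-lane uBAM filenames by sample. It assumes the naming pattern from the question (`<sample>_lane<N>.unmapped.bam`) and that each distinct prefix, such as `PID_1_tumorSample`, is one sample (SM); the function name `group_ubams_by_sample` is made up for illustration and is not part of the workflow or of Terra.

```python
import re
from collections import defaultdict

# Assumed naming pattern, inferred from the filenames in the question:
#   <sample>_lane<N>.unmapped.bam
UBAM_PATTERN = re.compile(r"^(?P<sample>.+)_lane(?P<lane>\d+)\.unmapped\.bam$")


def group_ubams_by_sample(filenames):
    """Return {sample: [uBAM filenames ordered by lane number]}."""
    groups = defaultdict(list)
    for name in filenames:
        match = UBAM_PATTERN.match(name)
        if match is None:
            # Also catches files that do not share the common suffix.
            raise ValueError(f"unexpected uBAM filename: {name}")
        groups[match.group("sample")].append((int(match.group("lane")), name))
    # Sort samples by name and files by lane number for a stable listing.
    return {
        sample: [name for _, name in sorted(files)]
        for sample, files in sorted(groups.items())
    }


if __name__ == "__main__":
    ubams = [
        "PID_1_tumorSample_lane1.unmapped.bam",
        "PID_1_tumorSample_lane2.unmapped.bam",
        "PID_1_tumorSample_lane3.unmapped.bam",
        "PID_2_tumorSample_lane1.unmapped.bam",
        "PID_2_tumorSample_lane2.unmapped.bam",
        "PID_1_GermlineSample_lane1.unmapped.bam",
        "PID_1_GermlineSample_lane2.unmapped.bam",
        "PID_2_GermlineSample_lane1.unmapped.bam",
        "PID_2_GermlineSample_lane2.unmapped.bam",
    ]
    # Each group is the set of uBAMs to hand to one workflow run.
    for sample, files in group_ubams_by_sample(ubams).items():
        print(sample)
        for path in files:
            print("  " + path)
```

Under these assumptions the nine files fall into four groups (tumor and germline for each of PID_1 and PID_2), so each group would be its own workflow submission. The sketch only inspects filenames; it does not check the RG tags, sort order, or ValidateSamFile requirements listed above.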