Multiple unmapped BAM files for each sample?
Hi @Samantha, I have a short question.
I am currently attempting the "1-1-Preprocessing-For-Variant-Discovery-HG38" workflow, and I am confused as to why each sample in downsampled_flowcell_unmapped_bams_list has multiple unmapped BAM files.
From my understanding, when a sample is run on a sequencer, we receive a pair of FASTQ files. If we process these FASTQ files with "Paired-FASTQ-to-Unmapped-BAM", only ONE unmapped BAM file should be generated per sample.
Unfortunately, I cannot speak to the bioinformatics behind these workflows, but I can point you to some documentation that should give you a better idea of what these workflows do and what they expect as inputs.
Here is an article on the GATK support site detailing the data pre-processing workflow: https://gatk.broadinstitute.org/hc/en-us/articles/360035535912-Data-pre-processing-for-variant-discovery.
Under "Expected Inputs," it mentions:
Also, to address your question about the Paired-FASTQ-to-Unmapped-BAM workflow, it should be producing a set of uBAMs, as mentioned in the WDL script:
Thank you for explaining it to me so kindly. I realized that I didn't understand what was written.
The make_fofn option of Paired-FASTQ-to-Unmapped-BAM creates a list of the generated uBAMs, and that list can be used directly as the input to Preprocessing. I had mistakenly thought that another workflow was needed between Paired-FASTQ-to-Unmapped-BAM and Preprocessing.
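For anyone building such a list by hand, here is a minimal sketch, assuming (not confirmed in this thread) that the list input (like `downsampled_flowcell_unmapped_bams_list`) is a plain-text file of filenames ("fofn") with one uBAM path per line, and that a sample sequenced on several flowcell lanes has one uBAM per read group. The sample name, bucket paths, and the helpers `write_fofn`/`read_fofn` are hypothetical illustrations, not part of the GATK workflows.

```python
from pathlib import Path
from typing import Dict, List

# Hypothetical example: one sample sequenced on two lanes of one flowcell,
# giving one uBAM per read group (i.e. two uBAMs for a single sample).
# Sample name and paths are illustrative only.
UBAMS_BY_SAMPLE: Dict[str, List[str]] = {
    "sampleA": [
        "gs://my-bucket/sampleA.FLOWCELL1.1.unmapped.bam",
        "gs://my-bucket/sampleA.FLOWCELL1.2.unmapped.bam",
    ],
}


def write_fofn(paths: List[str], fofn_path: Path) -> None:
    """Write a file-of-filenames: one path per line, each newline-terminated."""
    fofn_path.write_text("".join(f"{p}\n" for p in paths))


def read_fofn(fofn_path: Path) -> List[str]:
    """Read the paths back from a fofn, ignoring blank lines."""
    return [
        line.strip()
        for line in fofn_path.read_text().splitlines()
        if line.strip()
    ]


if __name__ == "__main__":
    # Write one list per sample; each list holds all of that sample's uBAMs.
    for sample, ubams in UBAMS_BY_SAMPLE.items():
        fofn = Path(f"{sample}.ubams.list")
        write_fofn(ubams, fofn)
        print(sample, len(read_fofn(fofn)), "uBAMs")
```

Writing one list per sample mirrors the idea discussed above: the sample is the unit being pre-processed, while the list can still contain several uBAMs for it.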
I understand now. Thank you very much.