Multiple unmapped BAM files for each sample?

Post author
KM

Hi @Samantha, let me ask a short question.

I am currently attempting the "1-1-Preprocessing-For-Variant-Discovery-HG38" workflow, and I am confused as to why each sample in downsampled_flowcell_unmapped_bams_list has multiple unmapped BAM files.

From my understanding, when we analyze a sample using a sequencer, we will receive paired FASTQ files. If we process these FASTQ files using "Paired-FASTQ-to-Unmapped-BAM", only ONE unmapped BAM file should be generated per sample.

Comments

2 comments

  • Comment author
    Samantha (she/her)

    Hi KM,

    Unfortunately, I cannot speak to the bioinformatics behind these workflows, but I can point you to some documentation that should give you a better idea of what these workflows do and what they expect as inputs.

    Here is an article on the GATK support site detailing the data pre-processing workflow: https://gatk.broadinstitute.org/hc/en-us/articles/360035535912-Data-pre-processing-for-variant-discovery.

    Under "Expected Inputs," it mentions:

    This workflow is designed to operate on individual samples, for which the data is initially organized in distinct subsets called read groups. These correspond to the intersection of libraries (the DNA product extracted from biological samples and prepared for sequencing, which includes fragmenting and tagging with identifying barcodes) and lanes (units of physical separation on the DNA sequencing chips) generated through multiplexing (the process of mixing multiple libraries and sequencing them on multiple lanes, for risk and artifact mitigation purposes).

    Our reference implementations expect the read data to be input in unmapped BAM (uBAM) format. Conversion utilities are available to convert from FASTQ to uBAM.
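
    To make the read-group idea above concrete, here is a minimal Python sketch of why one sample can end up with several uBAMs. It assumes one uBAM is written per read group (library plus flowcell lane). The sample, library, and flowcell names, the ID format, and the output file naming are all made up for illustration and are not taken from the workflow itself.

```python
from collections import defaultdict

# Hypothetical FASTQ pairs for ONE sample. Every name here is invented
# for illustration; real values come from your sequencing run.
fastq_pairs = [
    {"sample": "sampleA", "library": "LibA", "flowcell": "FC1", "lane": 1},
    {"sample": "sampleA", "library": "LibA", "flowcell": "FC1", "lane": 2},
    {"sample": "sampleA", "library": "LibA", "flowcell": "FC2", "lane": 1},
]


def read_group_id(pair):
    """Build an illustrative read group ID from library, flowcell and lane."""
    return f"{pair['library']}.{pair['flowcell']}.{pair['lane']}"


def ubams_per_sample(pairs):
    """Map each sample to one (assumed) uBAM name per read group."""
    ubams = defaultdict(list)
    for pair in pairs:
        name = f"{pair['sample']}.{read_group_id(pair)}.unmapped.bam"
        ubams[pair["sample"]].append(name)
    return dict(ubams)


result = ubams_per_sample(fastq_pairs)
for sample, names in result.items():
    print(sample, len(names), "uBAMs")
    for name in names:
        print("  ", name)
```

    Under these assumptions, a sample sequenced on three lanes yields three FASTQ pairs and therefore three uBAMs, which would match the several files listed per sample in downsampled_flowcell_unmapped_bams_list.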

    Also, to address your question about the Paired-FASTQ-to-Unmapped-BAM workflow, it should be producing a set of uBAMs, as mentioned in the WDL script:

     

    Best,

    Samantha

  • Comment author
    KM

    Thank you for explaining it to me so kindly. I realized that I didn't understand what was written.

    Using make_fofn in Paired-FASTQ-to-Unmapped-BAM creates a list, and that list can be used directly as the input to Preprocessing. I had mistakenly thought that a separate workflow was needed between Paired-FASTQ-to-Unmapped-BAM and Preprocessing.
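
    For anyone else reading, here is a rough Python sketch of what such a list (a "file of file names") looks like, assuming it is a plain text file with one uBAM path per line. The bucket paths, the file name, and the helper functions are hypothetical; this is not the workflow's own make_fofn code.

```python
import os
import tempfile


def write_fofn(ubam_paths, fofn_path):
    """Write one uBAM path per line (assumed list format)."""
    with open(fofn_path, "w") as handle:
        for path in ubam_paths:
            handle.write(path + "\n")


def read_fofn(fofn_path):
    """Read the list back, skipping blank lines."""
    with open(fofn_path) as handle:
        return [line.strip() for line in handle if line.strip()]


# Hypothetical uBAM locations for a single sample.
ubam_paths = [
    "gs://hypothetical-bucket/sampleA.LibA.FC1.1.unmapped.bam",
    "gs://hypothetical-bucket/sampleA.LibA.FC1.2.unmapped.bam",
]

with tempfile.TemporaryDirectory() as tmp:
    fofn_path = os.path.join(tmp, "sampleA.unmapped_bams.list")
    write_fofn(ubam_paths, fofn_path)
    listed = read_fofn(fofn_path)

print(listed)
```

    The point is only that the list names every uBAM for a sample, so a downstream step can be handed the one list file instead of each uBAM separately.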

    I understand now. Thank you very much.
