We're creating a workflow to demultiplex Illumina BCL files to uBAM using Picard. The downside is that Picard creates one file per lane.
I'm trying to merge the per-sample, per-lane uBAMs into one uBAM per sample, based on the sample name in the filenames.
Currently I have a flattened array of all uBAM files for all samples and lanes (subject to change if needed), but I need to do something like this:
flattened_array = [file1_L1.ubam, file2_L2.ubam, fileX_L1.ubam, fileX_L2.ubam]
"files": [file1_L1.ubam, file2_L2.ubam, fileX_L1.ubam, fileX_L2.ubam]
"files": [fileX_L1.ubam, fileX_L2.ubam]
so I can merge the per-lane files into one file per sample to feed into the mapping/variant calling workflow.
Any tips or tricks? I've more or less solved it with a custom Python script, but that doesn't scatter into one merge task per sample, which would be the ideal case.
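
For reference, the grouping step I have in mind looks roughly like the sketch below. It is only an illustration, not the actual script: it assumes every filename follows a `<sample>_L<lane>.ubam` pattern, and the names `group_by_sample` and `LANE_PATTERN` are made up for this example. The idea is to turn the flat list into a sample-to-files mapping, which could be written out as JSON so the workflow can scatter one merge task per key.

```python
import json
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming convention (hypothetical): <sample>_L<lane>.ubam
LANE_PATTERN = re.compile(r"^(?P<sample>.+)_L(?P<lane>\d+)\.ubam$")


def group_by_sample(paths):
    """Group per-lane uBAM paths by the sample name in each filename."""
    groups = defaultdict(list)
    for path in paths:
        match = LANE_PATTERN.match(Path(path).name)
        if match is None:
            # Fail loudly instead of silently dropping a file.
            raise ValueError(f"Unexpected filename: {path}")
        groups[match.group("sample")].append((int(match.group("lane")), path))
    # Order files by numeric lane so that lane 10 sorts after lane 2.
    return {
        sample: [path for _, path in sorted(files)]
        for sample, files in sorted(groups.items())
    }


flattened_array = [
    "file1_L1.ubam",
    "fileX_L2.ubam",
    "file1_L2.ubam",
    "fileX_L1.ubam",
]
print(json.dumps(group_by_sample(flattened_array), indent=2))
```

Each key of the resulting mapping is one sample and each value is the list of lane files to merge, so a scatter over the keys would give one merge task per sample. If the real filenames carry extra fields (barcode, flowcell), the regular expression would need adjusting.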