I am a Bioinformatics Scientist interested in collaborating with other developers to create an accelerated bioinformatics pipeline (using scatter-gather, multithreading, or other experimental methods) for large tranches of data. I would like this pipeline to be available to all Terra.bio users and to take large data tranches through GATK, then Bcftools, then VEP, with multiple options available to users for each tool and the flexibility to add extra Bcftools steps before the data proceeds to VEP.
Rationale: Currently, GATK only offers multithreading for its Pair-HMM algorithm within HaplotypeCaller. Bcftools offers threading, but it cannot process large-scale data tranches quickly unless it is combined with scatter-gather or other methods. VEP offers forking but is still very slow when run on large data tranches with all annotation options enabled. Because of these limitations, some users are forced to subset their data and skip the conversion of multiallelic to biallelic variants, losing critical information in their analyses.
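To make the scatter-gather idea concrete, here is a minimal sketch in Python, not a proposed implementation. It splits contigs into fixed-size intervals, processes each interval in parallel, and gathers the results in genomic order. The contig lengths, the shard size, and the `process_shard` function are all hypothetical placeholders; in a real pipeline that function would run the GATK, Bcftools, and VEP steps on one interval, and on Terra the scatter would more likely be expressed in the workflow itself.

```python
from concurrent.futures import ThreadPoolExecutor


def scatter(contig_lengths, shard_size):
    """Split each contig into 1-based, inclusive intervals of at most shard_size bases."""
    shards = []
    for contig, length in contig_lengths.items():
        for start in range(1, length + 1, shard_size):
            end = min(start + shard_size - 1, length)
            shards.append((contig, start, end))
    return shards


def process_shard(shard):
    """Hypothetical placeholder for the per-interval work (GATK, then Bcftools, then VEP).

    Here it only returns the interval label so the sketch stays runnable.
    """
    contig, start, end = shard
    return f"{contig}:{start}-{end}"


def scatter_gather(contig_lengths, shard_size, workers=4):
    """Scatter intervals across workers, then gather results in input order."""
    shards = scatter(contig_lengths, shard_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the order of its inputs,
        # so the gathered output stays sorted by genomic position.
        return list(pool.map(process_shard, shards))


if __name__ == "__main__":
    # Toy contig lengths, chosen only for illustration.
    print(scatter_gather({"chr1": 2500, "chr2": 1000}, shard_size=1000))
```

Because the gather step preserves interval order, the per-interval outputs could be concatenated without re-sorting, provided each shard's output is itself sorted.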
Many research institutes would find this pipeline very useful, so I am committed to starting this project and seeing it through to launch.
I am also open to collaborating directly with Terra.bio staff to make this a successful addition to their roadmap.
I can be reached at firstname.lastname@example.org