FASTQ file format for Paired-FASTQ-to-unmapped-BAM pipeline

Post author
Vithal Madhira

Hello,

I am using gzipped FASTQ files as inputs for this pipeline and am getting an error. Here is the exception:

htsjdk.samtools.SAMException: Sequence header must start with @: [unreadable binary data]

I just want to check whether the pipeline can process files named *.fastq.gz.1 and *.fastq.gz.2.

Do I need to use additional INPUT params to handle gzip files?

Regards,
Vithal

Comments

5 comments

  • Comment author
    Jason Cerrato

    Hi Vithal,

    Happy to see if I can help here. Can you point me to the pipeline you are referring to? I do not see it listed as a featured workspace or workflow.

    Kind regards,

    Jason

  • Comment author
    Vithal Madhira

    Hi Jason,

    The workspace name is: Sequence-Format-Conversion

    The tools you need to convert various sequencing file formats to GATK analysis ready input formats. Plus a validation tool to confirm that SAM or BAM files are in the proper format.
    1) Interleaved FASTQ to paired FASTQ
    2) Paired FASTQ to unmapped BAM
    3) BAM to unmapped BAM
    4) CRAM to BAM files from sequencer output for use in GATK analysis tools.

    The Validate BAM tool is also added to confirm proper formatting of SAM or BAM files.

    I am trying #2 Paired FASTQ to unmapped BAM by passing FASTQ files in gzip format.

    Thanks,

    Vithal

  • Comment author
    Jason Cerrato
    • Edited

    Hi Vithal,

    If you look further down the page for the workspace description, you can see this paragraph:

    For more details on input types typically used by GATK please review the following article: What Input Files does the GATK Accept/Require?

    If you follow this link, it brings you to a post that describes what types of files are accepted.

    Based on this information, it appears that gzipped fasta files will not work, as you are currently experiencing. I recommend following that link for more information on preparing FASTA reference sequences for use with the GATK.

    Kind regards,

    Jason

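The exception in the original post is consistent with compressed bytes being parsed as plain-text FASTQ, where every record header must begin with @. As a minimal Python sketch (not part of the workspace or of GATK; the function names and file names are hypothetical), one could check whether an input is gzip-compressed by its first two bytes, peek at its first header line, and write an uncompressed copy if needed:

```python
import gzip
import shutil

# Every gzip stream begins with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def is_gzipped(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def first_header(path):
    """Return the first line of a FASTQ file, decompressing if needed."""
    opener = gzip.open if is_gzipped(path) else open
    with opener(path, "rt") as handle:
        return handle.readline().rstrip("\n")


def decompress(src, dst):
    """Write an uncompressed copy of the gzipped file src to dst."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
```

Because the check reads the file's bytes rather than its name, `is_gzipped` gives the same answer for a compressed file whether it is named `*.fastq.gz` or `*.fastq.gz.1`. A valid FASTQ input should make `first_header` return a line starting with @.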
  • Comment author
    Vithal Madhira

    Thank you, Jason. I will read the documentation and make the necessary changes. It would be nice if the tool accepted gzipped files; I hope future versions will allow this.

  • Comment author
    Jason Cerrato

    Hi Vithal,

    If you are interested, you can make a feature request for GATK here: https://github.com/broadinstitute/gatk/issues

    Kind regards,

    Jason
