How to index a CRAM file on Terra with samtools?

Post author
Ash O'Farrell

I'm the UCSC guy making a featured workspace involving the TOPMed aligner prior to go-live. I'm testing it by having it run on the 1000 Genomes data available on Terra. That data has CRAM files but no CRAI files (needed by the aligner) so I need to generate those CRAI files with samtools or another tool before I can run the aligner.

If I put the CRAM files in my workspace bucket, samtools cannot open them.

!samtools index gs://[path to cram]
[E::hts_open_format] fail to open file 'gs://[path to cram]'
samtools index: failed to open "gs://[path to cram]": Protocol not supported

I'm not sure if it's because Terra's version of samtools doesn't contain htslib or if the notebook bucket can't "see" the workspace bucket. So I tried putting it into the notebook bucket using gsutil cp and got the following error output.

==> NOTE: You are downloading one or more large file(s), which would run significantly faster if you enabled sliced object downloads. This feature is enabled by default but requires that compiled crcmod be installed (see "gsutil help crcmod").

CommandException: Downloading this composite object requires integrity checking with CRC32c, but your crcmod installation isn't using the module's C extension, so the hash computation will likely throttle download performance. For help installing the extension, please see "gsutil help crcmod". To download regardless of crcmod performance or to skip slow integrity checks, see the "check_hashes" option in your boto config file.

I can't install crcmod without root permissions. Does anyone have any ideas? If it doesn't involve samtools, that's fine; I just need some way of indexing these CRAM files. I've considered using a custom docker container with an installation of samtools that I know contains htslib, but that won't help if the real issue is that I can't transfer the CRAM into the notebook bucket.
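One possible workaround, sketched here for a notebook cell: the gsutil error above points at the "check_hashes" boto option, and gsutil accepts a per-command override of boto settings through its top-level -o flag. The sketch below copies the CRAM to local disk with that override and then runs samtools index on the local copy, which sidesteps the gs:// protocol error. The function names, the example paths, and the choice of "if_fast_else_skip" are assumptions for illustration, not something confirmed in this thread. Skipping the integrity check also means a corrupted download could go unnoticed.

```python
import shlex
import subprocess


def gsutil_copy_command(gs_uri, local_dir, check_hashes="if_fast_else_skip"):
    """Build a gsutil cp command that overrides the boto check_hashes option.

    "if_fast_else_skip" skips the CRC32c check when the compiled crcmod
    extension is missing (assumed acceptable here; use "always" to keep
    the slow check instead).
    """
    return [
        "gsutil",
        "-o", f"GSUtil:check_hashes={check_hashes}",
        "cp", gs_uri, f"{local_dir.rstrip('/')}/",
    ]


def samtools_index_command(local_cram):
    """Build the samtools index command for a file on local disk."""
    return ["samtools", "index", local_cram]


def localize_and_index(gs_uri, local_dir=".", run=subprocess.run):
    """Copy a CRAM out of a bucket, index it locally, return the .crai path.

    `run` is injectable so the command sequence can be checked without
    gsutil or samtools being installed.
    """
    file_name = gs_uri.rsplit("/", 1)[-1]
    local_path = f"{local_dir.rstrip('/')}/{file_name}"
    commands = (
        gsutil_copy_command(gs_uri, local_dir),
        samtools_index_command(local_path),
    )
    for cmd in commands:
        print("running:", shlex.join(cmd))
        run(cmd, check=True)
    # By default samtools index writes <input>.crai next to a CRAM input.
    return local_path + ".crai"
```

Calling localize_and_index("gs://[path to cram]") in a notebook would run both commands in order. The resulting .crai would still need to be copied back to the bucket with gsutil cp for the aligner to see it.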

The workspace name is "TOPMed Alignment and Freeze8 Variant Calling" and has a Jupyter notebook with my process.

 

Comments

10 comments

  • Comment author
    Jason Cerrato

    Hi Aisling,

    We'll take a look at this and get back to you with any questions or suggestions as soon as we can.

    Kind regards,

    Jason

  • Comment author
    Jason Cerrato

    Hi Aisling,

    Which notebook runtime are you using? Are you using the default image? 

    Kind regards,

    Jason

  • Comment author
    Ash O'Farrell

    Yes, it's the default image.

  • Comment author
    Jason Cerrato

    Hi Aisling,

    Just wanted to drop in and let you know we're taking a closer look at the default images to make sure they'll work for your purposes. I'm also planning to see if there are any readily available WDLs that could be used for your indexing instead. Are you currently blocked until one of these options is available? I will also check with the developers to see if going the custom docker container route would work.

    Kind regards,

    Jason

  • Comment author
    Ash O'Farrell

    I am in contact with Walt from UCSC, meaning I can edit the WDL from my end. I have done so and I'm currently in the process of testing it.

    So I'm not blocked at the moment, as I've found an alternative route.

  • Comment author
    Jason Cerrato

    Hi Aisling,

    Glad to hear. Would you still like for me to reach out regarding the default images once we've confirmed they would work for this purpose?

    Kind regards,

    Jason

  • Comment author
    Ash O'Farrell

    Yes, that'd be brilliant. Always good to have more information. Thanks!

  • Comment author
    Stephanie Gogarten

    Is there now a readily available WDL to run `samtools index`? I tried searching for one on dockstore, but got >200 results for WDLs that contain indexing somewhere as a step. I also tried searching the Broad Methods Repository, and managed to import this: https://portal.firecloud.org/?return=anvil#methods/bknisbac/samtools_index/6
    But it can only name its output ${sample_id}.bam.bai, and I have a CRAM file, not a BAM file. I could always rename the file afterwards, but I'm hoping there's a more straightforward option.

  • Comment author
    Ash O'Farrell
    • Edited

    Hello Stephanie! Originally this question was about Jupyter instead of WDL, but I think I can help out here. I just whipped up a simple samtools index WDL workflow for you and put it on Dockstore. It should autodetect if you're putting in bams or crams, and adjust the extension accordingly. Let me know if you have any issues with it -- you helped me a lot during the UWGAC CWL-->WDL project so let me help you back.

    https://dockstore.org/workflows/github.com/aofarrel/samtools-index-WDL/samtools_index:main?tab=info
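For readers who only need the naming logic, the "autodetect and adjust the extension" idea can be sketched roughly as follows. This is not the code from the linked workflow, which is written in WDL; it is a hypothetical Python illustration that assumes the samtools default of appending .bai to a BAM name and .crai to a CRAM name. The function and table names are made up for this example.

```python
import os

# Index suffix that samtools index appends by default for each input type.
INDEX_SUFFIX = {".bam": ".bai", ".cram": ".crai"}


def index_path_for(alignment_path):
    """Return the expected index file name for a BAM or CRAM path."""
    ext = os.path.splitext(alignment_path)[1].lower()
    if ext not in INDEX_SUFFIX:
        raise ValueError(f"expected a .bam or .cram file, got: {alignment_path}")
    return alignment_path + INDEX_SUFFIX[ext]
```

With this, a CRAM input maps to a .cram.crai output and a BAM input to .bam.bai, so no renaming is needed afterwards.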

  • Comment author
    Stephanie Gogarten
    • Edited

    Thanks Ash! I feel like the bigger issue here is a failure of Dockstore search, because even knowing that the exact workflow I want exists, it's not straightforward to find it. I posted about this on the Dockstore forum.

