How to run samtools on gs object directly, to get a BAM slice fast (for example)

Post author
Jean Monlong

I've learned today how to get a slice of a BAM on an instance, without having to download the whole BAM and index first. I imagine that's what the IGV app does, but I couldn't find any documentation here, so I'm adding it.

I wanted to do that on an interactive instance using the default image, so in a notebook or a terminal for example.

By default, I was getting an error like when trying a samtools command on the gs path directly. Something like:

samtools view: failed to open "gs://fc-secure..../reads.bam" for reading: Permission denied

In the end, it was just a matter of setting up an environment value with the GCP token.

## write the GCP token in a file
gcloud auth print-access-token > /home/jupyter/token.txt
## point the HTS ENV variable to that file
export HTS_AUTH_LOCATION="/home/jupyter/token.txt"
## optional: download the BAM index locally (esp. if not in the same location as the BAM)
gsutil cp gs://.../reads.bam.bai .
## run samtools command on the gs object
samtools view -b gs://.../reads.bam REGION > reads.region.bam

I imagine this work for indexed VCFs too.

Comments

1 comment

  • Comment author
    Mark Chaisson

    Does this work with the general htslib api?

    0

Please sign in to leave a comment.