How to run samtools on gs object directly, to get a BAM slice fast (for example)
I've learned today how to get a slice of a BAM on an instance, without having to download the whole BAM and index first. I imagine that's what the IGV app does, but I couldn't find any documentation here, so I'm adding it.
I wanted to do that on an interactive instance using the default image, so in a notebook or a terminal for example.
By default, I was getting an error like when trying a samtools command on the gs path directly. Something like:
samtools view: failed to open "gs://fc-secure..../reads.bam" for reading: Permission denied
In the end, it was just a matter of setting up an environment value with the GCP token.
## write the GCP token in a file
gcloud auth print-access-token > /home/jupyter/token.txt
## point the HTS ENV variable to that file
export HTS_AUTH_LOCATION="/home/jupyter/token.txt"
## optional: download the BAM index locally (esp. if not in the same location as the BAM)
gsutil cp gs://.../reads.bam.bai .
## run samtools command on the gs object
samtools view -b gs://.../reads.bam REGION > reads.region.bam
I imagine this work for indexed VCFs too.
Comments
1 comment
Does this work with the general htslib api?
Please sign in to leave a comment.