Adding my own reference disk image to be mounted to the workflow task VMs

Post author
Beri

Currently, Terra allows users to mount a reference disk image when running a workflow, so a task can read reference files from the mounted disk instead of localizing them to the running VM.

Problem: 
The reference disk image is hard-coded, so you can only use the reference files (hg19 and hg38) that are already on the disk. It's not possible to use your own reference files.

Feature Request:
Allow users to specify their own reference disks.

Comments

6 comments

  • Comment author
    WillyN

    Curious if there's been any work on this front? It would be extremely useful to attach disks to my workflows. It gets fairly expensive to repeatedly localize large inputs. 

  • Comment author
    Josh Evans

    Hi Willy,

    Thanks for writing in! I'm going to see where we are with this feature request, and let you know once I have an update.

    Best,

    Josh

  • Comment author
    Josh Evans

    Hi again Willy,

    I did some research and, as it turns out, this feature request is on our roadmap, but we don't have a date for its implementation yet. I've added your name to the ticket, and if it's implemented, this space will be updated.

    Please let me know if you have any questions.

    Best,

    Josh

  • Comment author
    Devin McCabe

    Any update on this request? I don't see it on the new public roadmap unfortunately.

  • Comment author
    Beth Sheets

    Hi all, we don't currently have this on our roadmap. Is there a community resource that we could support as an official reference disk that would help your use case?

  • Comment author
    Devin McCabe

    I suspect that beyond the standard hg19 and hg38 reference files, the next most common data set used across all of Terra might not be all that common. Murine reference genomes and newer human reference genomes like GIAB/T2T might be next in line, but who knows.

    Examples of large data sets our group repeatedly localizes include a cache for VEP and a STAR index (each of these is about 25 GB), but I wouldn't think that, e.g., our specific version of VEP would be universal enough to be included in gcp-public-data--broad-references.

