More Control Over Files Being Downloaded to Local Space by Workflow
I am in the process of writing a Terra workflow that will merge thousands of VCF files. The problem is that if I provide an Array[File] as input to the workflow containing paths to the VCF files, all of them will be downloaded to local disk and use a huge amount of space. Instead, I want to have control over how they're downloaded to local disk (download and merge smaller sets of files at a time, as opposed to merging them all at once with a single command). Is there a way of controlling how the workflow downloads files to local disk?
Thanks for reaching out. To control how your workflow downloads files to the disk associated with your workflow VM, you will need to modify the workflow's WDL.
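One possible option: instead of providing your VCF files as an Array[File] input, you could provide a single File input that contains a list of the file paths you are working with, and then gsutil cp batches of them to the disk as desired. Because the paths inside that list file are just text, the workflow will not download those files automatically; only the list file itself is copied to the VM.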
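As a rough illustration of the batching idea, here is a minimal Python sketch of logic that could run inside the task's command section. It assumes the list file holds one gs:// path per line, that gsutil is available in the task's Docker image, and that you supply your own merge step. The function names (read_paths, batches, download_batch, process_in_batches) and the merge_batch callback are hypothetical and not part of Terra or WDL.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, Iterator, List


def read_paths(list_file: str) -> List[str]:
    """Read one gs:// path per line, skipping blank lines."""
    with open(list_file) as handle:
        return [line.strip() for line in handle if line.strip()]


def batches(paths: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` paths."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(paths), size):
        yield paths[start:start + size]


def download_batch(batch: List[str], dest_dir: str) -> None:
    """Copy one batch of objects to a local directory with `gsutil -m cp`."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    # With several sources, gsutil cp requires the destination to be a directory.
    subprocess.run(["gsutil", "-m", "cp", *batch, dest_dir], check=True)


def process_in_batches(
    list_file: str,
    batch_size: int,
    work_dir: str,
    merge_batch: Callable[[Path, int], None],
) -> None:
    """Download, merge, and clean up one batch at a time.

    `merge_batch` is a hypothetical caller-supplied function that merges the
    VCFs found in the batch directory and writes its output elsewhere.
    """
    for index, batch in enumerate(batches(read_paths(list_file), batch_size)):
        batch_dir = Path(work_dir) / f"batch_{index}"
        download_batch(batch, str(batch_dir))
        merge_batch(batch_dir, index)
        # Remove the downloaded inputs so disk usage stays near one batch.
        shutil.rmtree(batch_dir)
```

With this approach the disk only needs to hold roughly one batch of inputs plus the intermediate merged outputs. One thing to watch for: if two files in the same batch share a file name, the second copy would overwrite the first in the batch directory, so you may need to rename them on download.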
Does this sound like it would be a viable workaround for you? If you have any other questions about this please let us know.