Merging output across multiple WDL runs?

Post author
Cecile Avery

Hello!

Thanks much to the Terra support team for always being so helpful. I have a very general question, and I hope I'm not missing something obvious.

I have a simple WDL that filters a file for variants of interest in a large database. This database is split by chromosome, so I have been running the WDL on each chromosome individually since the variants are scattered across the genome.

In a separate notebook, I merged the outputs together into a single file. It would be preferable if I could do this within the WDL itself, but because each chromosome is being processed individually, each workflow doesn't have access to the other runs... Is that correct?

Is there a better way to accomplish what I am trying to do? I tried running the WDL on a set thinking that this would group them together, but it immediately failed as the "set was empty". 

Thank you,

Cecile

Comments

1 comment

  • Comment author
    Josh Evans

    Hi Cecile,

    Thanks for writing in! I believe you are right: because each chromosome is processed in its own workflow run, you can't group the outputs together within that WDL. It might be possible to add the merge as its own task, but I would be concerned that the task would then try to run once for every chromosome.

    What you could do is make a separate WDL that takes all of the output files as input and merges them together, the same way your notebook does. (You might even be able to keep the same logic.) This might be more convenient than the notebook, since you can run this WDL right after your first workflow finishes and don't have to start a Cloud Environment.
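
    As a rough sketch of the merge logic such a WDL task could call, here is one way to combine the per-chromosome outputs in Python. It assumes each output is a plain-text table with the same single header line. The function name and file names are hypothetical, so adjust them to match your actual outputs.

```python
from typing import Iterable


def merge_chromosome_outputs(input_paths: Iterable[str], merged_path: str) -> int:
    """Concatenate per-chromosome tables into one file with a single header.

    Assumes (hypothetically) that every input file starts with the same
    header line. Returns the number of data rows written.
    """
    header = None
    rows_written = 0
    with open(merged_path, "w") as out_handle:
        for path in input_paths:
            with open(path) as in_handle:
                first_line = in_handle.readline()
                if not first_line:
                    continue  # skip completely empty files
                file_header = first_line.rstrip("\n")
                if header is None:
                    # Write the header once, taken from the first non-empty file.
                    header = file_header
                    out_handle.write(header + "\n")
                elif file_header != header:
                    raise ValueError(f"Header mismatch in {path}")
                for line in in_handle:
                    # Make sure every row ends with a newline, even the last one.
                    out_handle.write(line if line.endswith("\n") else line + "\n")
                    rows_written += 1
    return rows_written
```

    In a merge WDL, the list of per-chromosome files would be the task input, and a call such as `merge_chromosome_outputs(["chr1.tsv", "chr2.tsv"], "merged.tsv")` would produce the combined file.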

    Please let me know if this information was helpful or if you have any questions.

    Best,

    Josh 
