Is there a way to use a subset of an array as input?

Post author
Yu Qiu

This is a general WDL question: is there a way to use a subset of an array (generated by a scatter) as input for the next task?

Example 1: Somatic variant calling on a set of tumor/normal samples. In some cases, one normal sample is used for several tumor samples. I want to scatter a subworkflow that generates BAM files as an Array[File], then, based on the tumor/normal pair information, select a subset of that array as input to the Mutect2 workflow.
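Because the tumor/normal pairing is known before the workflow runs, one possible workaround is to compute the pairing as integer indices outside WDL and pass it in as a workflow input (for example an `Array[Array[Int]]`). Inside the workflow you could then scatter over those index pairs and pick files with ordinary array indexing, such as `bams[p[0]]` and `bams[p[1]]`. A minimal Python sketch, assuming the sample order matches the order of the scatter's output; the sample names and the input key `somatic.pair_indices` are hypothetical.

```python
import json

# Hypothetical sample order; assumed to match the order of the Array[File]
# produced by the alignment scatter.
samples = ["N1", "T1a", "T1b", "N2", "T2"]

# Hypothetical tumor/normal pairing; N1 is shared by two tumors.
pairs = [("T1a", "N1"), ("T1b", "N1"), ("T2", "N2")]


def pair_indices(samples, pairs):
    """Return [[tumor_index, normal_index], ...] into the scatter output."""
    index = {name: i for i, name in enumerate(samples)}
    missing = {name for pair in pairs for name in pair if name not in index}
    if missing:
        raise ValueError(f"unknown sample(s): {sorted(missing)}")
    return [[index[tumor], index[normal]] for tumor, normal in pairs]


# Hypothetical workflow input name; the value is an Array[Array[Int]].
inputs = {"somatic.pair_indices": pair_indices(samples, pairs)}
print(json.dumps(inputs))
```

A shared normal simply appears in several index pairs, so no file has to be duplicated.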

Example 2: An RNA-seq experiment contains samples in several groups, for example 9 samples in 3 groups: A1, A2, A3, B1, B2, B3, C1, C2, C3. The first subworkflow is scattered to do alignment and feature counting, producing an Array[File] that contains all feature-count files. I then want to run pairwise differential analysis, so I need a subset of that array as input to the next subworkflow. Since the number of groups and samples varies from run to run, I can't hard-code the indices.
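One way to avoid hard-coding indices is to derive them from a per-sample group label and enumerate every pair of groups. The sketch below does this in Python, assuming a hypothetical list of labels given in the same order as the scatter's output; the input key `rnaseq.comparisons` is also hypothetical. In WDL, each inner index list could then be turned back into files with a nested scatter, for instance `scatter (i in comp[0]) { File f = counts[i] }`, which should yield an `Array[File]` for that group.

```python
import json
from itertools import combinations

# Hypothetical group label per sample, listed in the same order as the
# feature-count Array[File] produced by the scatter.
labels = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]


def group_indices(labels):
    """Map each group name to the indices of its samples (first-seen order)."""
    groups = {}
    for i, label in enumerate(labels):
        groups.setdefault(label, []).append(i)
    return groups


def pairwise_comparisons(labels):
    """Return one [group1_indices, group2_indices] entry per pair of groups."""
    groups = group_indices(labels)
    return [[groups[g1], groups[g2]] for g1, g2 in combinations(groups, 2)]


# Hypothetical workflow input name; the value is an Array[Array[Array[Int]]].
print(json.dumps({"rnaseq.comparisons": pairwise_comparisons(labels)}))
```

With three groups this yields the comparisons A vs B, A vs C and B vs C; adding a group or a sample only changes `labels`.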

 

 

Comments


  • Comment author
    Jason Cerrato

    Hi Yu,

    We're not currently aware of a way to use subsets of arrays in the way you describe. Would a nested array work for your purposes? For example: [[A1, A2, A3], [B1, B2, B3], [C1, C2, C3]]
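    If the inputs are supplied already grouped, a nested scatter (an outer scatter over groups, an inner one over samples) should produce an Array[Array[File]] in which each inner array holds one group's results, so group boundaries never have to be recovered by index. A minimal Python sketch of building such a nested input from a sample sheet; the sheet layout, file names, and the input key `rnaseq.fastqs_by_group` are hypothetical.

```python
import csv
import io
import json

# Hypothetical sample sheet; the column names and paths are assumptions.
SHEET = """sample,group,fastq
A1,A,A1.fq.gz
A2,A,A2.fq.gz
A3,A,A3.fq.gz
B1,B,B1.fq.gz
B2,B,B2.fq.gz
B3,B,B3.fq.gz
C1,C,C1.fq.gz
C2,C,C2.fq.gz
C3,C,C3.fq.gz
"""


def nested_by_group(sheet_text):
    """Return [[fastqs of first group], [fastqs of next group], ...]."""
    groups = {}
    for row in csv.DictReader(io.StringIO(sheet_text)):
        groups.setdefault(row["group"], []).append(row["fastq"])
    return list(groups.values())


# Hypothetical workflow input name; the value is an Array[Array[File]].
print(json.dumps({"rnaseq.fastqs_by_group": nested_by_group(SHEET)}))
```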

    Depending on your needs, pairs, maps, or object literals may also help. See the full WDL spec here: https://github.com/openwdl/wdl/blob/master/versions/1.0/SPEC.md

    If you have any questions, please let us know.

    Kind regards,

    Jason

