
Map Array[File] to Object[samplename, Array[File]] based on filename

Comments

7 comments

  • Jason Cerrato

    Hi Matthias,

    Thanks for writing in. We'll take a closer look at this request and get back to you as soon as we can!

    Kind regards,

    Jason

  • Tiffany Miller

    Hi Matthias, 

    There are two workflows for fastqtoUbam in this featured workspace that I think demonstrate a way of doing what you want in Terra.

    The Method repo also has examples of other people's workflows for BCL conversion, though mostly to FASTQ, if that is helpful.

    Thanks!

     

     

  • Matthias De Smet

    Hi Tiffany,

     

    Thanks for the examples, but those don't really answer my question.

    I'm starting from an array of files that contains the uBAMs for all lanes of a flowcell, for all samples.

    I'm trying to figure out how to construct an object, or an array of mapped pairs, that holds all the files for each sample together with a sample name derived from the filenames.

    I have a script that does just that, but I'm struggling to get to a valid output definition, which I can then use later in the workflow.

    Hope this clarifies my question.

     

    Thanks again

    Matthias

     

  • Beri Shifaw

    I don't think there is an easy way in WDL to reorganize the flattened array into an Object[String, Array[File]]. If there is, the best place to find it is the WDL specification.

    I'm not certain this would work, but you could have a task that sorts the file paths by sample and writes each sample's paths into its own text file (a file of file names, or FOFN). The task's output would look something like:

    file1.txt
    file2.txt
    file3.txt


    Each of those files contains the paths to the uBAM files for one sample. For example, file1.txt would contain:

    gs://sample1_L1.ubam
    gs://sample1_L2.ubam
    gs://sample1_L3.ubam

    Then have the step that merges the uBAMs take one of these text files and read its contents to get the list of uBAMs for that sample:

    call create_fofn {
        input:
            array_of_files = array_of_files
    }

    # Calls merge_ubam once for each FOFN in the array of FOFNs
    scatter (fofn in create_fofn.array_fofn) {
        call merge_ubam {
            input:
                array_of_files = read_lines(fofn)
        }
    }

    ################################

    task merge_ubam {
        input {
            Array[File] array_of_files
        }
        command { ... }
    }

    task create_fofn {
        input {
            Array[File] array_of_files
        }
        # Python script in command that organises the uBAM paths into one FOFN per sample
        command { ... }
        output {
            # Collects file1.txt, file2.txt, ... written by the command
            Array[File] array_fofn = glob("file*.txt")
        }
    }
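
The grouping script that create_fofn would run is not shown in the thread. A minimal Python sketch of what it might look like follows, assuming the lane files are named <sample>_L<lane>.ubam as in the example paths. The names group_by_sample and write_fofns are hypothetical, and naming each FOFN after its sample (rather than file1.txt, file2.txt, ...) is a choice made for this sketch, so the task output pattern would need to match it.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming convention: <sample>_L<lane>.ubam (e.g. sample1_L2.ubam)
LANE_PATTERN = re.compile(r"^(?P<sample>.+)_L\d+\.ubam$")


def group_by_sample(paths):
    """Group uBAM paths by the sample name parsed from each file name."""
    groups = defaultdict(list)
    for path in paths:
        name = path.rsplit("/", 1)[-1]  # drop any directory or bucket prefix
        match = LANE_PATTERN.match(name)
        if match is None:
            raise ValueError(f"unexpected file name: {name}")
        groups[match.group("sample")].append(path)
    return dict(groups)


def write_fofns(paths, out_dir="."):
    """Write one <sample>.txt per sample, with one uBAM path per line."""
    out_dir = Path(out_dir)
    written = []
    for sample, members in sorted(group_by_sample(paths).items()):
        fofn = out_dir / f"{sample}.txt"
        fofn.write_text("\n".join(members) + "\n")
        written.append(fofn)
    return written
```

With this naming, the sample name can be recovered later in the workflow from the FOFN's file name, for instance with WDL's basename(fofn, ".txt").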

     

  • Matthias De Smet

    That’s actually a great idea. I’ll have to try it. I will get back to you tomorrow. Thanks for the tip!
    Matthias

  • Matthias De Smet

    To get back to this, awesome idea, worked like a charm!

  • Giulio Genovese

    The appropriate way to group/aggregate elements of an array is to use the collect_by_key() function, although it is only available in the WDL development version. As an example, the following WDL:

    version development

    workflow main {
        input {
            Array[String] flattened_array = ["file1_L1.ubam", "file1_L2.ubam", "file2_L1.ubam", "file2_L2.ubam", "fileX_L1.ubam", "fileX_L2.ubam"]
        }

        scatter (file in flattened_array) {
            String sample_array = basename(basename(file, "_L1.ubam"), "_L2.ubam")
        }

        output {
            Array[Pair[String, Array[String]]] grouped_array = as_pairs(collect_by_key(zip(sample_array, flattened_array)))
        }
    }

    This will output the following array without invoking a task:

    "main.grouped_array": [{
        "right": ["file1_L1.ubam", "file1_L2.ubam"],
        "left": "file1"
    }, {
        "right": ["fileX_L1.ubam", "fileX_L2.ubam"],
        "left": "fileX"
    }, {
        "right": ["file2_L1.ubam", "file2_L2.ubam"],
        "left": "file2"
    }]

    It can easily be modified to group files as needed.
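
For readers unfamiliar with these functions, the grouping that zip, collect_by_key and as_pairs perform can be mirrored in plain Python. This is only an illustration of the logic, not WDL: the two helper functions below are stand-ins written for this sketch, and the regular expression that strips any lane suffix is an assumption about the file naming. The sketch returns groups in order of first appearance, whereas the output shown above lists them in a different order, so it is probably safest not to rely on group order.

```python
import re


def collect_by_key(pairs):
    """Group the second element of each pair under its first element."""
    grouped = {}
    for key, value in pairs:
        grouped.setdefault(key, []).append(value)
    return grouped


def as_pairs(mapping):
    """Turn a mapping into a list of left/right records."""
    return [{"left": key, "right": values} for key, values in mapping.items()]


flattened_array = [
    "file1_L1.ubam", "file1_L2.ubam",
    "file2_L1.ubam", "file2_L2.ubam",
    "fileX_L1.ubam", "fileX_L2.ubam",
]

# Assumed naming: strip a trailing _L<lane>.ubam to get the sample name
sample_array = [re.sub(r"_L\d+\.ubam$", "", name) for name in flattened_array]

grouped_array = as_pairs(collect_by_key(zip(sample_array, flattened_array)))
```

Changing only the expression that derives sample_array is enough to group on a different part of the file name.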

