Struct Definition and sample_set entity

Hi, I am trying to make a workflow that uses entity `sample_set`, scatters over samples to filter and replace the IDs of their VCFs, and then merges said filtered VCFs.

I'm getting parsing errors for my WDL on FireCloud:

Error: Invalid WDL: ERROR: Finished parsing without consuming all tokens.
struct SampleInfo {
^

Here's the struct and workflow definitions from the WDL:

    struct SampleInfo {
      File vcf
      File vcf_ind
      String? sample
      String? participant
    }

    workflow FilterMergeVCF {
      Float? genotype_probability_override
      Float genotype_probability = select_first([genotype_probability_override, 0.7])
      Array[SampleInfo] inputSamples

      scatter (S in inputSamples) {
        call FilterData {
          input:
            querySample = S.sample,
            queryParticipant = S.participant,
            queryVCF = S.vcf,
            queryVCFind = S.vcf_ind,
            filterGP = genotype_probability
        }
      }

      call MergeData {
        input:
          filterGP = genotype_probability,
          inputSetVCF = FilterData.filtVCF,
          inputSetVCFind = FilterData.filtVCFind
      }
    }

(FYI `FilterData` uses bcftools to filter individual VCFs and replace the sample ID from sample to participant, and `MergeData` uses bcftools to merge the Array[File] output of VCFs from FilterData.)

The WDL parser seems to have a problem with the struct definition, but I think I set it up just as the WDL spec recommends. Is there an obvious error?

Once I get past this parse error, I am also wondering how the inputSamples variable will be handled for the sample_set entity on Terra. Will it let me map data model attributes to the SampleInfo struct fields vcf, vcf_ind, sample, and participant?

Thank you,

Kathleen

Comments

6 comments

  • Comment author
    Kathleen Morrill
    • Edited

    Update: this appears to have been a WDL versioning issue. Declaring version 1.0 at the top of the WDL allowed struct definitions, but other syntax had to change as well, such as adding input {} wrappers to the workflow and task definitions (which is better-looking syntax anyway!).
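
    For readers hitting the same parse error, here is a minimal sketch of how the original workflow might look after the version 1.0 change. The struct and workflow body come from the question. The two task bodies, the output file names, and the workflow output `mergedVCF` are placeholders invented for illustration (the real tasks run bcftools), and the runtime sections are left out because the Docker image is not named in this thread.

    ```wdl
    version 1.0

    struct SampleInfo {
      File vcf
      File vcf_ind
      String? sample
      String? participant
    }

    workflow FilterMergeVCF {
      input {
        Float? genotype_probability_override
        Array[SampleInfo] inputSamples
      }

      # Intermediate value, declared outside input {} so it is not itself an input.
      Float genotype_probability = select_first([genotype_probability_override, 0.7])

      scatter (S in inputSamples) {
        call FilterData {
          input:
            querySample = S.sample,
            queryParticipant = S.participant,
            queryVCF = S.vcf,
            queryVCFind = S.vcf_ind,
            filterGP = genotype_probability
        }
      }

      # Outside the scatter, FilterData.filtVCF is an Array[File].
      call MergeData {
        input:
          filterGP = genotype_probability,
          inputSetVCF = FilterData.filtVCF,
          inputSetVCFind = FilterData.filtVCFind
      }

      output {
        File mergedVCF = MergeData.mergedVCF  # hypothetical output name
      }
    }

    # Placeholder task: copies its inputs through instead of running bcftools.
    task FilterData {
      input {
        String? querySample
        String? queryParticipant
        File queryVCF
        File queryVCFind
        Float filterGP
      }
      command <<<
        cp ~{queryVCF} filtered.vcf.gz
        cp ~{queryVCFind} filtered.vcf.gz.tbi
      >>>
      output {
        File filtVCF = "filtered.vcf.gz"         # placeholder file name
        File filtVCFind = "filtered.vcf.gz.tbi"  # placeholder file name
      }
    }

    # Placeholder task: concatenates its inputs instead of running bcftools merge.
    task MergeData {
      input {
        Float filterGP
        Array[File] inputSetVCF
        Array[File] inputSetVCFind
      }
      command <<<
        cat ~{sep=" " inputSetVCF} > merged.vcf.gz
      >>>
      output {
        File mergedVCF = "merged.vcf.gz"  # placeholder file name
      }
    }
    ```

    The main differences from the original are the `version 1.0` line, the `input {}` blocks, and the `~{...}` placeholders inside `command <<< >>>`.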

    As far as the input attributes go, this is what I see under Workflows -- unsure what "WomCom..." is but I'll play around with it / upload a JSON instead of using the menu here. If my entity is sample_set, then how do I reference which attributes of the samples to use?

    EDIT: It's "WomCompositeType". I tried giving it this input (a spray.json.JsString):

    "{vcf: this.vcf, vcf_ind: this.vcf_ind, sample: this.sample_id, participant: this.participant}"

    From the error, it seems to need an input of WomCompositeType, though...

    {
      vcf -> File
      vcf_ind -> File
      sample -> String
      participant -> String
    }

    How does that work? How do I define this type of input from sample attributes for a sample_set?

  • Comment author
    Kathleen Morrill

    I ended up splitting this into two workflows: a filter-VCF workflow for individual samples and a merge-VCFs workflow for sample sets, which seems to be the better approach!

    I'm still curious how to make Arrays of structs from sample sets in the future, though, mainly to be able to scatter over a set of structs.

  • Comment author
    Anika Das

    Hi Kathleen, 

    Can you try quoting the keys in the JSON object? Instead of:

        {
          vcf: this.vcf,
          vcf_ind: this.vcf_ind,
          sample: this.sample_id,
          participant: this.participant
        }

    try:

        {
          "vcf": this.vcf,
          "vcf_ind": this.vcf_ind,
          "sample": this.sample_id,
          "participant": this.participant
        }

    (Newlines added for legibility here, but I don't know if the Terra UI allows them.)

    Let us know if you have any other questions!
     
    Best, 
    Anika
     
  • Comment author
    Kathleen Morrill

    I'll try that out later!

    I've run into a different problem with the merge workflow: I'm trying to get bcftools to accept my Array[File] as input to merge.

    Done this way, write_lines makes a file that lists the original bucket paths, not the localized files, so it fails:

        task MergeData {
          input {
            Array[File] inputSetVCF
            Array[File] inputSetVCFind
            File inputSetVCFlist = write_lines(inputSetVCF)
          }

          command {
            ../bin/bcftools-1.9/bcftools merge -l ${inputSetVCFlist} --force-samples --threads ${num_threads} -Oz -o 'DogAgingProject_${setID}_gp-${filterGP}.vcf.gz'
          }

    Done within the command, it cannot coerce Array[File] to Array[String] for write_lines() to do its magic:

        task MergeData {
          input {
            Array[File] inputSetVCF
            Array[File] inputSetVCFind
          }

          command {
            ../bin/bcftools-1.9/bcftools merge -l write_lines(${inputSetVCF}) --force-samples --threads ${num_threads} -Oz -o 'DogAgingProject_${setID}_gp-${filterGP}.vcf.gz'
          }

    Feeding bcftools the array directly (no write_lines or sep) and hoping for the best also fails, with "Array value was given but no 'sep' attribute was provided".

  • Comment author
    Kathleen Morrill

    The second approach, feeding bcftools write_lines(${inputSetVCF}), does indeed make a .tmp file containing a localized input per line. But the workflow errors out there, throwing "Array value was given but no 'sep' attribute was provided" several times.
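
    One possible reading of that error is that `${inputSetVCF}` on its own is an Array placeholder with no `sep`, while the surrounding `write_lines(` and `)` are passed to the shell as literal text. A minimal sketch of the alternative, assuming Cromwell evaluates the whole expression inside the placeholder after the inputs are localized: put the write_lines call itself inside `~{...}`. The `setID`, `filterGP`, and `num_threads` declarations are assumptions (the thread uses them but does not show how they are declared), the output name `mergedVCF` is hypothetical, and the runtime section is left out because the Docker image is not named here.

    ```wdl
    version 1.0

    task MergeData {
      input {
        Array[File] inputSetVCF
        Array[File] inputSetVCFind  # declared so the index files are localized too
        String setID                # assumed type; declaration not shown in the thread
        Float filterGP
        Int num_threads = 1         # assumed default; declaration not shown in the thread
      }

      command <<<
        # write_lines runs inside the placeholder, so the list file is written
        # at command time and -l receives the path to that file.
        ../bin/bcftools-1.9/bcftools merge \
          -l ~{write_lines(inputSetVCF)} \
          --force-samples \
          --threads ~{num_threads} \
          -Oz -o 'DogAgingProject_~{setID}_gp-~{filterGP}.vcf.gz'
      >>>

      output {
        File mergedVCF = "DogAgingProject_~{setID}_gp-~{filterGP}.vcf.gz"  # hypothetical output name
      }
    }
    ```

    The same pattern should also work with the `${...}` placeholder style in a `command { }` block, as long as the function call sits inside the placeholder rather than around it.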

  • Comment author
    Anika Das

    Hi Kathleen, 

    You mentioned in your question that write_lines() expects Array[String] as input, but you are providing Array[File]. What you can do is add a line at the top of your command that creates a file from the Array[File] you provided, then use that file as input for your other command:

        command {
          echo "~{sep="\n" inputSetVCF}" > list.txt
          ../bin/bcftools-1.9/bcftools merge -l list.txt --force-samples ...

    Please let us know if you have further questions!
     
    Best, 
    Anika
