Struct Definition and sample_set entity
Hi, I am trying to make a workflow that uses the `sample_set` entity, scatters over its samples to filter their VCFs and replace the sample IDs, and then merges the filtered VCFs.
I'm getting parsing errors for my WDL on FireCloud:
```
Error: Invalid WDL: ERROR: Finished parsing without consuming all tokens.

struct SampleInfo {
^
```
Here are the struct and workflow definitions from the WDL:
```wdl
struct SampleInfo {
  File vcf
  File vcf_ind
  String? sample
  String? participant
}

workflow FilterMergeVCF {
  Float? genotype_probability_override
  Float genotype_probability = select_first([genotype_probability_override,0.7])
  Array[SampleInfo] inputSamples

  scatter (S in inputSamples) {
    call FilterData {
      input:
        querySample = S.sample,
        queryParticipant = S.participant,
        queryVCF = S.vcf,
        queryVCFind = S.vcf_ind,
        filterGP = genotype_probability
    }
  }

  call MergeData {
    input:
      filterGP = genotype_probability,
      inputSetVCF=FilterData.filtVCF,
      inputSetVCFind=FilterData.filtVCFind
  }
}
```
(FYI: `FilterData` uses bcftools to filter individual VCFs and replace the sample ID with the participant ID, and `MergeData` uses bcftools to merge the `Array[File]` of VCFs output by `FilterData`.)
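For reference, a `FilterData` task along those lines might look like the following in WDL 1.0. This is a hypothetical sketch, not the task from the post: the GP filter expression, the required (non-optional) `String` IDs, and the `docker` input are assumptions. It filters with `bcftools view -i`, renames the sample with `bcftools reheader -s` using an "old new" name pair, and indexes the result.

```wdl
version 1.0

# Hypothetical FilterData sketch; the filter expression and the `docker`
# input are assumptions, not the original task.
task FilterData {
  input {
    File queryVCF            # bgzipped single-sample VCF
    File queryVCFind         # its index (localized, not read directly here)
    String querySample       # sample name currently in the VCF header
    String queryParticipant  # name to write into the header instead
    Float filterGP
    String docker            # hypothetical: any image that provides bcftools
  }

  command <<<
    set -euo pipefail
    # Keep sites whose highest genotype probability meets the threshold
    # (assumes the VCF carries a FORMAT/GP field).
    bcftools view -i 'MAX(FMT/GP)>=~{filterGP}' -O z -o filtered.vcf.gz ~{queryVCF}
    # reheader -s accepts "old_name new_name" pairs, one per line.
    echo "~{querySample} ~{queryParticipant}" > rename.txt
    bcftools reheader -s rename.txt -o renamed.vcf.gz filtered.vcf.gz
    # Tabix-style index so a downstream bcftools merge can read the file.
    bcftools index -t renamed.vcf.gz
  >>>

  output {
    File filtVCF = "renamed.vcf.gz"
    File filtVCFind = "renamed.vcf.gz.tbi"
  }

  runtime {
    docker: docker
  }
}
```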
The WDL parser seems to have a problem with the struct definition, but I think I set it up just as the WDL spec recommends. Is there an obvious error?
Once I get past this parse error, I'm also wondering how Terra will handle the `inputSamples` variable for the `sample_set` entity. Will it let me map data-model attributes onto the `SampleInfo` struct fields (`vcf`, `vcf_ind`, `sample`, and `participant`)?
Thank you,
Kathleen
OK, update: this appears to have just been a WDL versioning issue. Structs were introduced in WDL 1.0, and a file with no `version` statement is parsed as draft-2, so declaring `version 1.0` at the top of the WDL allowed the struct definition. Other syntax had to change too, such as the `input {}` wrappers for workflow and task definitions (which is better-looking syntax anyway!).
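With `version 1.0` declared, the struct and workflow from the question might be rewritten roughly as below. This is a sketch: the two tasks are minimal placeholders (they only copy or write files) so the file validates on its own, and they declare `querySample` and `queryParticipant` as `String?` to match the optional struct fields.

```wdl
version 1.0

# Structs need WDL 1.0; without a version line the file is parsed as draft-2.
struct SampleInfo {
  File vcf
  File vcf_ind
  String? sample
  String? participant
}

workflow FilterMergeVCF {
  input {
    Float? genotype_probability_override
    Array[SampleInfo] inputSamples
  }

  # Derived value kept outside input {} so it is not exposed as an input.
  Float genotype_probability = select_first([genotype_probability_override, 0.7])

  scatter (S in inputSamples) {
    call FilterData {
      input:
        querySample = S.sample,
        queryParticipant = S.participant,
        queryVCF = S.vcf,
        queryVCFind = S.vcf_ind,
        filterGP = genotype_probability
    }
  }

  call MergeData {
    input:
      filterGP = genotype_probability,
      inputSetVCF = FilterData.filtVCF,
      inputSetVCFind = FilterData.filtVCFind
  }
}

# Placeholder tasks (hypothetical bodies); the real ones run bcftools.
task FilterData {
  input {
    String? querySample
    String? queryParticipant
    File queryVCF
    File queryVCFind
    Float filterGP
  }
  command <<<
    cp ~{queryVCF} filtered.vcf.gz
    cp ~{queryVCFind} filtered.vcf.gz.tbi
  >>>
  output {
    File filtVCF = "filtered.vcf.gz"
    File filtVCFind = "filtered.vcf.gz.tbi"
  }
}

task MergeData {
  input {
    Float filterGP
    Array[File] inputSetVCF
    Array[File] inputSetVCFind
  }
  command <<<
    echo "merge step goes here" > merged.txt
  >>>
  output {
    File merged = "merged.txt"
  }
}
```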

As far as the input attributes go, this is what I see under Workflows. I'm unsure what "WomCom..." is, but I'll play around with it or upload a JSON instead of using the menu here. If my entity is `sample_set`, how do I reference which attributes of the samples to use?
EDIT: It's "WomCompositeType". I tried giving it this input (a spray.json.JsString):
From the error, it seems to need an input of WomCompositeType, though...
How does that work? How do I define this type of input from sample attributes for a sample_set?
I ended up splitting this into two workflows, a filter-VCF workflow for individual samples and a merge-VCFs workflow for sample sets, which seems to be the better approach!
But I'm still curious about how to make arrays of structs from sample sets in the future, mainly so that I can scatter over a set of structs.
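One possible workaround, sketched under assumptions rather than confirmed Terra behavior: skip the struct and take parallel arrays as workflow inputs, which Terra can fill for a `sample_set` with expressions such as `this.samples.vcf`. The attribute names, the `EchoSample` task, and the assumption that Terra returns every `this.samples.*` expression in the same member order are all hypothetical. The scatter then runs over indices instead of structs.

```wdl
version 1.0

workflow FilterSampleSet {
  input {
    # On Terra with root entity sample_set, these could be filled with
    # expressions like this.samples.vcf (attribute names are assumptions).
    Array[File] vcfs
    Array[File] vcf_inds
    Array[String] sample_ids
    Array[String] participant_ids
  }

  # Element i of every array must describe the same sample, so the arrays
  # need equal length and a consistent order.
  scatter (i in range(length(vcfs))) {
    call EchoSample {
      input:
        vcf = vcfs[i],
        vcf_ind = vcf_inds[i],
        sample = sample_ids[i],
        participant = participant_ids[i]
    }
  }

  output {
    Array[String] summaries = EchoSample.summary
  }
}

# Hypothetical stand-in for FilterData; it only reports what it received.
task EchoSample {
  input {
    File vcf
    File vcf_ind
    String sample
    String participant
  }
  command <<<
    echo "~{sample} -> ~{participant}: $(basename ~{vcf})"
  >>>
  output {
    String summary = read_string(stdout())
  }
}
```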
Hi Kathleen,
I'll try that out later!
I've run into a different problem with the merge workflow: I'm trying to get bcftools to accept my `Array[File]` as input to merge.
Done this way, `write_lines` makes a file that lists the original bucket paths, not the localized files, so it fails:
Done within the command, it cannot coerce `Array[File]` to `Array[String]` for `write_lines()` to do its magic:
Feeding bcftools the array directly (no `write_lines` or `sep`) and hoping for the best also fails, with "Array value was given but no 'sep' attribute was provided".
The second approach, feeding bcftools `write_lines(${inputSetVCF})`, does make a .tmp file containing one localized input per line, but the workflow then errors out, throwing "Array value was given but no 'sep' attribute was provided" several times.
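The "no 'sep' attribute" error comes from interpolating an `Array[File]` directly into the command. A minimal sketch of one way around it, assuming WDL 1.0 and a bcftools image supplied through a hypothetical `docker` input, is to let the `sep` placeholder option join the localized paths into separate arguments:

```wdl
version 1.0

# Hypothetical MergeData sketch: sep=' ' joins the localized VCF paths with
# spaces, so bcftools receives them as separate positional arguments.
task MergeData {
  input {
    Array[File] inputSetVCF
    # Localized so bcftools can find each index; assumes each index lands in
    # the same directory as its VCF.
    Array[File] inputSetVCFind
    String docker  # hypothetical: any image that provides bcftools
  }

  command <<<
    set -euo pipefail
    bcftools merge -O z -o merged.vcf.gz ~{sep=' ' inputSetVCF}
    bcftools index -t merged.vcf.gz
  >>>

  output {
    File mergedVCF = "merged.vcf.gz"
    File mergedVCFind = "merged.vcf.gz.tbi"
  }

  runtime {
    docker: docker
  }
}
```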
Hi Kathleen,
`write_lines()` is expecting `Array[String]` as input, but you are providing `Array[File]`. What you can do is add a line at the top of your command that creates a file from the `Array[File]` you provided, and use that file as input for your other command.