Only copy outputs to data model for conditional call
I've written a workflow (entity: sample) that downloads data from our partner sequencing platform. This workflow has a conditional call, namely...
Boolean download = (sample_status == "succeeded" && sample_availability != "archived" && data_ingested != "TRUE")
if (download) {
  call Download { etc }
}
...where the Task "Download" has optional outputs like File vcf and File bam.
If the data has been archived by the platform or already downloaded, then the conditional is meant to prevent the call, as the data buckets have already been linked for that sample in the data model. The goal is to prevent accidentally overwriting the data model when an already-downloaded sample is inadvertently submitted to this workflow. However, the undesired behavior still occurs: even when the task is not called, its outputs are copied to the data model, overwriting the bucket link with a blank value. What would be a good workaround or solution?
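To make the setup concrete, here is a minimal sketch of the pattern described above. The workflow name, the Docker image, and the body of the Download command are placeholders and not the real workflow. It illustrates that outputs of a call inside an if block are optional (File?) at the workflow level, so they evaluate to null whenever download is false.

```wdl
version 1.0

workflow IngestSketch {
  input {
    String sample_status
    String sample_availability
    String data_ingested
  }

  Boolean download = (sample_status == "succeeded" && sample_availability != "archived" && data_ingested != "TRUE")

  if (download) {
    call Download
  }

  output {
    # Outside the if block these are File?, not File.
    # When download is false, both evaluate to null.
    File? vcf = Download.vcf
    File? bam = Download.bam
  }
}

task Download {
  command <<<
    # Placeholder for the real download step.
    echo "vcf" > sample.vcf
    echo "bam" > sample.bam
  >>>
  runtime {
    docker: "ubuntu:20.04"  # assumed image, for illustration only
  }
  output {
    File vcf = "sample.vcf"
    File bam = "sample.bam"
  }
}
```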
Thank you!
Comments
Hi Kathleen Morrill,
Thanks for reaching out. Can you share the workspace where you are seeing this issue with GROUP_FireCloud-Support@firecloud.org by clicking the Share button in your workspace? The Share option is in the three-dots menu at the top-right.
Let us know the workspace name, as well as the relevant submission and workflow IDs. We’ll be happy to take a closer look as soon as we can!
Best,
Samantha
Thanks Samantha, I have added the support email to the workspace.
Here is the workspace name: Dog Aging Project - Sequencing Data
The following submission has examples of samples for which the conditional is false and I do not want outputs copied to the data model:
a255de8c-14c6-4bb9-8c81-bc62fd4b679e
The workflows are as follows:
DAP_SequencingData_Status checks the website for sample assignments and the sequencing platform, and updates the data model.
DAP_SequencingData_Ingestion checks whether data…
1. is ready to download (sample_status == "succeeded")
2. is available to download (sample_availability != "archived")
3. is not already downloaded (data_ingested != "TRUE")
...if so, then return Boolean download = true
If download is true, then the download task occurs with outputs to the data model.
I suppose I could make a task that takes in the data model attributes and re-outputs them if (!download). If they are bucket links (File), then should they also be input as File, or as String? I don't actually want the data localized or copied.
Hi Kathleen Morrill,
Sorry for the delayed response. I brought this to our engineers to confirm the behavior you were seeing. Unfortunately, there's no way in the data model to prevent overwriting if the result of the workflow output is NULL. I'd be happy to create a feature request for this for the team to consider.
For now, you could try your proposed workaround of creating another task that inputs the current data model attributes and re-outputs them if(!download). They should be input as String.
Please let me know if you have any questions.
Best,
Samantha
Thank you!
Another suggested feature request, which might help with the same goal in mind, is the ability to sort the data model by multiple columns when selecting samples for a sample set. That way, we can easily select samples fulfilling multiple conditions for workflow submission (e.g. new sample, successful sequencing, not yet run).
Also, this is probably a less ideal solution than the input->output way I mentioned, but is there a way to force a workflow failure after a boolean condition? I noticed that failed workflows do not output anything to the data model.
So, I have tried the solution of requesting the current data model attributes as outputs for a task that runs only if !download. However, I've found that a successful workflow attempts to update the data model with the outputs from both tasks (if(download) and if(!download)), which map to the same attributes, even when only one task runs. The outputs of the task that did not run are set to NULL and overwrite the outputs from the other.
I have an example of this behavior for the following run:
Sample 31020061513478
Submission 03f0ac71-8e93-400c-9b15-73f1bf6a0590
Workflow 71c915c8-93eb-4441-a574-5bc28a577214
vcf and bam got set from GencoveAPI_Download outputs but fastqr1 and fastqr2 got set from ReturnModel outputs (null), even though fastqr1 and fastqr2 had valid outputs from GencoveAPI_Download.
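One possible way around this collision, not tested anywhere in this thread, is to merge the two sources into a single workflow-level output so that each data model attribute is fed by exactly one value. The sketch below assumes the existing attribute is passed in as a String (so nothing is localized) and uses select_first to prefer the freshly downloaded path. The names MergeSketch, current_vcf, and new_vcf, the Docker image, and the stub Download task are all hypothetical.

```wdl
version 1.0

workflow MergeSketch {
  input {
    Boolean download
    # Existing data model value; String so the file is not localized or copied.
    String? current_vcf
  }

  if (download) {
    call Download
  }

  # Coerce the optional File to an optional String so both candidates share a type.
  String? new_vcf = Download.vcf

  output {
    # Picks the first non-null value. This errors if both are null,
    # so it assumes current_vcf is set whenever download is false.
    String vcf = select_first([new_vcf, current_vcf])
  }
}

task Download {
  command <<<
    # Placeholder for the real download step.
    echo "vcf" > sample.vcf
  >>>
  runtime {
    docker: "ubuntu:20.04"  # assumed image, for illustration only
  }
  output {
    File vcf = "sample.vcf"
  }
}
```

With this shape, only the single vcf output would be mapped to the attribute in the output configuration, rather than one output per task.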
Found a solution! Rather than inputting existing attributes and outputting those, I set the Task for if(!download) to run the command `exit 1`, forcing a fail state for the workflow. Failed workflows don't output to the data model, so that's just what I needed.
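For anyone wanting to reproduce this, a minimal sketch of the forced-failure approach might look like the following. The task name SkipSample, the error message, the Docker image, and the stub Download task are placeholders and not the actual workflow.

```wdl
version 1.0

workflow FailSketch {
  input {
    Boolean download
  }

  if (download) {
    call Download
  }

  if (!download) {
    # Fails the workflow on purpose so no outputs are written to the data model.
    call SkipSample
  }

  output {
    File? vcf = Download.vcf
  }
}

task SkipSample {
  command <<<
    echo "Sample is not eligible for download; failing on purpose." >&2
    exit 1
  >>>
  runtime {
    docker: "ubuntu:20.04"  # assumed image, for illustration only
  }
}

task Download {
  command <<<
    # Placeholder for the real download step.
    echo "vcf" > sample.vcf
  >>>
  runtime {
    docker: "ubuntu:20.04"  # assumed image, for illustration only
  }
  output {
    File vcf = "sample.vcf"
  }
}
```

The trade-off is that skipped samples show up as failed workflows in the submission, so a clear message on stderr helps distinguish intentional skips from real errors.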
Hi Kathleen Morrill,
Glad to hear you were able to get it to work, and thank you for sharing your solution here. I'll submit feature requests for this and the data table sorting functionality you mentioned.
If you need assistance with anything else, please don't hesitate to reach out!
Best,
Samantha