I've been struggling to get a workflow to work in Terra. In attempts to narrow down the issue I've ended up with a workflow that looks completely arbitrary, but I promise I came across this issue in something that's actually productive.
The problematic task attempts to define finalDiskSize as a function of a File? input named test. finalDiskSize is used as a runtime attribute.
Locally, this works fine even if File? test does not exist, but on Terra it fails before task execution because Cromwell reports that test_size doesn't have a valid value. Using "if then else" logic in the disk size runtime attribute, instead of or in addition to "if then else" logic in the calculation of test_size, fails with the same error.
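For readers without the screenshots, the pattern described above looks roughly like the sketch below. The names filecheck, test, test_size, and finalDiskSize come from this post; the truth input, the Docker image, the +1 GB of headroom, and the command body are assumptions added only to make the example complete.

```wdl
version 1.0

task filecheck {
  input {
    File truth      # assumed required input
    File? test      # optional input that may not exist
  }

  # Fall back to 0 GB when the optional file is not supplied
  Float test_size = if defined(test) then size(test, "GB") else 0.0

  # Round up and add 1 GB of headroom (headroom is an assumption)
  Int finalDiskSize = ceil(size(truth, "GB") + test_size) + 1

  command <<<
    ls -l "~{truth}"
  >>>

  runtime {
    docker: "ubuntu:20.04"                          # assumed image
    disks: "local-disk " + finalDiskSize + " HDD"   # disk size used as a runtime attribute
  }
}
```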
Second Attempt Setup
Although I don't know the specifics, I'm aware that localization on a Google backend differs from local execution, so I tried a workaround in the workflow section that wraps the optional file in select_first() together with a fallback file.
Here, run_example_wf.wf_magicword is type File, run_example_wf.wf_nonexistent is type File?, and fallback.bogus is type File. The fallback file is a blank text file whose only purpose is to prevent the task from erroring out. If it is used as the second element in a select_first() array, it acts as a fallback input should the first element not exist.
In the context of the "checker" workflow, run_example_wf.wf_nonexistent (a workflow level output of run_example_wf) never exists; it is a File? output of run_example_wf that would match the pattern zzyzx.txt, which does not get written at any point of run_example_wf.
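The setup described above can be sketched as follows. This is an approximation rather than the exact code from the screenshot: the import paths, the body of the fallback task, and the choice of wf_magicword as the truth input are assumptions, while the output names and the select_first() array follow the description in this post.

```wdl
version 1.0

# Hypothetical import paths; adjust to wherever these files actually live
import "run_example_wf.wdl" as example
import "filecheck.wdl" as check

# Assumed implementation: writes the blank fallback file
task fallback {
  command <<<
    touch bogus.txt
  >>>
  output {
    File bogus = "bogus.txt"
  }
  runtime {
    docker: "ubuntu:20.04"   # assumed image
  }
}

workflow checker {
  call example.run_example_wf
  call fallback

  call check.filecheck {
    input:
      truth = run_example_wf.wf_magicword,
      # Intended behavior: use wf_nonexistent if defined, otherwise the blank file
      test = select_first([run_example_wf.wf_nonexistent, fallback.bogus])
  }
}
```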
Second Attempt Results
What confuses me is that my filecheck task (see first screenshot) errors out again, only on Terra, with this setup, even with the fallback file in the select_first() array. I assume my first attempt, which did not use select_first(), may fail as a side effect of how localization works, but in my second attempt it looks to me like select_first() itself is not functioning correctly. Cromwell appears to be attempting to localize a file that does not exist even though I am using select_first() to prevent exactly that.
Error attempting to localize file with command: 'mkdir -p '/cromwell_root/fc-4e8db524-9266-47eb-ad44-9b54fee6decd/3f07149a-33ba-4d82-9c36-89c3ec7aa699/checker/4e73aaf9-a1a7-4807-9e2e-c835c9da0953/call-run_example_wf/run_example_wf/385dde28-cf86-44cb-a91b-473fb1242e87/call-one_is_missing/' && rm -f /root/.config/gcloud/gce && gsutil -o 'GSUtil:parallel_thread_count=1' -o 'GSUtil:sliced_object_download_max_components=1' cp 'gs://fc-4e8db524-9266-47eb-ad44-9b54fee6decd/3f07149a-33ba-4d82-9c36-89c3ec7aa699/checker/4e73aaf9-a1a7-4807-9e2e-c835c9da0953/call-run_example_wf/run_example_wf/385dde28-cf86-44cb-a91b-473fb1242e87/call-one_is_missing/zzyzx.txt' '/cromwell_root/fc-4e8db524-9266-47eb-ad44-9b54fee6decd/3f07149a-33ba-4d82-9c36-89c3ec7aa699/checker/4e73aaf9-a1a7-4807-9e2e-c835c9da0953/call-run_example_wf/run_example_wf/385dde28-cf86-44cb-a91b-473fb1242e87/call-one_is_missing/'' CommandException: No URLs matched: gs://fc-4e8db524-9266-47eb-ad44-9b54fee6decd/3f07149a-33ba-4d82-9c36-89c3ec7aa699/checker/4e73aaf9-a1a7-4807-9e2e-c835c9da0953/call-run_example_wf/run_example_wf/385dde28-cf86-44cb-a91b-473fb1242e87/call-one_is_missing/zzyzx.txt grep: /cromwell_root/stderr: No such file or directory
So with that in mind, I stopped accounting for the test file entirely in my disk size calculation. Instead, I assume the test file is about the same size as the truth file and simply double the size of the truth file. I still take in the test file as an optional input because, if it does exist, I want to compare the truth file against it. Unfortunately, running this on Terra still attempts to localize the file that does not exist instead of falling back to the fallback file, and the task still fails.
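A minimal sketch of that revised disk size calculation, assuming the same filecheck task shape as before. The +1 GB of headroom, the Docker image, and the cmp-based comparison are illustrative assumptions, not the original command.

```wdl
version 1.0

task filecheck {
  input {
    File truth
    File? test   # still optional, but no longer used for sizing
  }

  # Assume test is about as large as truth: double truth, plus assumed headroom
  Int finalDiskSize = 2 * ceil(size(truth, "GB")) + 1

  command <<<
    # Only compare when a test file was actually provided
    if [ -n "~{test}" ]; then
      cmp "~{truth}" "~{test}" && echo "match" || echo "differ"
    fi
  >>>

  runtime {
    docker: "ubuntu:20.04"   # assumed image
    disks: "local-disk " + finalDiskSize + " HDD"
  }
}
```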
select_first() can work as expected... sometimes?
Here's where things get interesting... I have a second file, wf_never, that does not get created because the task that would create it never gets called. This workflow generates two workflow-level outputs: wf_always, a File named foo.txt, and wf_never, a File? named bizz.txt, which of course never exists.
Using select_first() in that instance passes, even on Terra.
So it seems that select_first() acts as expected (or at least my understanding of it, please correct me if I'm wrong) when the task that would create a File? is never called, but not if the task that would create a File? is called.
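The two situations being contrasted can be sketched side by side. In the first, the task runs but never writes the file named in its File? output; in the second, the call sits inside a conditional that is never entered, so its output is never evaluated. The task bodies, the never_called task name, and the run_never flag are hypothetical; only the output names and file names come from this post.

```wdl
version 1.0

task one_is_missing {
  command <<<
    echo "magic" > magicword.txt
    # zzyzx.txt is deliberately never written
  >>>
  output {
    File magicword = "magicword.txt"
    File? nonexistent = "zzyzx.txt"   # task is called, file is absent
  }
  runtime {
    docker: "ubuntu:20.04"   # assumed image
  }
}

task never_called {
  command <<<
    echo "bizz" > bizz.txt
  >>>
  output {
    File bizz = "bizz.txt"
  }
  runtime {
    docker: "ubuntu:20.04"   # assumed image
  }
}

workflow run_example_wf {
  input {
    Boolean run_never = false   # hypothetical flag; the call below is skipped
  }

  call one_is_missing

  if (run_never) {
    call never_called
  }

  output {
    File wf_magicword = one_is_missing.magicword
    File? wf_nonexistent = one_is_missing.nonexistent   # called, but file missing
    File? wf_never = never_called.bizz                  # never called at all
  }
}
```

This only lays out the structural difference between the two File? outputs; it does not by itself explain why they behave differently on Terra.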
Is there a way to estimate disk size at runtime when dealing with a File? input, and is there a reason why two different File? values show different behavior with select_first()?
Thanks in advance for your time and sorry for how long this post is. I'm hoping the explanations I gave will save some time in testing.