WDL function read_string() not properly delocalizing paths as files
Hi,
I’m having an issue with my workflow related to delocalization by Cromwell/Terra.
I have a file named metadata.txt which simply contains a 1-line pathname to the actual file I want to delocalize, e.g.
/cromwell_root/fc-secure-ec2ce7e8-339a-47b4-b9d9-34f652cbf41f/469a9ccb-43c5-46e5-8f35-a8827a345735/dropseq_cumulus/d7cc5552-e1e6-4f51-93aa-fcf1c8b510ea/call-cumulus/cumulus.cumulus/90d9eb9e-4ffc-4e4f-b9fa-0198e2629158/call-scp_output/attempt-2/glob-1c77504a8a1d1e9b00f3be9956ddf1c3/sco.scp.metadata.txt
When I do
File cumulus_metadata = read_string("metadata.txt")
Terra claims to delocalize sco.scp.metadata.txt to:
gs://fc-secure-ec2ce7e8-339a-47b4-b9d9-34f652cbf41f/469a9ccb-43c5-46e5-8f35-a8827a345735/dropseq_cumulus/d7cc5552-e1e6-4f51-93aa-fcf1c8b510ea/call-scp_outputs/call-cumulus/cumulus.cumulus/90d9eb9e-4ffc-4e4f-b9fa-0198e2629158/call-scp_output/attempt-2/glob-1c77504a8a1d1e9b00f3be9956ddf1c3/sco.scp.metadata.txt
but the path onwards from call-scp_outputs/ and the file do not exist! Instead it somehow delocalizes the undesired metadata.txt to
gs://fc-secure-ec2ce7e8-339a-47b4-b9d9-34f652cbf41f/469a9ccb-43c5-46e5-8f35-a8827a345735/dropseq_cumulus/d7cc5552-e1e6-4f51-93aa-fcf1c8b510ea/call-scp_outputs/
Why is it doing this? This isn't the intended behavior of read_string or Cromwell's delocalization, is it?
(I know the entire premise is a little weird, but I’m essentially doing all this to serialize an Array[File] from another person's workflow into separate File variables)
I'll share the Alexandria workspace with GROUP_FireCloud-Support@firecloud.org
The job ID is 469a9ccb-43c5-46e5-8f35-a8827a345735
Thanks
James
Comments
Hi James,
Thank you for your inquiry. We'll take a look at this as soon as we can and get back to you.
Kind regards
Jason
Hi James,
Is this metadata.txt file something that's generated in the workflow, or is it a file you already have? Can you point me to where this file is generated/already exists?
Can you also clarify which version of dropseq_cumulus you ran for job submission 469a9ccb-43c5-46e5-8f35-a8827a345735?
Many thanks,
Jason
Hey Jason,
I ran it in alexandria/dropseq_cumulus/1. sco.scp.metadata.txt and the other files are generated by the Cumulus workflow, but they are wrapped in the Array[File] output_scp_files. In dropseq_cumulus's scp_outputs task, I serialize this Array[File] into separate File variables.
Specifically, I do this by identifying each File and writing the path to that file in another file (e.g. "/cromwell_root/path/to/sco.scp.metadata.txt" is written in file ./metadata.txt). If you read my original post now, it might make more sense.
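The pattern described above can be sketched in WDL roughly as follows. This is a hypothetical reconstruction, not the actual dropseq_cumulus source: the filename match, the Docker image, and the overall task layout are assumptions. Only scp_outputs, output_scp_files, metadata.txt, and the read_string line come from this thread.

```wdl
version 1.0

# Hypothetical sketch of the failing pattern; not the real dropseq_cumulus task.
task scp_outputs {
  input {
    Array[File] output_scp_files
  }

  command <<<
    set -euo pipefail
    # Find the localized metadata file and write its absolute path
    # (a /cromwell_root/... path on Terra) into ./metadata.txt.
    for f in ~{sep=" " output_scp_files}; do
      case "$(basename "$f")" in
        *.scp.metadata.txt) echo "$f" > metadata.txt ;;
      esac
    done
  >>>

  output {
    # read_string returns the path as a String, which is coerced to File.
    File cumulus_metadata = read_string("metadata.txt")
  }

  runtime {
    docker: "ubuntu:20.04"  # assumed image, for illustration only
  }
}
```

With this shape, the only file the output section names literally is metadata.txt. That may be why metadata.txt, rather than the file it points to, is what ends up in the bucket, though that explanation is a guess and not something confirmed in this thread.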
But overall, this isn't so much a problem anymore as I've been using another solution in alexandria_dev/dropseq_cumulus/11:
First I move the file to the present working directory and then it delocalizes as such:
If anything I just want to call attention to the people working on Terra/Cromwell that this function and the subsequent delocalization does not behave as expected.
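The workaround mentioned above (moving the file into the working directory before it is declared as an output) might look like the sketch below. The actual code in alexandria_dev/dropseq_cumulus/11 is not reproduced in this thread, so the task layout, the filename match, the fixed output name, and the Docker image are all assumptions.

```wdl
version 1.0

# Hypothetical sketch of the workaround; not the real dropseq_cumulus task.
task scp_outputs {
  input {
    Array[File] output_scp_files
  }

  command <<<
    set -euo pipefail
    # Move the desired file into the task's working directory under a
    # fixed name, instead of recording its path in a side file.
    for f in ~{sep=" " output_scp_files}; do
      case "$(basename "$f")" in
        *.scp.metadata.txt) mv "$f" ./sco.scp.metadata.txt ;;
      esac
    done
  >>>

  output {
    # The output is a literal relative path, so no read_string is needed.
    File cumulus_metadata = "sco.scp.metadata.txt"
  }

  runtime {
    docker: "ubuntu:20.04"  # assumed image, for illustration only
  }
}
```

Here the output expression names the file directly, so the file to delocalize does not depend on anything read back at runtime. Using cp instead of mv should work equally well if the input needs to stay in place.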
Best,
James
Hi James,
Thank you for letting us know that you have found a solution. I will pass the information from your original post on to our workflow engineers for investigation.
Kind regards,
Jason
Hi James,
I'm still having some difficulty finding this exact line in your job submission ID 469a9ccb-43c5-46e5-8f35-a8827a345735, where it claims to delocalize to
gs://fc-secure-ec2ce7e8-339a-47b4-b9d9-34f652cbf41f/469a9ccb-43c5-46e5-8f35-a8827a345735/dropseq_cumulus/d7cc5552-e1e6-4f51-93aa-fcf1c8b510ea/call-scp_outputs/call-cumulus/cumulus.cumulus/90d9eb9e-4ffc-4e4f-b9fa-0198e2629158/call-scp_output/attempt-2/glob-1c77504a8a1d1e9b00f3be9956ddf1c3/sco.scp.metadata.txt
Can you point me to where I can find it? Looking at the scp_outputs task log, I've only found these.
So far, I've only found the file sco.scp.metadata.txt in the Localization section of the log.
Many thanks,
Jason
Hi Jason,
So for delocalization in the log, the wrong files are being delocalized. For example, read_string("metadata.txt") should have looked inside metadata.txt for the path to the desired file, e.g. "/cromwell_root/path/to/sco.scp.metadata.txt", and returned that path to become the WDL File output. Instead it delocalized the undesired files, "metadata.txt", "expr.txt", and "X_fitsne_coords.txt", which each only contain the path to the desired file.
In the Terra job's Outputs tab, however, Terra claims that the desired files were delocalized instead, but the gs:// links to these files are dead and lead down nonexistent paths. Terra doesn't mention that "metadata.txt", "expr.txt", and "X_fitsne_coords.txt" were all delocalized instead of "sco.scp.metadata.txt", "sco.scp.expr.txt", and "sco.scp.X_fitsne_coords.txt", so that conflicts with the log file. I'm just as confused as you are on this one.
Sorry for the confusion, I know this is pretty wacky and convoluted.
Best,
James
Hi James,
Would you be willing to share dropseq_cumulus with jcerrato@broadinstitute.org so I can take a closer look at versions 1 and 11?
Kind regards,
Jason
Hi Jason,
The methods should be publicly viewable, but I've shared both tools with you just in case. Hope that helps!
Oh and the dev snapshot that fails is 9, not 11!
James
Hi James,
Many thanks—I was getting a message that the snapshot had been removed or that I didn't have access. I have access now after being added.
Kind regards,
Jason
I accidentally removed dev snapshot 10, sorry! If you want to look at the dev version, make sure you are looking at 9; I believe that is the same as alexandria/dropseq_cumulus/1. I think just looking at alexandria/dropseq_cumulus/1 would be best anyways.
Hi James,
Thank you! Our internal team will take a look and see if there is anything unexpected going on. Thank you for your report. If there's anything else we can help with, please let us know!
Kind regards,
Jason