WDL function read_string() not properly delocalizing paths as files

Post author
James Gatter
Hi,
 
I’m having an issue with my workflow related to delocalization by Cromwell/Terra.
 
I have a file named metadata.txt that contains a single line: the path to the file I actually want to delocalize, e.g.
/cromwell_root/fc-secure-ec2ce7e8-339a-47b4-b9d9-34f652cbf41f/469a9ccb-43c5-46e5-8f35-a8827a345735/dropseq_cumulus/d7cc5552-e1e6-4f51-93aa-fcf1c8b510ea/call-cumulus/cumulus.cumulus/90d9eb9e-4ffc-4e4f-b9fa-0198e2629158/call-scp_output/attempt-2/glob-1c77504a8a1d1e9b00f3be9956ddf1c3/sco.scp.metadata.txt
When I do
File cumulus_metadata = read_string("metadata.txt")

Terra claims to delocalize sco.scp.metadata.txt to:

gs://fc-secure-ec2ce7e8-339a-47b4-b9d9-34f652cbf41f/469a9ccb-43c5-46e5-8f35-a8827a345735/dropseq_cumulus/d7cc5552-e1e6-4f51-93aa-fcf1c8b510ea/call-scp_outputs/call-cumulus/cumulus.cumulus/90d9eb9e-4ffc-4e4f-b9fa-0198e2629158/call-scp_output/attempt-2/glob-1c77504a8a1d1e9b00f3be9956ddf1c3/sco.scp.metadata.txt

but the path onwards from call-scp_outputs/ and the file do not exist! Instead it somehow delocalizes the undesired metadata.txt to

gs://fc-secure-ec2ce7e8-339a-47b4-b9d9-34f652cbf41f/469a9ccb-43c5-46e5-8f35-a8827a345735/dropseq_cumulus/d7cc5552-e1e6-4f51-93aa-fcf1c8b510ea/call-scp_outputs/
Why is it doing this? This isn't the intended behavior of read_string or Cromwell's delocalization, is it?
 

(I know the entire premise is a little weird, but I’m essentially doing all this to serialize an Array[File] from another person's workflow into separate File variables)
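
For readers following along, here is a minimal sketch of the pattern described above. It is not the actual dropseq_cumulus code. The task name scp_outputs, the input name output_scp_files, and the output line come from this thread; the shell loop, the filename pattern, and the Docker image are assumptions made only for illustration.

```wdl
version 1.0

# Hypothetical reconstruction of the pattern in this post, not the real task.
task scp_outputs {
  input {
    # Files produced by the upstream Cumulus workflow.
    Array[File] output_scp_files
  }

  command <<<
    set -euo pipefail
    # Find the file of interest among the localized inputs and write its
    # localized path (one line) into metadata.txt in the working directory.
    for f in ~{sep=' ' output_scp_files}; do
      case "$f" in
        *scp.metadata.txt) echo "$f" > metadata.txt ;;
      esac
    done
  >>>

  output {
    # read_string returns the path as a String, which is coerced to File.
    # This is the line whose delocalization is reported as unexpected.
    File cumulus_metadata = read_string("metadata.txt")
  }

  runtime {
    docker: "ubuntu:20.04"  # assumed image, not from the original workflow
  }
}
```

Note that the path written into metadata.txt points at a localized input under /cromwell_root, not at a file created in the task's own working directory.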

I'll share the Alexandria workspace with GROUP_FireCloud-Support@firecloud.org

The job ID is 469a9ccb-43c5-46e5-8f35-a8827a345735

Thanks

James

Comments

11 comments

  • Comment author
    Jason Cerrato

    Hi James,

    Thank you for your inquiry. We'll take a look at this as soon as we can and get back to you.

    Kind regards

    Jason

  • Comment author
    Jason Cerrato

    Hi James,

    Is this metadata.txt file something that's generated in the workflow, or is it a file you already have? Can you point me to where this file is generated/already exists?

    Can you also clarify which version of dropseq_cumulus you ran for job submission 469a9ccb-43c5-46e5-8f35-a8827a345735?

    Many thanks,

    Jason

  • Comment author
    James Gatter
    • Edited

    Hey Jason,

    I ran it in alexandria/dropseq_cumulus/1. sco.scp.metadata.txt and the other files are generated by the Cumulus workflow, but they are wrapped in the Array[File] output_scp_files. In the scp_outputs task of dropseq_cumulus, I serialize this Array[File] into separate File variables.

    Specifically, I do this by identifying each File and writing the path to that file in another file (e.g. "/cromwell_root/path/to/sco.scp.metadata.txt" is written in file ./metadata.txt). If you read my original post now, it might make more sense.

    But overall, this isn't so much a problem anymore as I've been using another solution in alexandria_dev/dropseq_cumulus/11:

    First I move the file into the task's working directory, and then it delocalizes with:

    File file = glob("*filename.txt")[0] # Delocalizes the first file that matches the glob pattern
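
A minimal sketch of this workaround, under the same assumptions as before: the task and input names come from this thread, while the shell loop, the glob pattern, and the Docker image are hypothetical. It copies rather than moves the inputs, which is a swapped-in choice to leave the localized files untouched, and it assumes the input basenames do not collide.

```wdl
version 1.0

# Hypothetical sketch of the glob-based workaround, not the real task.
task scp_outputs {
  input {
    Array[File] output_scp_files
  }

  command <<<
    set -euo pipefail
    # Copy each localized input into the task's working directory so that
    # the file exists where glob() will look for it.
    for f in ~{sep=' ' output_scp_files}; do
      cp "$f" .
    done
  >>>

  output {
    # glob() returns an Array[File]; take the first match.
    File cumulus_metadata = glob("*scp.metadata.txt")[0]
  }

  runtime {
    docker: "ubuntu:20.04"  # assumed image, not from the original workflow
  }
}
```

If no file matches the pattern, indexing the empty array with [0] fails the task, so the pattern should be specific enough to match exactly one file.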

    If anything, I just want to bring this to the attention of the people working on Terra/Cromwell: this function and the subsequent delocalization do not behave as expected.

    Best,

    James

  • Comment author
    Jason Cerrato

    Hi James,

    Thank you for letting us know that you have found a solution. I will pass the information from your original post on to our workflow engineers for investigation.

    Kind regards,

    Jason

  • Comment author
    Jason Cerrato

    Hi James,

    I'm still having some difficulty finding this exact line in your job submission ID 469a9ccb-43c5-46e5-8f35-a8827a345735, where it claims to delocalize to

    gs://fc-secure-ec2ce7e8-339a-47b4-b9d9-34f652cbf41f/469a9ccb-43c5-46e5-8f35-a8827a345735/dropseq_cumulus/d7cc5552-e1e6-4f51-93aa-fcf1c8b510ea/call-scp_outputs/call-cumulus/cumulus.cumulus/90d9eb9e-4ffc-4e4f-b9fa-0198e2629158/call-scp_output/attempt-2/glob-1c77504a8a1d1e9b00f3be9956ddf1c3/sco.scp.metadata.txt

    Can you point me to where I can find it? Looking at the scp_outputs task log, I've only found these.

     

    So far I've only found the file sco.scp.metadata.txt in the Localization section of the log.

     

    Many thanks,

    Jason

  • Comment author
    James Gatter
    • Edited

    Hi Jason,

    In the delocalization section of the log, the wrong files are being delocalized. For example, read_string("metadata.txt") should have looked inside metadata.txt for the path to the desired file, e.g. "/cromwell_root/path/to/sco.scp.metadata.txt", and returned that path to become the WDL File output. Instead, the undesired files "metadata.txt", "expr.txt", and "X_fitsne_coords.txt" were delocalized, each of which contains only the path to the desired file.

    In the Terra job's Outputs tab, however, Terra claims that the desired files were delocalized, but the gs:// links to these files are dead and lead down nonexistent paths. Terra doesn't mention that "metadata.txt", "expr.txt", and "X_fitsne_coords.txt" were delocalized instead of "sco.scp.metadata.txt", "sco.scp.expr.txt", and "sco.scp.X_fitsne_coords.txt", so the Outputs tab conflicts with the log file. I'm just as confused as you are on this one.

    Sorry for the confusion, I know this is pretty wacky and convoluted.

    Best,

    James

  • Comment author
    Jason Cerrato

    Hi James,

    Would you be willing to share dropseq_cumulus with jcerrato@broadinstitute.org so I can take a closer look at versions 1 and 11?

    Kind regards,

    Jason

  • Comment author
    James Gatter
    • Edited

    Hi Jason,

    The methods should be publicly viewable, but I've shared both tools with you just in case. Hope that helps!

    Oh and the dev snapshot that fails is 9, not 11!

    James

  • Comment author
    Jason Cerrato

    Hi James,

    Many thanks—I was getting a message that the snapshot had been removed or that I didn't have access. I have access now after being added.

    Kind regards,

    Jason

  • Comment author
    James Gatter
    • Edited

    I accidentally removed dev snapshot 10, sorry! If you want to look at the dev version, make sure you are looking at 9; I believe that is the same as alexandria/dropseq_cumulus/1. I think just looking at alexandria/dropseq_cumulus/1 would be best anyway.

  • Comment author
    Jason Cerrato

    Hi James,

    Thank you! Our internal team will take a look and see if there is anything unexpected going on. Thank you for your report. If there's anything else we can help with, please let us know!

    Kind regards,

    Jason

