write_lines()/write_map()/write_tsv()/write_json() fail when run in a workflow rather than in a task

Post author
Giulio Genovese

I have some code that writes a table to a file and then passes that file to a task. The code looks like this:
File? reheader_file = if reheader then write_map(as_map(zip(barcode, sample_ids))) else None

However, when I run the workflow in Terra I get an error like this:
Failed to evaluate 'reheader_file' (reason 1 of 1): Evaluating if (reheader) then write_map(as_map(zip(barcode, sample_ids))) else None failed: Failed to write_map(...) (reason 1 of 1): java.lang.IllegalArgumentException: Could not build the path "write_map_b4c827dcbcabc48a78ae8ece4b7e6d2d.tmp". It may refer to a filesystem not supported by this instance of Cromwell. Supported filesystems are: HTTP, Google Cloud Storage, DRS. Failures: HTTP: write_map_b4c827dcbcabc48a78ae8ece4b7e6d2d.tmp does not have an http or https scheme (IllegalArgumentException) Google Cloud Storage: Path "write_map_b4c827dcbcabc48a78ae8ece4b7e6d2d.tmp" does not have a gcs scheme (IllegalArgumentException) DRS: write_map_b4c827dcbcabc48a78ae8ece4b7e6d2d.tmp does not have a drs scheme. (IllegalArgumentException) Please refer to the documentation for more information on how to configure filesystems: http://cromwell.readthedocs.io/en/develop/backends/HPC/#filesystems

I get the sense that Cromwell on Terra is unable to serialize files at the workflow level. However, this is perfectly valid WDL code, and it runs without problems on the Cromwell instance on my laptop. How am I supposed to work around this limitation?

Comments


  • Comment author
    Jason Cerrato

    Hi Giulio,

    I'll take a closer look at this and get back to you as soon as I can.

    Kind regards,

    Jason

  • Comment author
    Jason Cerrato

    Can you share the workspace where you are seeing this issue with GROUP_FireCloud-Support@firecloud.org by clicking the Share button in your workspace (see the icon with the three dots at the top-right)?

    1. Add GROUP_FireCloud-Support@firecloud.org to the User email field and press Enter on your keyboard
    2. Click Save

    Let me know the workspace name, as well as the relevant submission and workflow IDs. If there are any authorization domains, please add jcerrato@broadinstitute.org to them if possible. If not, please still provide the relevant submission and workflow IDs.

    Many thanks,

    Jason

  • Comment author
    Giulio Genovese

    Hi Jason,

    So Chris Whelan has explained to me that this is the result of a deliberate configuration of Cromwell on Terra that does not allow tables to be serialized on the machine that runs the Cromwell server. As it was explained to me, this used to be possible, but someone abused it and crashed the server, and it has not been allowed since. However reasonable this sounds, it breaks the WDL specification, which does not forbid calling write_map() in the workflow body rather than inside a task.

    I have resolved my own issue by writing a simple task equivalent of write_map():

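    task write_map_task {
      input {
        Map[String, String] map
        String docker
      }

      # Empty command: the task only exists so that write_map() below is
      # evaluated in a task context rather than at the workflow level.
      command <<<
      >>>

      output {
        File map_file = write_map(map)
      }

      runtime {
        docker: docker
      }
    }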
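A minimal sketch of how the task above might stand in for the original `File? reheader_file = if reheader then write_map(...) else None` line. It assumes WDL 1.0 (as the `input {}` blocks suggest) and that as_map() and zip() are available, as they were in the original snippet. The workflow name `reheader_example` and its output name are hypothetical; `barcode`, `sample_ids`, `reheader` and `docker` mirror the names used in the post.

```wdl
version 1.0

# Hypothetical wiring: call the helper task only when reheadering is requested.
workflow reheader_example {
  input {
    Array[String] barcode
    Array[String] sample_ids
    Boolean reheader
    String docker
  }

  if (reheader) {
    # as_map()/zip() only build an in-memory Map; the file itself is
    # written by write_map() inside the task.
    call write_map_task {
      input:
        map = as_map(zip(barcode, sample_ids)),
        docker = docker
    }
  }

  # Outputs of a call made inside a conditional are optional outside it,
  # so this has the same File? type as the original declaration.
  File? reheader_file = write_map_task.map_file

  output {
    File? map_file = reheader_file
  }
}

task write_map_task {
  input {
    Map[String, String] map
    String docker
  }

  command <<<
  >>>

  output {
    File map_file = write_map(map)
  }

  runtime {
    docker: docker
  }
}
```

The conditional `call` keeps the original semantics: when `reheader` is false no task runs and `reheader_file` is undefined, just like the `else None` branch.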

    However, I don't understand why I have to modify my workflow, which was written according to the WDL specification, to accommodate Cromwell breaking the specification. Wouldn't it make more sense for Cromwell on Terra to automatically dispatch write_lines()/write_map()/write_tsv()/write_json() as separate tasks (maybe only when the input is large enough) so that developers can avoid this additional setback?

    Giulio

  • Comment author
    Chris Llanwarne

    Hi Giulio -

    I tried to answer this for you in your other post (https://support.terra.bio/hc/en-us/community/posts/360071476431-Terra-fails-to-delocalize-files-listed-through-read-lines-?page=1#community_comment_360011392571) but wanted to link here in case others come across your question. If you have follow ups, we can continue in that thread.

    Thanks - 

    Chris

    ---

    Copy of the relevant part of the answer from the other thread:

    (ii) and (iii) are side-effects of Cromwell's first cloud backend being JES (now named PAPIv1 - the Pipelines API on Google Cloud). In PAPIv1, the request to the API must specify the VM we want to run on, which files to localize at the start, which script to run, and which files to delocalize at the end. PAPIv1 then deletes the VM, along with any files we did not list ahead of time for delocalization, before reporting back that the job is done. That model makes simple jobs easy to write in PAPIv1, but it does not fit well with file outputs that are defined as functions of other file outputs (we can't predict the result ahead of time), nor with optional file outputs.

    HOWEVER: now that we're running on PAPIv2, we have the opportunity to refactor some of that localization/delocalization logic so that it runs on the VM itself after the job completes, rather than having to be predicted ahead of time in the Cromwell engine. We have tickets in our backlog to do just that.

