Workflow input JSON array with length of one is converted to scalar Completed

Post author
Jennings Zhang

We are trying to use the ExomeGermlineSingleSample WARP pipeline, which accepts a JSON array of strings for the input field ExomeGermlineSingleSample.sample_and_unmapped_bams.flowcell_unmapped_bams.

Usually samples have multiple uBAM files but for some, there is a single uBAM. We specify the input to be this.participant, which should in theory expand to

["gs://something/patient.bam"]

Unfortunately, Terra converts this to a scalar, and we get this error:

Workflow input processing failed (Caused by [reason 1 of 1]: Failed to evaluate input 'sample_and_unmapped_bams' (reason 1 of 1): Error(s): No coercion defined from '"gs://.../BamToUnmappedBams/.../call-SortSam/shard-0/....bam.unmapped.bam"' of type 'spray.json.JsString' to 'Array[File]'.)

This anti-feature seems to be by design. Our other runs in the same job, which have multiple uBAMs per sample, work as expected; only the run with one uBAM crashes. Is there a clean way to work around this?
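
To make the mismatch concrete, here is a minimal sketch of a check that could be run on resolved inputs before submission, assuming they are available as a JSON document outside Terra. The helper `find_scalar_arrays` and its `array_fields` argument are hypothetical names for illustration; they are not part of Terra or Cromwell.

```python
import json

def find_scalar_arrays(inputs_json, array_fields):
    """Return the fields expected to hold arrays whose JSON value is not a list.

    A missing field is also reported, since its value (None) is not a list.
    """
    inputs = json.loads(inputs_json)
    return [name for name in array_fields
            if not isinstance(inputs.get(name), list)]

# A single-uBAM sample whose one-element array was collapsed to a string.
resolved = json.dumps({
    "flowcell_unmapped_bams": "gs://something/patient.bam",
})

print(find_scalar_arrays(resolved, ["flowcell_unmapped_bams"]))
```

Any field reported by the check is one that would likely trigger the "No coercion defined" error above when the workflow expects an array.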

Comments

10 comments

  • Comment author
    Jason Cerrato
    • Official comment

    Hi Jennings Zhang and Gabriel Goodney,

    A fix for this issue went out with yesterday's release. An array with a single value should work when passed in with the normal workflow configuration syntax, without needing to be wrapped in an array.

    If you run into any trouble, please let us know.

    Kind regards,

    Jason

  • Comment author
    Jason Cerrato

    Hi Jennings Zhang,

    Thank you for writing in about this issue. Can you share the workspace where you are seeing this issue with GROUP_FireCloud-Support@firecloud.org by clicking the Share button in your workspace? The Share option is in the three-dots menu at the top-right.

    1. Add GROUP_FireCloud-Support@firecloud.org to the User email field and press enter on your keyboard.
    2. Click Save.

    Let us know the workspace name, as well as the relevant submission and workflow IDs. We’ll be happy to take a closer look as soon as we can.

    Kind regards,

    Jason

  • Comment author
    Jennings Zhang

    Thanks for helping out, Jason Cerrato! I am not sure I am allowed to share the workspace, or how exactly sharing would get through the hoops of authorized domains, though I can ask my superiors.

    I can, however, do my best to provide steps to reproduce.

    1. The workflow's tutorial is found here: https://app.terra.bio/#workspaces/warp-pipelines/Exome-Analysis-Pipeline
    2. Clone the workspace
    3. There are two rows in the "read_group_set" table, delete one of the rows (so that the column flowcell_unmapped_bams will have only one row)
    4. Execute the "1-ExomeGermlineSingleSample" workflow
  • Comment author
    Jason Cerrato

    Hi Jennings Zhang,

    Thank you for those steps. I was able to reproduce the issue and I'll be happy to bring this to the attention of our engineers tomorrow. Can you let us know the urgency with which you need a workaround for this issue? Are you currently blocked by this?

    Many thanks,

    Jason

  • Comment author
    Jennings Zhang

    Jason Cerrato, it is of no urgency.

    If it comes to that, we can figure out a temporary client-side workaround.

  • Comment author
    Jason Cerrato

    Hey Jennings Zhang,

    If you're interested, you can follow this JIRA ticket created by our engineers to track work toward resolving this bug: https://broadworkbench.atlassian.net/browse/BW-678

    I'll also follow up on this thread if I get word that the bug has been resolved!

    Kind regards,

    Jason

  • Comment author
    Gabriel Goodney

    Hey Jason,

    Has this been resolved or is there a workaround for this issue? I am having the same exact problem when running the Whole Genome Analysis Pipeline. Any help or information would be appreciated.

    Thanks,

    Gabe

  • Comment author
    Jason Cerrato

    Hi Gabe,

    Thanks for writing in. This has not yet been resolved but I am happy to check with the team to see if there is a known good workaround. I'll get back to you as soon as I can.

    Kind regards,

    Jason

  • Comment author
    Jason Cerrato

    Hi Gabriel Goodney,

    We have a known good workaround for cases where you know the list/array only has one element. The workaround is to surround the reference with square brackets so that Cromwell knows to interpret the single element as an array.

    Here is an example of how you define the sample_and_unmapped_bams variable for the WholeGenomeGermlineSingleSample best practices workflow.

    { 
       "sample_name": this.read_group_set_id, 
       "base_file_name": this.read_group_set_id, 
       "flowcell_unmapped_bams": [this.read_groups.flowcell_unmapped_bams_list], 
       "final_gvcf_base_name": this.read_group_set_id, 
       "unmapped_bam_suffix": ".bam" 
    }

    instead of

    { 
       "sample_name": this.read_group_set_id, 
       "base_file_name": this.read_group_set_id, 
       "flowcell_unmapped_bams": this.read_groups.flowcell_unmapped_bams_list, 
       "final_gvcf_base_name": this.read_group_set_id, 
       "unmapped_bam_suffix": ".bam" 
    }
    Note that this only works for cases where the list/array is known to have a single element. If you use the square brackets around the list reference and there is more than one element, the job will fail.

    The workflows team will be discussing this scalar issue to see if we can identify a good permanent solution for the problem. For now, I hope this workaround helps you move forward with your analysis!

    Kind regards,

    Jason

  • Comment author
    Jennings Zhang

    Jason Cerrato thank you for letting me know! We will try it out.

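
The bracket workaround discussed in this thread only suits columns known to hold a single element. The sketch below illustrates why, and shows a conditional alternative for anyone building the inputs JSON client-side rather than through Terra's expression syntax. The thread does not state why the multi-element case fails; the nesting shown here is a presumed explanation. The helper `as_array` and the two-file paths are hypothetical.

```python
def as_array(value):
    """Return value unchanged if it is already a list; otherwise wrap it in one."""
    return value if isinstance(value, list) else [value]

# What a one-element column collapsed to in the reported error.
single = "gs://something/patient.bam"
# Hypothetical paths standing in for a sample with two uBAMs.
multiple = ["gs://something/a.bam", "gs://something/b.bam"]

# Unconditional bracket wrapping, as in the workaround:
print([single])     # a flat one-element list
print([multiple])   # a list nested inside a list, no longer a flat array of files

# Conditional wrapping yields a flat list in both cases:
print(as_array(single))
print(as_array(multiple))
```

Wrapping only when the value is not already a list avoids having to know in advance how many uBAMs a sample has.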
