Add Array[File] to Workspace Data table

Post author
Justin Rhoades

Hello,

Is it possible to add an Array[File] to the Workspace Data table? I've tried a few different things unsuccessfully and haven't been able to find any documentation that describes how to do this. Thanks!

Comments

12 comments

  • Comment author
    Jason Cerrato

    Hey Justin,

    Thanks for writing in. The easiest way is to build the array from within the Data Table editor. If you uploaded a TSV with an array, it probably turned into an awkward string instead of a list.

    If you click the pencil icon in the cell to edit it, you can select an option called "Attribute is a list."

    After clicking it, you'll have the option to "Add item" at the bottom, which you can do for each of the items of your array. You'll want to put one path per line, then Save Changes.

    And now the Array presents properly in the data table.

    This works fine for arrays of smaller sizes, but I imagine this wouldn't work great for arrays of larger sizes. Let me see if I can find any other solutions we might already have for fixing this. If we don't, I'll create a Feature Request post you can upvote and comment your support for.

    Kind regards,

    Jason

  • Comment author
    Justin Rhoades

    Hi Jason,

    Thanks for the response. That makes perfect sense for most data tables, but where I'm running into problems specifically is in Workspace Data under Other Data. I'm looking to add an array of length two, so I'm OK doing it manually, but I don't seem to have the same options as I do with the tables you described. There is a "String List" option, but I can't seem to get the desired array of two files. Hope this clarifies!

    -Justin

  • Comment author
    Jason Cerrato

    Hi Justin,

    Ah, I see! Thanks for clarifying. Let me look into this to see if it's possible and get back to you.

    Kind regards,

    Jason

  • Comment author
    Justin Rhoades

    Hi Jason,

    Thanks for looking into this. Please let me know what you find. This is currently blocking us so if it's not possible we'll have to come up with a workaround.

    Best,

    Justin

  • Comment author
    Jason Cerrato

    Hi Justin,

    Thanks for letting me know. I'll try to get you an answer or workaround before end of day.

    Kind regards,

    Jason

  • Comment author
    Jason Cerrato

    Hey Justin,

    I wanted to touch base here since I said I would try to get you an answer or workaround by today. I've confirmed that simply listing out your paths separated by commas does seem to resolve in some way in the Workspace Data variables.

    I ran a test using this WDL I constructed, but I received this error in my log:

    CommandException: "stat" command does not support "file://" URLs. Did you mean to use a gs:// URL?

    This makes me think that Terra first interprets file paths in the Workspace Data section as file:// URLs, and perhaps translates them to the underlying gs:// paths afterward. I would need to do a bit more digging to confirm. If you can put a single file path as a String variable in the data table, it seems like it should be possible to define a list of paths and use them in a similar manner.

    One workaround I can think of now is to tar the two files you need to use, then edit your WDL to extract the files prior to running the rest of your code. This would allow you to define the single tar file as a Workspace Data variable.
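The tar workaround above can be sketched roughly as follows. This is only an illustration in Python using the standard `tarfile` module; the function names `bundle` and `unbundle` are hypothetical, and in a real workflow the extraction step would live in the WDL task's command section rather than in a separate script.

```python
import tarfile
from pathlib import Path


def bundle(paths, archive):
    """Pack the given files into one gzipped tar, stored by base name.

    `bundle` is a hypothetical helper name, not part of Terra or WDL.
    """
    with tarfile.open(str(archive), "w:gz") as tar:
        for p in paths:
            # arcname drops the directory part so extraction is flat
            tar.add(str(p), arcname=Path(p).name)
    return Path(archive)


def unbundle(archive, dest):
    """Extract the archive into dest and return the extracted file paths."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(archive), "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(str(dest))
    return [dest / name for name in names]
```

The single archive produced by `bundle` is what would be uploaded and referenced as one Workspace Data variable; the task would then unpack it before using the files.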

    I'll keep looking into this to see if I can find a way to get it working with the String List and get back to you as soon as I can.

    Kind regards,

    Jason

  • Comment author
    Justin Rhoades

    Hi Jason,

    Thanks again for the response. I tried the way you described above, simply listing the paths separated by commas, and it worked perfectly. Previously I had tried something similar, but with quotes and square brackets as well; your method seems to have done the trick. I'm not sure why your test is giving you that error, but my task was able to find both files in the array in the Google bucket. Thanks for the help!

    -Justin

  • Comment author
    Jason Cerrato

    Hi Justin,

    Great to hear! I'll do a little digging to see why mine led to a different result (perhaps it has something to do with the way I echoed the lines into a file), but I'm glad to hear that setting the paths up in a similar way worked for you.

    If we can help with anything else, please let us know!

    Kind regards,

    Jason

  • Comment author
    Philipp Hahnel

    Hi,

    I wanted to push this question again. I have the same problem, but for a table with hundreds of rows, for which it is infeasible to apply that method by hand. Tarring the files doesn't make sense for this many cases. Another workaround I can think of is to write the file names in those arrays into new text files, upload those files, and link the text files in the table instead of the arrays, together with an extra extraction preamble in the WDL. This seems like a lot of unnecessary effort.
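For illustration, the "one text file of file names per row" workaround described above might look something like this in Python. The helper names `write_manifests` and `read_manifest` and the `.files.txt` suffix are assumptions made for this sketch, not anything Terra prescribes. Inside the WDL itself, the standard-library function `read_lines()` would play the role of the reader.

```python
from pathlib import Path


def write_manifests(rows, out_dir):
    """Write one text file per row, one path per line.

    `rows` maps a row ID to its list of file paths. Returns a dict of
    {row_id: manifest path}. Names here are hypothetical.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    manifests = {}
    for row_id, paths in rows.items():
        manifest = out_dir / f"{row_id}.files.txt"
        manifest.write_text("\n".join(paths) + "\n")
        manifests[row_id] = manifest
    return manifests


def read_manifest(manifest):
    """Return the non-empty lines of a manifest as a list of paths."""
    lines = Path(manifest).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]
```

Each manifest would be uploaded and linked in the table as a single file, so the table column holds one path per row instead of an array.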

    Any further ideas or progress on that functionality?

    Philipp

  • Comment author
    Emil Furat

    Hi Philipp,

    Thanks for reaching out. I'm not sure I understand correctly: did you upload a TSV with an array that turned into an awkward string, or are you trying to add an array to the "Other Data" section of your workspace?

    In either case, it sounds like the workaround you have proposed would be viable.

    There has not been any progress on improving Data Tables; however, I can advocate for this further on your behalf. What kind of solution would work best for your use case? I will be happy to pass along any details you can provide to help get new or better functionality built into our Data Tables.

    Kind regards,

    Emil

     

  • Comment author
    Philipp Hahnel
    • Edited

    Thanks for the response, Emil!

    I've uploaded a TSV which turned a supposed array into "['path1.bam', 'path2.bam']".

    I've since found this post https://support.terra.bio/hc/en-us/community/posts/360054716892-Importing-TSVs-that-contain-arrays?page=1#community_comment_4409077727387 which drew my attention to the fact that I had used single quotation marks to mark strings, which is not valid JSON. After changing them to double quotation marks, ["path1.bam", "path2.bam"], the TSV is now properly imported into array structures.
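For a table with many rows, the quote fix described above could be scripted before upload. The following is a minimal sketch, assuming the array cells were written as Python-style list literals (for example by `str()` on a list); `to_json_array` and `fix_tsv_line` are hypothetical helper names.

```python
import ast
import json


def to_json_array(cell):
    """Return the cell as a JSON array string if it holds a Python-style
    list literal; otherwise return the cell unchanged."""
    text = cell.strip()
    if not (text.startswith("[") and text.endswith("]")):
        return cell
    try:
        value = ast.literal_eval(text)  # safely parses literals only
    except (ValueError, SyntaxError):
        return cell
    if not isinstance(value, list):
        return cell
    return json.dumps(value)  # JSON always uses double quotes


def fix_tsv_line(line):
    """Apply to_json_array to every tab-separated cell of one TSV line."""
    cells = line.rstrip("\n").split("\t")
    return "\t".join(to_json_array(c) for c in cells)


print(to_json_array("['path1.bam', 'path2.bam']"))
# → ["path1.bam", "path2.bam"]
```

Cells that are not list literals, such as IDs or plain paths, pass through untouched, so the function can be applied to every column without knowing in advance which ones hold arrays.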

    It's not documented in the Sets and arrays of data in the workflow section of https://support.terra.bio/hc/en-us/articles/360025758392 and it would help a lot if a note on that could be added there.

  • Comment author
    Emil Furat

    Hi Philipp,

    Thank you for taking the time to explain the solution you found and for suggesting possible improvements to our documentation! I have let the team in charge of maintaining our documentation know that a note about the single quotation mark issue you encountered could be helpful to other users.

    Kind regards,

    Emil

