Make workspace environment variables available in workflow configuration

December 04, 2021 14:12
3 comments

We have workflows that need to write outputs to a specific path in the workspace bucket, e.g., the Cumulus workflow in the Intro to HCA data workspace. This is usually done when the workflow does not attach outputs to a data table (for whatever reason). We don't want to have to go digging through the submission directories to find outputs, so we include a task that copies outputs to a specific path. To make that work today, the workflow config includes an input variable that collects the workspace bucket ID, which is then used to compose the output path.

That requires the user to look up the workspace bucket ID in the dashboard and paste it into the input field, which is clunky and brittle. Imagine you clone the workspace after the workflow has been configured. When you run the workflow in your clone, it will still try to write outputs to the parent workspace's bucket. If you have write access to the parent, you may not realize you're putting your outputs in a different workspace (and you might overwrite things there); if you don't have write access, the run will fail with a permissions error. Ack.

In Notebooks and RStudio cloud environments, metadata like the workspace bucket ID is available through built-in environment variables, which is incredibly convenient.
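To illustrate what that convenience looks like from a notebook, here is a minimal Python sketch. It assumes the bucket is exposed in an environment variable named `WORKSPACE_BUCKET` holding a `gs://` URL; check your own environment for the actual name. The helper `workspace_output_path` and the example bucket name are hypothetical.

```python
import os


def workspace_output_path(subdir, env=None):
    """Build a gs:// output path under the current workspace's bucket.

    Assumption: the cloud environment sets WORKSPACE_BUCKET to a value
    like "gs://<bucket-id>". `env` can be passed in for testing.
    """
    env = os.environ if env is None else env
    bucket = env.get("WORKSPACE_BUCKET")
    if not bucket:
        raise RuntimeError(
            "WORKSPACE_BUCKET is not set; is this a workspace cloud environment?"
        )
    # Normalize slashes so the pieces join cleanly.
    return f"{bucket.rstrip('/')}/{subdir.strip('/')}/"


if __name__ == "__main__":
    # Hypothetical bucket name, used only when the variable is not already set.
    os.environ.setdefault("WORKSPACE_BUCKET", "gs://fc-example-bucket")
    print(workspace_output_path("cumulus/outputs"))
```

Because the bucket is resolved at run time, the same code in a cloned workspace picks up the clone's bucket with no pasted IDs to go stale. That is the behavior the workflow config currently can't give us.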

For the use case above, it would be very useful to be able to reference something like "workspace.bucket-id" in the workflow config. There is already a "Workspace Data" table that this metadata could live in; it just needs to be populated with the relevant metadata by default. There is precedent for this, e.g., workspace tags, which are hidden in the UI but visible if you download the csv for that table:

workspace:tags
["HCA","single-cell","Bioconductor","10x Analysis","cumulus","10X Genomics","Jupyter Notebooks","warp-pipelines"]
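For anyone who wants to inspect those hidden attributes without reading the raw file, here is a rough Python sketch. It assumes the downloaded file has one header row of `workspace:`-prefixed keys and one row of values, is tab-separated, and stores list values as JSON arrays (as the tags line above suggests). The function name `read_workspace_data` is hypothetical, and the delimiter may need adjusting for your download.

```python
import csv
import io
import json


def read_workspace_data(text, delimiter="\t"):
    """Parse a two-row workspace data export into a {name: value} dict.

    Assumptions: row 1 holds keys like "workspace:tags", row 2 holds the
    values, and list-like values are JSON-encoded.
    """
    # QUOTE_NONE keeps the double quotes inside JSON values untouched.
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quoting=csv.QUOTE_NONE)
    rows = [row for row in reader if row]
    header, values = rows[0], rows[1]

    data = {}
    for key, raw in zip(header, values):
        name = key.split(":", 1)[-1]  # drop the "workspace:" prefix if present
        try:
            data[name] = json.loads(raw)  # e.g. the tags array
        except json.JSONDecodeError:
            data[name] = raw  # plain string value
    return data


if __name__ == "__main__":
    sample = 'workspace:tags\n["HCA","single-cell","cumulus"]\n'
    for name, value in read_workspace_data(sample).items():
        print(name, "=", value)
```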

Bonus points for actually having a way to display built-in workspace env variables that doesn't involve downloading a csv.

Comments

Emil Furat
- December 06, 2021 18:39
Hi Geraldine,

Thank you for writing in! I've sent this request to our development team for consideration, and I'll be happy to follow up with you if this feature gets built.

Kind regards,

Emil

Yossi Farjoun
- January 12, 2022 15:19
I'd like to second this. I have some workflows that run on "files" (not entities) and are used to prepare resources for subsequent workflows. The inputs for these workflows can already include variables from the workspace data (using `workspace.<variable_name>`). If I put `workspace.<output_variable_name>` in the outputs section of the configuration, I get no errors, but it doesn't do what I'm expecting, i.e., a new variable does NOT appear in the workspace data section. It would be quite helpful (and consistent) if that could happen.

Thanks!

Pamela Bretscher
- January 12, 2022 16:38
Hi Yossi Farjoun

Thank you for your input on this feature request! I'll be happy to follow up with you if there are any updates on this feature.

Kind regards,

Pamela
