Can a (demultiplexing) workflow create new (sample) entities?
We are building a demultiplexing workflow that takes as input a set of FASTQs and a sample sheet from a sequencing run. Each sequencing run is an entity in a "Sequencing Runs" table. The workflow creates a set of demultiplexed files as output. In addition, I would like to have the workflow add new entities to a second table ("Samples"), with one entity for each sample contained in the run. Is there a way to achieve this?
Comments
8 comments
Hi Martin,
Thank you for your question. Let me speak to a colleague about this and get back to you.
Kind regards,
Jason
Hi Martin,
We may need some additional information about the shape of your data to fully understand, but based on your initial message we believe this is the situation you are in:
If you are trying to create a table of individual samples directly from the workflow, we do not believe there is currently a way to do this. You could conceivably create a properly formed TSV in your workflow that you would later upload to Terra. See https://support.terra.bio/hc/en-us/articles/360025758392-Managing-data-with-tables- for guidance on the format.
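To make the TSV idea concrete, here is a minimal sketch of what the workflow could write out. It assumes the first header column follows the `entity:<table>_id` convention from the linked article; the `sequencing_run` and `fastq` column names, the function name, and the example values are hypothetical placeholders, not anything Terra requires.

```python
import csv
import io


def build_samples_tsv(run_id, samples, entity_type="sample"):
    """Return TSV text with one row per demultiplexed sample.

    samples: list of (sample_id, fastq_path) tuples.
    The column names after the ID column are illustrative assumptions.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    # The first column header names the table and marks the ID column.
    writer.writerow(["entity:{}_id".format(entity_type), "sequencing_run", "fastq"])
    for sample_id, fastq_path in samples:
        writer.writerow([sample_id, run_id, fastq_path])
    return buf.getvalue()
```

Calling `build_samples_tsv("run_001", [("S1", "gs://my-bucket/S1.fastq.gz")])` gives a header line plus one row for S1, which could be written to a file and declared as a workflow output.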
If you are instead trying to write column values to a sample table that already exists, it might be possible to do so if you have associations set up between the sequencing runs table and sample table, so you know which sequence maps to which sample.
Alternatively, if you only wanted to write back directly to the sequencing runs table, you can consider rewriting your workflow so that you run it on a sequence level rather than a sequence set level. Terra makes it easy to run workflows in batch: you can run a sequence-level workflow on multiple sequences in parallel, each instance of the workflow running on a single sequence.
I hope this helps. If there is a gap in our understanding, please share your workspace with GROUP_FireCloud-Support@firecloud.org if you are able so we can get a better sense of what your data looks like. Please also feel free to explain anything we don't yet understand precisely.
Kind regards,
Jason
I have only a slight modification of the situation you described:
It sounds like one way to achieve this is to
1) Have the workflow generate a "Samples" table TSV
2) Manually download the TSV
3) Manually upload the TSV using the "Import Table Data" function.
Is it possible to instead use an API to replace steps (2) and (3) and manipulate the samples table from within the workflow?
Thanks!
Hey Martin,
This should be possible using APIs or FISS.
There are essentially two options: TSV import, or calling the entity APIs. Entity APIs are a bit harder to use, but have more power. The TSVs are a bit easier, and if you want something that isn’t a participant, sample, pair, or set of one of those, you need the flexible import option.
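As a rough sketch of the TSV import option over plain HTTP: the code below only builds the request and does not send it. It assumes the flexible import endpoint lives at `/api/workspaces/{namespace}/{name}/flexibleImportEntities` on api.firecloud.org and takes the TSV in a form field named `entities`; please check both against the API documentation before relying on them. The function name and the `token` argument are hypothetical.

```python
import urllib.parse
import urllib.request

API_ROOT = "https://api.firecloud.org/api"  # assumed API root


def build_import_request(namespace, workspace, tsv_text, token):
    """Build (but do not send) a POST that uploads an entity TSV.

    The endpoint path and the "entities" form field are assumptions.
    """
    url = "{}/workspaces/{}/{}/flexibleImportEntities".format(
        API_ROOT,
        urllib.parse.quote(namespace, safe=""),
        urllib.parse.quote(workspace, safe=""),
    )
    # The TSV travels as a URL-encoded form field.
    body = urllib.parse.urlencode({"entities": tsv_text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
```

Sending it would then be a matter of passing the returned object to `urllib.request.urlopen` and checking the response status.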
If you have any questions, please let us know.
Kind regards,
Jason
Hi Jason,
Great - This looks promising. I'm hopeful we can use APIs or FISS for creating the sample entities. The one challenge I see is that we have to sort out authentication from within a task. The task needs to access the workspace it's currently running from so I think I'd need to
1) find out the workspaceNamespace and the workspaceName. (The name of the entity table to be updated can be hardcoded)
2) have the task somehow authenticate to allow FISS or the API to make the entity table updates.
Are either/both of these possible from within a workflow?
Thanks,
Martin
Hi Martin,
Thanks for your patience here. For point 2, the workflow VMs should already have scope to make API calls without additional authentication steps.
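If you do end up needing a token explicitly (for example, to call the REST API directly rather than through FISS), one way to sketch it is to ask the Google metadata server from inside the task. This assumes the metadata server is reachable from the task's container and that the VM's default service account is the identity you want to act as; neither is something I have confirmed for your setup. The function names are hypothetical.

```python
import json
import urllib.request

# Standard Compute Engine metadata path for the default service account.
TOKEN_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/token"
)


def parse_access_token(payload):
    """Extract the bearer token from the metadata server's JSON reply."""
    return json.loads(payload)["access_token"]


def fetch_access_token(timeout=5):
    """Request a token; this only works when run on a Google Cloud VM."""
    req = urllib.request.Request(TOKEN_URL, headers={"Metadata-Flavor": "Google"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_access_token(resp.read())
```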
I am awaiting word from one of our technical experts on point 1, and will get back to you as soon as I hear from them.
Kind regards,
Jason
Hi Martin,
At this time the workspace namespace and workspace name are not passed all the way through to the environment variables available in a command block. If you want to try, there may be other unorthodox ways of sniffing out your workspace.
For example, you could try to sniff the Google project from gcloud. It might be available locally or via the Google metadata server's HTTP API. The project corresponds to the workspace namespace. If you only had one workspace per namespace, you could list the single workspace from api.firecloud.org.
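For what it's worth, the metadata lookup could be sketched roughly like this. The `project/project-id` path and the `Metadata-Flavor` header are standard for Compute Engine, and the server is addressed over plain HTTP at metadata.google.internal. Whether it is reachable from inside a task container, and whether the project really matches your workspace namespace, are assumptions you would need to test. The function names are hypothetical.

```python
import urllib.request

METADATA_ROOT = "http://metadata.google.internal/computeMetadata/v1"


def metadata_request(path):
    """Build a metadata-server request; the header is mandatory."""
    return urllib.request.Request(
        METADATA_ROOT + "/" + path.lstrip("/"),
        headers={"Metadata-Flavor": "Google"},
    )


def fetch_project_id(timeout=5):
    """Return the VM's Google project ID; only works on a Google Cloud VM."""
    req = metadata_request("project/project-id")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()
```

Even if this returns a project, it only gives a candidate namespace; the workspace name would still have to come from somewhere else.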
All that may involve a lot of debugging and may not work. As such, it may be easier to instead pass in the variables through the method configuration / WDL inputs.
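On the task side, that could look something like the sketch below: the script run by the command block takes the workspace identifiers as ordinary command-line arguments, which the WDL would fill in from its inputs. The flag names and defaults are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse workspace identifiers handed in from the WDL command block."""
    parser = argparse.ArgumentParser(
        description="Register demultiplexed samples in a workspace table."
    )
    parser.add_argument("--workspace-namespace", required=True)
    parser.add_argument("--workspace-name", required=True)
    # The table name can be hardcoded, as suggested earlier in the thread.
    parser.add_argument("--entity-type", default="sample")
    return parser.parse_args(argv)
```

The command block would then call the script with `--workspace-namespace` and `--workspace-name` set from two string inputs in the method configuration.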
If you are interested in seeing Cromwell pass in this information in the future, I'm happy to raise a feature request on your behalf.
Kind regards,
Jason
Interesting resource on demultiplexing: https://demultiplex.readthedocs.io/en/latest/usage.html