
Can a (demultiplexing) workflow create new (sample) entities?

Comments

7 comments

  • Jason Cerrato

    Hi Martin,

    Thank you for your question. Let me speak to a colleague about this and get back to you.

    Kind regards,

    Jason

  • Jason Cerrato

    Hi Martin,

    We may need some additional information about the shape of your data to fully understand, but based on your initial message we believe this is the situation you are in:

    • There is a Sequencing Runs table where each row is a single sequence
    • Your workflow requires you to run on a set of sequences
    • You want to run the workflow and generate or add to a sample table

    If you are trying to create a table of individual samples directly from the workflow, we do not believe there is currently a way to do this. You could conceivably create a properly formed TSV in your workflow and upload it to Terra afterwards. See https://support.terra.bio/hc/en-us/articles/360025758392-Managing-data-with-tables- for guidance on the format.
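A minimal sketch of what generating such a TSV inside a task might look like, assuming the load-file convention in which the first header cell is `entity:<type>_id`. The column names `sequencing_run` and `fastq`, the sample IDs, and the bucket paths are hypothetical, and some import modes may require further columns (for example a participant reference).

```python
import csv
import io


def samples_to_tsv(samples, entity_type="sample"):
    """Render a list of dicts as TSV text in the Terra load-file layout."""
    # The first header cell names the table: entity:<type>_id
    id_column = f"entity:{entity_type}_id"
    # Collect attribute columns in first-seen order, skipping the ID key
    attributes = []
    for row in samples:
        for key in row:
            if key != "id" and key not in attributes:
                attributes.append(key)
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow([id_column] + attributes)
    for row in samples:
        # Missing attributes become empty cells
        writer.writerow([row["id"]] + [row.get(name, "") for name in attributes])
    return buffer.getvalue()


# Hypothetical output of a demultiplexing step for one sequencing run
demultiplexed = [
    {"id": "run1_sampleA", "sequencing_run": "run1",
     "fastq": "gs://example-bucket/run1/sampleA.fastq.gz"},
    {"id": "run1_sampleB", "sequencing_run": "run1",
     "fastq": "gs://example-bucket/run1/sampleB.fastq.gz"},
]
tsv_text = samples_to_tsv(demultiplexed)
```

Writing `tsv_text` to a file and declaring that file as a workflow output would make it available for a later upload.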

    If you are instead trying to write column values to a sample table that already exists, it might be possible to do so if you have associations set up between the sequencing runs table and sample table, so you know which sequence maps to which sample. 

    Alternatively, if you only want to write back directly to the sequencing runs table, consider rewriting your workflow so that it runs at the sequence level rather than the sequence-set level. Terra makes it easy to run workflows in batch: you can run a sequence-level workflow on multiple sequences in parallel, with each instance of the workflow running on a single sequence.

    I hope this helps. If there is a gap in our understanding, please share your workspace with GROUP_FireCloud-Support@firecloud.org if you are able so we can get a better sense of what your data looks like. Please also feel free to explain anything we don't yet understand precisely.

    Kind regards,

    Jason

  • Martin Aryee

    I have only a slight modification of the situation you described:

    • There is a Sequencing Runs table where each row is a single sequencing run (corresponding to a set of FASTQs). Each sequencing run contains data from ~10-100 individual samples.
    • The workflow will be run on a single sequencing run at a time
    • The workflow will identify all the samples included in the sequencing run and (ideally) append a line for each of them to a Samples table

    It sounds like one way to achieve this is to

    1) Have the workflow generate a "Samples" table TSV
    2) Manually download the TSV
    3) Manually upload the TSV using the "Import Table Data" function. 

    Is it possible to instead use an API to replace steps (2) and (3) and manipulate the samples table from within the workflow?

     

    Thanks!

  • Jason Cerrato

    Hey Martin,

    This should be possible using APIs or FISS.

    There are essentially two options: TSV import, or calling the entity APIs. Entity APIs are a bit harder to use, but have more power. The TSVs are a bit easier, and if you want something that isn’t a participant, sample, pair, or set of one of those, you need the flexible import option.
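As a rough illustration of the TSV import option, the sketch below builds (but does not send) an HTTP request against the orchestration API. It assumes the endpoints `importEntities` and `flexibleImportEntities` under `https://api.firecloud.org/api/workspaces/<namespace>/<name>/` and a form field called `entities`; treat those names as assumptions to verify against the API documentation. The function name `build_import_request` is hypothetical.

```python
import urllib.parse
import urllib.request

# Assumed API root; verify against the current API documentation
API_ROOT = "https://api.firecloud.org/api"


def build_import_request(namespace, workspace, tsv_text, token, flexible=True):
    """Build a POST request that would upload TSV text as entities."""
    # Flexible import is for entity types outside participant/sample/pair/sets
    endpoint = "flexibleImportEntities" if flexible else "importEntities"
    url = "{}/workspaces/{}/{}/{}".format(
        API_ROOT,
        urllib.parse.quote(namespace, safe=""),
        urllib.parse.quote(workspace, safe=""),
        endpoint,
    )
    # The TSV content travels as a URL-encoded form field (assumed: "entities")
    body = urllib.parse.urlencode({"entities": tsv_text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


# Sending is left to the caller, for example:
#   with urllib.request.urlopen(request) as response:
#       print(response.status)
```

Keeping request construction separate from sending makes the URL and payload easy to inspect before anything touches a real workspace.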

    If you have any questions, please let us know.

    Kind regards,

    Jason

  • Martin Aryee

    Hi Jason,

    Great - This looks promising. I'm hopeful we can use APIs or FISS for creating the sample entities. The one challenge I see is that we have to sort out authentication from within a task. The task needs to access the workspace it's currently running from so I think I'd need to

    1) find out the workspaceNamespace and the workspaceName. (The name of the entity table to be updated can be hardcoded)

    2) have the task somehow authenticate to allow FISS or the API to make the entity table updates.

    Is either or both of these possible from within a workflow?

    Thanks,

    Martin

     

  • Jason Cerrato

    Hi Martin,

    Thanks for your patience here. For point 2, the workflow VMs should already have scope to make API calls without additional authentication steps.
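One way a task might obtain credentials without extra setup is to ask the Google Compute Engine metadata server for the VM's service account token. This is a sketch under the assumption that the task runs on a GCE VM whose default service account is accepted by the Terra APIs; the helper names are hypothetical, and `fetch_access_token` only works from inside such a VM.

```python
import json
import urllib.request

# Standard GCE metadata server root (plain HTTP, reachable only from the VM)
METADATA_ROOT = "http://metadata.google.internal/computeMetadata/v1"


def metadata_request(path):
    """Build a metadata server request; the Metadata-Flavor header is mandatory."""
    return urllib.request.Request(
        METADATA_ROOT + "/" + path,
        headers={"Metadata-Flavor": "Google"},
    )


def parse_access_token(response_body):
    """Extract the bearer token from the metadata server's JSON reply."""
    return json.loads(response_body)["access_token"]


def fetch_access_token(timeout=5):
    """Fetch a token for the VM's default service account (VM only)."""
    req = metadata_request("instance/service-accounts/default/token")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_access_token(resp.read().decode("utf-8"))
```

The returned string would then go into an `Authorization: Bearer <token>` header on API calls.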

    I am awaiting word from one of our technical experts on point 1, and will get back to you as soon as I hear from them.

    Kind regards,

    Jason

  • Jason Cerrato

    Hi Martin,

    At this time the workspace namespace and workspace name are not passed all the way through to the environment variables available in a command block. If you want to try, there may be other unorthodox ways of sniffing out your workspace.

    For example, you could try to sniff the Google project from gcloud. It might be available locally or via the Google metadata HTTPS APIs. The project corresponds to the workspace namespace. If you only had one workspace per namespace, you could list the singular workspace from api.firecloud.org.
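The project lookup described here could be sketched as below, assuming the task runs on a Google Compute Engine VM where the metadata server exposes `project/project-id`. The function name and the injectable `fetch` parameter are hypothetical; as noted in the reply, this approach is untested and may not work in every setup.

```python
import urllib.request

# Standard GCE metadata path for the project ID
PROJECT_ID_URL = (
    "http://metadata.google.internal/computeMetadata/v1/project/project-id"
)


def guess_workspace_namespace(fetch=None, timeout=5):
    """Return the Google project ID, or None if it cannot be determined.

    `fetch` is an optional zero-argument callable returning the raw ID,
    which allows the lookup to be replaced outside a VM.
    """
    def default_fetch():
        req = urllib.request.Request(
            PROJECT_ID_URL, headers={"Metadata-Flavor": "Google"}
        )
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode("utf-8")

    try:
        project_id = (fetch or default_fetch)().strip()
    except OSError:
        # Covers URLError and timeouts when no metadata server is reachable
        return None
    return project_id or None
```

Returning `None` instead of raising lets the caller fall back to explicitly supplied values.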

    All that may involve a lot of debugging and may not work. As such, it may be easier to instead pass in the variables through the method configuration / WDL inputs.
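Passing the values in explicitly might look like the following on the script side, assuming the WDL task declares two string inputs and interpolates them into the command line. The flag names and the example values are hypothetical.

```python
import argparse


def parse_workspace_args(argv=None):
    """Parse the workspace coordinates handed over by the WDL command block."""
    parser = argparse.ArgumentParser(
        description="Upload a samples TSV to a given workspace."
    )
    # Both values come from workflow inputs rather than being sniffed
    parser.add_argument("--workspace-namespace", required=True)
    parser.add_argument("--workspace-name", required=True)
    parser.add_argument("--samples-tsv", required=True)
    return parser.parse_args(argv)


# Example invocation with hypothetical values
args = parse_workspace_args([
    "--workspace-namespace", "my-billing-project",
    "--workspace-name", "my-workspace",
    "--samples-tsv", "samples.tsv",
])
```

With this layout the workspace coordinates are set once in the method configuration and the task itself stays free of environment-specific guessing.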

    If you are interested in seeing Cromwell pass in this information in the future, I'm happy to raise a feature request on your behalf.

    Kind regards,

    Jason

