Confusion on how to set up read group-level processing in workspaces

Post author
Chris Whelan


I am a bit confused about best practices for setting up Terra workspace data models to handle processing samples with multiple read groups. The note in this article suggests using the Whole-Genome Analysis Pipeline featured workspace as the example. In that workspace, the "1-WholeGenomeGermlineSingleSample" workflow is configured to run on a read_group_set entity and writes its outputs to "this", meaning that outputs including the gvcf file are stored as columns in the read_group_set data table. The "2-generate-sample-map" workflow, however, is configured to run on a sample_set entity and gets its list of gvcfs from this.samples.gvcf_path, which references a column in the sample table.

This means that the outputs of the first workflow are not linked to the inputs of the second workflow. Does the intended usage of these workflows in this workspace require manually copying the gvcf output column produced by the first workflow from the read_group_set table into a column in the sample table?
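If manual copying really is the intended usage, it can at least be scripted: read the gvcf outputs keyed by read_group_set and emit a load TSV that adds a gvcf_path column to the sample table (Terra accepts TSV uploads whose first header is entity:sample_id). Below is a minimal sketch; the sample-to-read-group-set mapping, bucket paths, and column names are hypothetical placeholders, and in practice the two dictionaries would come from exported copies of the workspace data tables:

```python
# Sketch: build a Terra load TSV that copies the gvcf output column
# from the read_group_set table into a gvcf_path column on the sample
# table. All ids and gs:// paths below are hypothetical.

# gvcf outputs keyed by read_group_set id (as written by workflow 1)
gvcf_by_rg_set = {
    "NA12878_rgs": "gs://my-bucket/NA12878.g.vcf.gz",
    "NA12891_rgs": "gs://my-bucket/NA12891.g.vcf.gz",
}

# which read_group_set belongs to which sample
rg_set_by_sample = {
    "NA12878": "NA12878_rgs",
    "NA12891": "NA12891_rgs",
}

def sample_gvcf_tsv(rg_set_by_sample, gvcf_by_rg_set):
    """Return a Terra load TSV adding a gvcf_path column to the sample table."""
    lines = ["entity:sample_id\tgvcf_path"]
    for sample, rg_set in sorted(rg_set_by_sample.items()):
        lines.append(f"{sample}\t{gvcf_by_rg_set[rg_set]}")
    return "\n".join(lines) + "\n"

print(sample_gvcf_tsv(rg_set_by_sample, gvcf_by_rg_set))
```

The resulting TSV can then be uploaded through the workspace Data tab to populate the sample table, after which this.samples.gvcf_path resolves as the second workflow expects.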

What I really think I want is an entity relationship diagram that looks like this:

sample -> read_group_set -> read_group

where the sample table would have a "read_group_set_id" column linking it to the read group set for that sample. Then you could run "1-WholeGenomeGermlineSingleSample" with a sample root entity and have it reference the list of ubams as "this.read_group_set.read_groups.ubam". Note that there is currently no link from sample to read_group_set in the featured workspace at all.
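For concreteness, the proposed data model could be expressed as load TSVs along these lines (all entity names and gs:// paths are hypothetical; note that whether an uploaded column like read_group_set is stored as a true entity reference or as a plain string may depend on how it is created, which could explain an expression resolving to an empty list):

```
# sample.tsv — each sample points at its read group set
entity:sample_id	read_group_set
NA12878	NA12878_rgs

# read_group_set membership — one row per member read group
membership:read_group_set_id	read_group
NA12878_rgs	NA12878_rg1
NA12878_rgs	NA12878_rg2

# read_group.tsv — each read group carries its unmapped bam
entity:read_group_id	ubam
NA12878_rg1	gs://my-bucket/NA12878_rg1.unmapped.bam
NA12878_rg2	gs://my-bucket/NA12878_rg2.unmapped.bam
```

With tables like these in place, "this.read_group_set.read_groups.ubam" evaluated from a sample root entity would be expected to walk sample → read_group_set → read_group and collect the ubam column.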

When I try creating a link like this in my own workspace, I get an empty list of ubams. Should this work? Is there any other way to set this type of data model up?




1 comment

  • Comment author
    Josh Evans

    Hi Chris,

    Thanks for writing in! You don't have to follow the exact configuration shown in the workspace; it is just an example. That said, it may not be possible for the workflows to link the data model exactly as you suggest.

    My suggestion would be to run some tests to see if you can get the data to flow the way you would like. If you have any specific questions, I would reach out to the Warp Pipeline Team, as they created the pipelines and will be able to help.



