hg19 and hg38 TCGA workspaces
Hi!
I was wondering if there is any documentation on the curation of the hg19 and hg38 controlled-access TCGA workspaces.
I have found some samples in the GDC legacy archive that aren't listed in the hg19 workspace. Were some samples excluded because of poor QC metrics? Or were they left out because they weren't sequenced at the Broad?
All the best,
Sabrina Camp
-
On another note, I am trying to use BAM files from the hg38 cohort and see this note on the dashboard of the workspace: "hg38 TCGA and TARGET workspaces reference files by their GDC UUIDs. In order to run analyses on the referenced data files, you will need to run workflows that retrieve the files from the GDC and copy them to your workspace bucket. See this forum post for instructions on the running of these workflows."
The link to the forum post (https://gatkforums.broadinstitute.org/firecloud/discussion/10382/populating-hg38-tcga-and-target-workspaces-with-data-files#latest) is broken. Is there an updated link?
-
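The dashboard note says the hg38 workspaces store only GDC UUIDs, so each file has to be fetched from the GDC before analysis. As a rough, unofficial illustration of that retrieval step (not the workflow the broken forum post described), the Python sketch below assumes the public GDC API data endpoint `https://api.gdc.cancer.gov/data/<file_id>` and a GDC authentication token sent in the `X-Auth-Token` header for controlled-access files. The function names `build_gdc_request` and `download_gdc_file` are hypothetical.

```python
import urllib.request
import uuid

# Assumed GDC API data endpoint; a file's UUID is appended to it.
GDC_DATA_ENDPOINT = "https://api.gdc.cancer.gov/data/"


def build_gdc_request(file_uuid: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for one GDC file by its UUID.

    Hypothetical helper: raises ValueError if file_uuid is not a valid UUID.
    """
    normalized = str(uuid.UUID(file_uuid))  # validate and normalize the UUID
    return urllib.request.Request(
        GDC_DATA_ENDPOINT + normalized,
        headers={"X-Auth-Token": token},  # controlled-access token (assumption)
    )


def download_gdc_file(file_uuid: str, token: str, dest_path: str,
                      chunk_size: int = 1 << 20) -> str:
    """Stream one file to dest_path in chunks so large BAMs are not held in memory."""
    req = build_gdc_request(file_uuid, token)
    with urllib.request.urlopen(req) as resp, open(dest_path, "wb") as out:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
    return dest_path
```

After a download, the file could then be copied into the workspace bucket with something like `gsutil cp <local.bam> gs://<workspace-bucket>/`, where both names are placeholders. For many large BAMs, the GDC Data Transfer Tool (`gdc-client`) is probably a better fit than single streaming requests.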
Hi Sabrina Camp,
Thanks for writing in. Unfortunately, we do not have any documentation on the curation of the hg19 and hg38 controlled-access TCGA workspaces. The CGA team did the original pull of the data but has since given up ownership of the workspaces. It seems that the hg19 data, which now lives in the GDC legacy archive, was very difficult to parse programmatically because metadata were not consistently present. The missing samples may have been excluded because of QC metrics, or for a related reason such as incomplete or incorrectly formatted metadata.
As for your second question, the link points to our legacy GATK forum, which is now defunct. We do not have an updated link, but we are working on new documentation for the hg38 and hg19 workspaces. I will be sure to keep you updated on that.
Please let me know if you have any questions.
Best,
Samantha