TCGA workspace data sample data dictionary is missing
I am working with the 33 TCGA controlled-access workspaces. The 'sample' data model has 39 features (i.e., columns). I need to quantify the original RNA reads, including the unmapped reads, using a new reference. I cannot find basic information about the data model.
For example, two columns look like they might have the data I need: 'mRNASeq_bam_path' and 'mRNASeq_fastq_path'. Any idea how I can find out more about these files? What are they, and where did they come from? For example:
Have the reads been trimmed?
Do they contain unmapped reads?
What is the difference between the BAM and FASTQ versions?
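Until documentation turns up, the first two questions can be probed directly on a file. The sketch below is only an illustration: the function names and toy records are made up here and are not part of the TCGA data model. It assumes the BAM has been converted to SAM text (for example with `samtools view -h`) and that the FASTQ uses plain 4-line records. An unmapped read has bit 0x4 set in the SAM FLAG column. Varying read lengths in a FASTQ hint at trimming, although a uniform length does not prove the reads are untrimmed.

```python
from collections import Counter

UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "this read is unmapped"


def count_unmapped(sam_lines):
    """Return (total_reads, unmapped_reads) for an iterable of SAM text lines."""
    total = unmapped = 0
    for line in sam_lines:
        # Skip blank lines and header lines (SAM headers start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # column 2 of a SAM record is FLAG
        total += 1
        if flag & UNMAPPED_FLAG:
            unmapped += 1
    return total, unmapped


def read_length_histogram(fastq_lines):
    """Return a Counter {read_length: count}, assuming 4-line FASTQ records."""
    lengths = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # second line of each record is the sequence
            lengths[len(line.strip())] += 1
    return lengths


# Toy data, invented for illustration only (not real TCGA reads).
sam_demo = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r3\t16\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII",
]
fastq_demo = "@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGT\n+\nIIII\n".splitlines()

print(count_unmapped(sam_demo))
print(dict(read_length_histogram(fastq_demo)))
```

On a real file, `samtools view -c -f 4 file.bam` reports the same unmapped count without any Python. If it returns zero, the unmapped reads were most likely dropped before the BAM was written.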
The workspace Dashboard page has links to:
- https://cancergenome.nih.gov/abouttcga/overview
- https://cancergenome.nih.gov/publications
- a bad URL: https://TCGA_data.nci.nih.gov/docs/publications//tcga/datatype.html (it probably should have been something like https://gdc.cancer.gov/resources-tcga-users/tcga-code-tables)
Googling the column names does not return useful results.
These links seem like they might be related; however, they do not line up with the column names:
- https://gdc.cancer.gov/resources-tcga-users/tcga-code-tables/data-levels
- https://gdc.cancer.gov/resources-tcga-users/tcga-code-tables/data-types
Any idea how I can track this information down?
Kind regards
Andy
Comments
It looks like you cannot edit your original post.
The 'sample' data model has 39 features (i.e. columns). The column names do not seem to line up with anything in the TCGA code tables.
Here is a link to one of the workspaces:
https://app.terra.bio/#workspaces/broad-firecloud-tcga/TCGA_LUAD_ControlledAccess_V1-0_DATA
Hi Andrew Davidson,
Thanks for writing in. We'll take a look at your question and get back to you as soon as we can.
Best,
Samantha
Hi Andrew Davidson,
Unfortunately, Terra Support is not equipped to answer questions specific to the TCGA data files and how they were created. I would suggest reviewing this page about TCGA data types: https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga/using-tcga/types. If that doesn't help answer your question, I would recommend reaching out to TCGA or GDC's helpdesks.
TCGA helpdesk: tcga@mail.nih.gov
GDC helpdesk: support@nci-gdc.datacommons.io
Best,
Samantha