TCGA workspace data: sample data dictionary is missing

Post author: Andrew Davidson

I am working with the 33 TCGA controlled-access workspaces. The 'sample' data model has 39 features (i.e., columns). I need to quantify the original RNA reads, including the unmapped reads, using a new reference. I cannot find even basic information about the data model.

For example, there are two columns that look like they might hold the data I need: 'mRNASeq_bam_path' and 'mRNASeq_fastq_path'. Any idea how I can find out more about these files? What are they, and where did they come from? In particular:

Have the reads been trimmed?

Do they contain unmapped reads?

What is the difference between the BAM and FASTQ versions?
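
The first two questions can at least be checked empirically once a file has been downloaded, even without documentation. A minimal sketch, assuming the BAM has been converted to SAM text (for example with `samtools view -h`) and the FASTQ is uncompressed with plain 4-line records; the function names and the tiny inputs are hypothetical. It counts primary records whose SAM FLAG has the "unmapped" bit (0x4) set, and tallies read lengths, since a single uniform length suggests (but does not prove) that no trimming was done.

```python
from collections import Counter

# SAM flag bits (from the SAM format specification)
FLAG_UNMAPPED = 0x4         # segment unmapped
FLAG_SECONDARY = 0x100      # secondary alignment
FLAG_SUPPLEMENTARY = 0x800  # supplementary alignment


def count_mapped_unmapped(sam_lines):
    """Count primary SAM records as 'mapped' or 'unmapped'.

    Expects SAM text lines, e.g. the output of `samtools view -h file.bam`.
    Secondary and supplementary records are skipped so each read is counted once.
    """
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines (@HD, @SQ, ...)
        flag = int(line.split("\t")[1])  # FLAG is the second column
        if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue
        counts["unmapped" if flag & FLAG_UNMAPPED else "mapped"] += 1
    return counts


def read_length_histogram(fastq_lines):
    """Tally sequence lengths in uncompressed FASTQ text.

    Assumes plain 4-line records (header, sequence, '+', qualities) with no wrapping.
    """
    lengths = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the sequence is the 2nd line of each record
            lengths[len(line.rstrip("\r\n"))] += 1
    return lengths


# Tiny hypothetical inputs, just to show the shape of the output.
sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r1\t256\tchr2\t50\t0\t4M\t*\t0\t0\tACGT\tIIII",
]
fastq = ["@r1", "ACGTACGT", "+", "IIIIIIII", "@r2", "ACGTAC", "+", "IIIIII"]

print(count_mapped_unmapped(sam))    # Counter({'mapped': 1, 'unmapped': 1})
print(read_length_histogram(fastq))  # Counter({8: 1, 6: 1})
```

This only describes what is in a given file; it cannot say how the BAM and FASTQ were produced or whether reads were dropped upstream, which is exactly what the missing data dictionary would need to answer.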

 

The workspace Dashboard page links to:

  1. https://cancergenome.nih.gov/abouttcga/overview
  2. https://cancergenome.nih.gov/publications
  3. Bad URL: https://TCGA_data.nci.nih.gov/docs/publications//tcga/datatype.html
     It probably should have been something like https://gdc.cancer.gov/resources-tcga-users/tcga-code-tables

Googling the column names does not return any useful results.

These links seem like they might be related; however, they do not line up with the column names.
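
Since searching for the names did not help, one low-tech step is to inventory the path columns and their actual values from a downloaded copy of the table. A minimal sketch, assuming the 'sample' table has been exported as a tab-separated file whose first column is the sample ID (shown here as `entity:sample_id`, which is an assumption about the export format); the function names, the `participant` column, and the `gs://example-bucket` paths are hypothetical. The bucket and file names in the real values may hint at which pipeline produced the files, but that is only a clue, not documentation.

```python
import csv
import io


def path_columns(tsv_text, suffixes=("_path",)):
    """Return header names in an exported data table that end with any of `suffixes`."""
    header = next(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    return [name for name in header if name.endswith(suffixes)]


def column_values(tsv_text, column):
    """Map each row's ID (first column) to its value in `column`."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    id_column = reader.fieldnames[0]  # assumed to be the entity ID, e.g. 'entity:sample_id'
    return {row[id_column]: row[column] for row in reader}


# Hypothetical two-sample excerpt; the real export has many more columns.
tsv = (
    "entity:sample_id\tmRNASeq_bam_path\tmRNASeq_fastq_path\tparticipant\n"
    "S1\tgs://example-bucket/S1.bam\tgs://example-bucket/S1.fastq\tP1\n"
    "S2\tgs://example-bucket/S2.bam\tgs://example-bucket/S2.fastq\tP2\n"
)

print(path_columns(tsv))
# ['mRNASeq_bam_path', 'mRNASeq_fastq_path']
print(column_values(tsv, "mRNASeq_bam_path"))
# {'S1': 'gs://example-bucket/S1.bam', 'S2': 'gs://example-bucket/S2.bam'}
```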

Any idea how I can track this information down?

Kind regards

Andy
