Below is a summary of the Cancer Research Data Commons (CRDC) and National Cancer Institute (NCI) cloud resources, including the FireCloud data analysis and sharing platform, and how they work together to advance cancer research. For more comprehensive information, see the CRDC website.
Terms and helpful links
“The vision for the Cancer Research Data Commons (CRDC) is a virtual, expandable infrastructure that provides secure access to many different data types across scientific domains, allowing users to analyze, share, and store results, leveraging the storage and elastic compute, or ability to easily scale resources, of the cloud. The ability to combine diverse data types and perform cross-domain analysis of large data sets can lead to new discoveries in cancer prevention, treatment and diagnosis, and supports the goals of precision medicine and the Cancer Moonshot.”
The Cancer Research Data Commons (CRDC) is the infrastructure that enables access to data from NCI-funded programs, including TARGET, TCGA, and CPTAC, through the Genomic Data Commons and the Proteomic Data Commons.
The number of data types in the CRDC continues to grow. See this article for instructions on how to access these data.
The objective is to provide secure access to the data, and a platform for analyzing it, within the CRDC in a cloud-native environment. Controlled-access data can be accessed only with dbGaP approval. Users choose whichever platform best suits their research needs: Institute for Systems Biology (ISB), Seven Bridges Genomics (SBG), or the Broad Institute (Broad).
An important piece of the NCI cloud resources is FireCloud, powered by Terra. FireCloud is a cloud-based platform developed by the Broad Institute that supports secure access to and storage of sensitive data, analysis in both interactive (granular) and batch (bulk) modes, and collaborative sharing of data, tools, and results.
“The NCI Cloud Resources are components of the NCI Cancer Research Data Commons that bring data and computational power together to enable cancer research and discovery.
These cloud-based platforms eliminate the need for researchers to download and store extremely large data sets by allowing them to bring analysis tools to the data in the cloud, instead of the traditional process of bringing the data to the tools on local hardware. The Cloud Resources also provide access to on-demand computational capacity to analyze these data. The Cloud Resources allow users to run best practice tools and pipelines already implemented or upload their own data or analysis methods to workspaces.
All three Cloud Resources provide support for data access through a web-based user interface in addition to programmatic access to analytic tools and workflows, and the capability of sharing results with collaborators. Each Cloud Resource is continually developing new functionality to improve the user experience and add new tools for researchers.”
Data Commons Framework (DCF)
The DCF is the core set of principles upon which the CRDC is built. These include:
- Build with the input and collaboration of the broad research community
- Build in an open and modular way to make components extendable and reusable
- To ensure broad interoperability, base the Data Commons on standards developed by coalitions, such as:
  - The Global Alliance for Genomics and Health (GA4GH)
  - Digital Imaging and Communications in Medicine (DICOM)
  - Clinical Data Interchange Standards Consortium (CDISC)
- Adhere to FAIR principles of data stewardship: Findable, Accessible, Interoperable, and Reusable