Terra Data Repository (TDR): Glossary of terms

Derek Caetano-Anolles

The Terra Data Repository (TDR) is a platform designed to make it easier for dataset owners to share large datasets, and for researchers to access them.

This document is meant as a helpful glossary for those who are new to cloud computing and TDR. It is by no means comprehensive, so if you need more information about TDR, please take a look at the Terra Data Repository (TDR) Overview, as well as the Glossary of terms related to cloud-based genomics for more specific cloud-based terminology.

Glossary

  1. Admin
  2. Asset
  3. Billing Profile
  4. Custodian
  5. Dataset
  6. Dataset schema
  7. Discoverer
  8. Ingest
  9. Owner
  10. Reader
  11. Snapshot
  12. Snapshot creator
  13. Steward
  14. User

Admin (role)

An owner of a Data Repository. Admins can assign Steward roles to other users if the original owners are no longer available. Admins are specifically trained for this role.

Asset

A set of data and metadata that always goes together. An example of this might be a BAM file, some sequencing quality metrics, information about tissue type, and information about the donor for a particular biosample. Data owners have fine-grained control over access by configuring which assets (or elements of assets) are viewable to different people.

Billing Profile

A Billing Profile links actions that cost money on TDR to the billing account of the corresponding cloud service (i.e., Google Cloud or Microsoft Azure). Not all actions on TDR cost money, but you must create a Billing Profile before conducting any billable actions on TDR; these include adding datasets, ingesting data, and creating snapshots.
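The rule above (billable actions require a Billing Profile, other actions do not) can be pictured with a small conceptual model. This is a minimal Python sketch under stated assumptions: `BillingProfile`, `can_perform`, the action names, and the placeholder account ID are all hypothetical illustrations, not part of the TDR API.

```python
# Conceptual model only: every name here is hypothetical and is NOT part of the TDR API.
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels for the billable actions listed in the Billing Profile definition.
BILLABLE_ACTIONS = {"add_dataset", "ingest_data", "create_snapshot"}


@dataclass
class BillingProfile:
    """Hypothetical stand-in for a Billing Profile linked to a cloud billing account."""
    profile_name: str
    cloud_platform: str  # e.g. "gcp" or "azure"
    billing_account_id: str


def can_perform(action: str, profile: Optional[BillingProfile]) -> bool:
    """Return True if `action` may proceed under the rule that billable actions need a profile."""
    if action in BILLABLE_ACTIONS:
        # A billable action needs a Billing Profile to charge its costs to.
        return profile is not None
    # Actions outside the (hypothetical) billable set do not require a Billing Profile.
    return True


profile = BillingProfile("example-profile", "gcp", "000000-000000-000000")
print(can_perform("ingest_data", None))     # False: no Billing Profile yet
print(can_perform("ingest_data", profile))  # True
print(can_perform("free_action", None))     # True: a hypothetical non-billable action
```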

Custodian (role)

A type of role defined on a dataset, describing a user who is responsible for creating data snapshots over datasets, as well as controlling access to those snapshots.

Dataset

As the name implies, a dataset is a set of related data. Datasets can contain records relating to any type of samples, metrics, or directories, and are typically represented in table format. Custodians can store and organize almost any existing dataset in TDR.

Dataset schema

The organization of a dataset's primary data and metadata in interconnected tables (TSVs or CSVs).
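To illustrate what "interconnected tables" means, here is a minimal Python sketch in which a sample table and a donor table are linked through a shared `donor_id` column. The table names, column names, and the `gs://example-bucket/...` paths are hypothetical placeholders, not a schema that TDR requires.

```python
# Conceptual sketch of a dataset schema: two tables linked by a shared key column.
# All table and column names (and the file paths) are hypothetical placeholders.
import csv
import io
from typing import Dict, List

SAMPLE_TSV = (
    "sample_id\tdonor_id\ttissue_type\tfile_path\n"
    "S1\tD1\tliver\tgs://example-bucket/S1.bam\n"
    "S2\tD2\tlung\tgs://example-bucket/S2.bam\n"
)
DONOR_TSV = (
    "donor_id\tage\n"
    "D1\t54\n"
    "D2\t61\n"
)


def read_tsv(text: str) -> List[Dict[str, str]]:
    """Parse tab-separated text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def join_on(left: List[Dict[str, str]], right: List[Dict[str, str]], key: str) -> List[Dict[str, str]]:
    """Attach the matching right-hand row to each left-hand row via a shared key."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


samples = read_tsv(SAMPLE_TSV)
donors = read_tsv(DONOR_TSV)
linked = join_on(samples, donors, "donor_id")

for row in linked:
    print(row["sample_id"], row["tissue_type"], row["age"])
# S1 liver 54
# S2 lung 61
```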

Discoverer (role)

Someone using TDR to find data snapshots for analysis. Discoverers cannot read snapshot data unless they are given the Reader role.

Ingest

Data ingestion refers to any method of collecting, storing, or formatting data for analysis.

Owner (role)

The creator of a spend profile (another name for a Billing Profile), which links to the billing account used to fund data storage and querying. Owners can update, delete, or share this profile with other users.

Reader (role)

The Reader role is defined on a snapshot. A dataset Custodian can assign the Reader role to users to give them read access to the snapshot data.
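The relationship between Custodians, Readers, and Discoverers can be sketched as a tiny access model. This is a minimal Python illustration under stated assumptions: `Dataset`, `Snapshot`, `grant_reader`, `can_read`, and the example email addresses are hypothetical and do not correspond to TDR's actual API or permission system.

```python
# Conceptual model of snapshot access: all names are hypothetical, NOT the TDR API.
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Dataset:
    name: str
    custodians: Set[str] = field(default_factory=set)


@dataclass
class Snapshot:
    name: str
    source: Dataset  # the dataset this snapshot was created from
    readers: Set[str] = field(default_factory=set)


def grant_reader(snapshot: Snapshot, granted_by: str, user: str) -> None:
    """Let a Custodian of the source dataset give `user` the Reader role on `snapshot`."""
    if granted_by not in snapshot.source.custodians:
        raise PermissionError(f"{granted_by} is not a Custodian of {snapshot.source.name}")
    snapshot.readers.add(user)


def can_read(snapshot: Snapshot, user: str) -> bool:
    """Only holders of the Reader role can read the snapshot's data."""
    return user in snapshot.readers


dataset = Dataset("example_dataset", custodians={"custodian@example.org"})
snapshot = Snapshot("example_snapshot", source=dataset)

# A Discoverer can find the snapshot but cannot read its data yet.
print(can_read(snapshot, "researcher@example.org"))  # False
grant_reader(snapshot, "custodian@example.org", "researcher@example.org")
print(can_read(snapshot, "researcher@example.org"))  # True
```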

Snapshot

A slice of a single dataset or a view of all or part of one or more studies. A Snapshot could be the data for samples funded by one organization or the subset of individuals matching specific criteria not common to everyone in their cohort. The Snapshot is the element that most users will interact with (i.e., the element that most researchers will analyze).


Image: The distinction between a Dataset, an Asset, and a Snapshot can be a little difficult to grasp. To illustrate, think of a Dataset as a collection of data present in TDR, while an Asset is simply a subset of that Dataset. There can be many different subsets of the same Dataset, each referencing different rows, columns, or cells that exist or are updated within the source Dataset. In contrast, a Snapshot is a frozen-in-time subset of a Dataset or Asset that is intended to be shared with end users. Think of Assets as the individual elements that make up a full Dataset, whereas a Snapshot is a "version" or "release" of a predetermined part of the full Dataset as it existed at a certain point in time.
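The "frozen-in-time" idea can be shown with a short Python sketch: a live list of rows stands in for a Dataset, and a read-only copy of a filtered subset stands in for a Snapshot. This only illustrates the concept; the function `take_snapshot` and the example rows are hypothetical, and TDR does not necessarily create snapshots by copying data.

```python
# Conceptual sketch of Dataset vs. Snapshot; it does NOT reflect how TDR stores data.
import copy
from types import MappingProxyType


def take_snapshot(dataset_rows, predicate):
    """Freeze the rows matching `predicate` as they exist right now.

    Rows are deep-copied so later edits to the dataset do not leak in, and each
    row is wrapped in a read-only MappingProxyType inside an immutable tuple.
    """
    return tuple(
        MappingProxyType(copy.deepcopy(row)) for row in dataset_rows if predicate(row)
    )


# The live Dataset: rows can be added or updated over time.
dataset_rows = [
    {"sample_id": "S1", "tissue_type": "liver"},
    {"sample_id": "S2", "tissue_type": "lung"},
]

# A Snapshot (a "release") of the liver samples at this point in time.
release_1 = take_snapshot(dataset_rows, lambda row: row["tissue_type"] == "liver")

# Later updates to the Dataset...
dataset_rows[0]["tissue_type"] = "kidney"
dataset_rows.append({"sample_id": "S3", "tissue_type": "liver"})

# ...do not change the Snapshot.
print([dict(row) for row in release_1])  # [{'sample_id': 'S1', 'tissue_type': 'liver'}]
```

Deep-copying keeps the snapshot independent of later dataset edits, and the read-only wrapper mirrors the idea that a release is not edited after it is shared.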

Snapshot creator (role)

A type of user who can read dataset data and create new Snapshots.

Steward (role)

A Steward (sometimes called the Data Owner) is a role defined on a dataset and is held by the person who created the dataset. While ultimately liable for the data, the Steward can delegate hands-on data management to another person by assigning them the Custodian role.

User (role)

A dataset user with the "User" role may link their Billing Profile to a dataset or snapshot that they create. They can also assign this role to other individuals.
