This article explains what Data Repository Service Uniform Resource Identifiers (DRS URIs) are, why they are used, and where they appear in Terra.
Using the GA4GH standard Data Repository Service (DRS) means you can access, combine, and analyze data no matter where it's stored. For example, Terra uses DRS URIs when (meta)data is "handed off" from an external data portal to Terra for analysis.
What are DRS URIs?
Different cloud infrastructures identify stored data with different syntax, which makes it challenging to combine data across clouds effectively. The Data Repository Service (DRS) API, developed by the Global Alliance for Genomics and Health (GA4GH), is a standardized set of access methods that data repositories use to provide access to data in a single, standard way. DRS URIs enable researchers to access data regardless of the underlying cloud infrastructure (e.g., Google Cloud, Azure, AWS).
A unique ID mapping that allows for flexible retrieval
The unique mapping is the DRS Uniform Resource Identifier (URI) - a string of characters (similar to URLs) that identifies a particular cloud-based resource and is agnostic to the cloud infrastructure where it physically exists.
DRS URIs allow easy access to data on any cloud-based storage system. With DRS URIs, the ocean of data files becomes an organized file cabinet with easy, reliable interoperability between data producers and data consumers, consistent with the FAIR data principles (Findable, Accessible, Interoperable, Reusable).
Where are DRS URIs in Terra?
Any links that reference where data in the cloud are physically located can be in DRS URI format: in data tables (in the Data page), as workflow input parameters (direct links or in a data table), or as data in an interactive analysis (in a Jupyter notebook, Galaxy, or RStudio).
Google bucket URL versus DRS URI
Data in a Google bucket is identified by a string of the format:
The same file might have a DRS URI that looks like this:
Format of DRS URIs in Terra
DRS URIs in Terra can have different formats, described below.
1. DRS URIs with hostname and data identifier
Consistent with the DRS standard, Terra supports DRS identifiers that consist only of the "drs" scheme, a hostname, and a data identifier (i.e. drs://DRS_hostname/data_identifier).
This compact format omits the "boilerplate" elements of the standard endpoint path.
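To illustrate what the compact format leaves out, here is a minimal Python sketch that expands a hostname-based DRS URI into the longer HTTPS endpoint. It assumes the object path /ga4gh/drs/v1/objects/ defined by the GA4GH DRS v1 specification (that path is not stated in this article). The function name drs_uri_to_https and the hostname drs.example.org are hypothetical placeholders, and this is not necessarily how Terra resolves DRS URIs internally.

```python
from urllib.parse import urlsplit

# Assumption: object path from the GA4GH DRS v1 specification.
DRS_OBJECT_PATH = "/ga4gh/drs/v1/objects/"


def drs_uri_to_https(drs_uri: str) -> str:
    """Expand drs://<hostname>/<data_identifier> into a full HTTPS URL."""
    parts = urlsplit(drs_uri)
    if parts.scheme != "drs":
        raise ValueError(f"not a DRS URI: {drs_uri!r}")

    hostname = parts.netloc
    data_identifier = parts.path.lstrip("/")
    if not hostname or not data_identifier:
        raise ValueError(f"expected drs://hostname/identifier, got {drs_uri!r}")

    # Re-attach the "boilerplate" that the compact form omits.
    return f"https://{hostname}{DRS_OBJECT_PATH}{data_identifier}"


if __name__ == "__main__":
    # Hypothetical hostname and identifier, for illustration only.
    print(drs_uri_to_https("drs://drs.example.org/abc123"))
```

The compact form carries the same two pieces of information (host and identifier) as the full URL; everything else is fixed by the standard.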
2. DRS URIs with a Data GUID Namespace
Data Globally Unique Identifiers (GUIDs) provide independence from a specific hostname by using a namespace instead.
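The difference between the first two formats can be sketched in Python as follows. This is a rough illustration, not Terra's actual parsing logic: it assumes that a namespace-based URI has the shape drs://namespace:identifier, so that a colon in the first segment after the scheme distinguishes it from a hostname-based URI. The function name classify_drs_uri, the namespace dg.EXAMPLE, and the hostname drs.example.org are hypothetical.

```python
def classify_drs_uri(drs_uri: str) -> tuple[str, str, str]:
    """Return (kind, authority, identifier) for a DRS URI.

    kind is "namespace" for drs://<namespace>:<identifier>
    and "hostname" for drs://<hostname>/<identifier>.
    """
    prefix = "drs://"
    if not drs_uri.lower().startswith(prefix):
        raise ValueError(f"not a DRS URI: {drs_uri!r}")

    rest = drs_uri[len(prefix):]
    first_segment, _, remainder = rest.partition("/")

    # Assumption: a colon in the first segment marks a namespace,
    # since a hostname-based DRS URI carries no port.
    if ":" in first_segment:
        namespace, _, identifier = first_segment.partition(":")
        if remainder:
            identifier = f"{identifier}/{remainder}"
        return ("namespace", namespace, identifier)

    return ("hostname", first_segment, remainder)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(classify_drs_uri("drs://dg.EXAMPLE:abc-123"))
    print(classify_drs_uri("drs://drs.example.org/abc-123"))
```

With a namespace-based URI, the resolver rather than the URI itself decides which host serves the data, which is what makes the identifier independent of any specific hostname.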
3. Full standard DRS URLs
Terra is currently in transition from supporting the GA4GH Data Object Service (a precursor to the now-standard GA4GH DRS) to the standard DRS API. Using the full DOS/DRS URIs is not recommended or supported until this transition is complete.
DRS URIs in Terra workspace data tables
Note that workflows that take their input from data tables will access and process the data without any extra intervention, including data identified by a DRS URI.
DRS URI in a data table (example)
Clicking on a DRS URI link in a data table will open the File Details dialog, which provides additional information about the file and options for downloading the file.
Additional DRS URI Resources
For more information about the DRS-specific tools in Terra:
- Access to data identified by DRS URIs is provided by a DRS client library (terra-notebook-utils package*)
- API to use with Notebooks
- CLI to use from the Terra terminal
- terra-notebook-utils README
* This package supports several helpful operations, such as:
- View details about the data
- Copy/download the data to the Cloud Environment VM or to a Google bucket
DRS in the news
Current DRS Documentation
DRS Repository on GitHub