This article defines what Data Repository Service Uniform Resource Identifiers (DRS URIs) are and why and where they are used in Terra.
Using the GA4GH standard Data Repository Service (DRS) means you can access, combine, and analyze data no matter where it's stored. For example, Terra uses DRS URIs when (meta)data is "handed off" from an external data portal to Terra for analysis.
What are DRS URIs?
Each cloud infrastructure uses its own syntax for identifying stored data, which makes it hard to combine data across clouds. The Data Repository Service (DRS) API, developed by the Global Alliance for Genomics and Health (GA4GH), is a standardized set of access methods that data repositories use to expose data in a single, consistent way. DRS URIs let researchers access data regardless of the underlying cloud infrastructure (e.g., Google Cloud, Azure, AWS).
A unique ID mapping that allows for flexible retrieval
The unique mapping is the DRS Uniform Resource Identifier (URI): a string of characters (similar to a URL) that identifies a particular cloud-based resource and is agnostic to the cloud infrastructure where the resource physically exists.
DRS URIs allow easy access to data on any cloud-based storage system. With DRS URIs, the ocean of data files becomes an organized file cabinet with easy, reliable interoperability between data producers and data consumers, consistent with the FAIR data principles (Findable, Accessible, Interoperable, Reusable).
Where are DRS URIs in Terra?
Any links that reference where data in the cloud are physically located can be in DRS URI format: in data tables (in the Data page), as workflow input parameters (direct links or in a data table), or as data in an interactive analysis (in a Jupyter notebook, Galaxy, or RStudio).
Google bucket URL versus DRS URI
Data in a Google bucket is identified by a string of the format gs://bucket_name/path/to/file.
The same file might have a DRS URI of the format drs://DRS_hostname/data_identifier.
Format of DRS URIs in Terra
DRS URIs in Terra can have different formats, described below.
1. DRS URIs with hostname and data identifier
Consistent with the DRS standard, Terra supports DRS identifiers that include only the "drs" scheme (i.e. drs://DRS_hostname/data_identifier).
This compact format omits the "boilerplate" elements of the standard endpoint path.
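To illustrate what the compact format leaves out, here is a minimal Python sketch that expands a hostname-based DRS URI into the full endpoint URL. It assumes the GA4GH DRS v1 convention that objects are served at https://<hostname>/ga4gh/drs/v1/objects/<identifier>. The function name drs_to_https and the example hostname are hypothetical, and Terra's own resolution logic may differ.

```python
from urllib.parse import urlsplit


def drs_to_https(drs_uri: str) -> str:
    """Expand drs://hostname/identifier into a DRS v1 object URL.

    Assumption: the server follows the GA4GH DRS v1 endpoint layout.
    """
    parts = urlsplit(drs_uri)
    if parts.scheme != "drs":
        raise ValueError(f"not a DRS URI: {drs_uri!r}")
    # The path carries the data identifier; drop the leading slash.
    object_id = parts.path.lstrip("/")
    if not parts.netloc or not object_id:
        raise ValueError(f"expected drs://hostname/identifier, got {drs_uri!r}")
    return f"https://{parts.netloc}/ga4gh/drs/v1/objects/{object_id}"


# Hypothetical hostname and identifier, for illustration only.
print(drs_to_https("drs://drs.example.org/abc-123"))
```

The sketch only builds the URL. Fetching the object's metadata would also require whatever authentication the data provider expects.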
2. DRS URIs with a Data GUID Namespace
Data Globally Unique Identifiers (GUIDs) provide independence from a specific hostname by using a namespace instead.
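A client has to tell the two forms apart before it can resolve them. The Python sketch below assumes the namespace form follows the GA4GH compact-identifier shape drs://namespace:accession, and that a colon after the scheme marks that form. Both are assumptions about the URI shape, not a description of Terra's parser. The names ParsedDrsUri and parse_drs_uri are hypothetical.

```python
from typing import NamedTuple


class ParsedDrsUri(NamedTuple):
    kind: str        # "hostname" or "compact"
    authority: str   # hostname, or namespace for the compact form
    identifier: str  # data identifier, or accession for the compact form


def parse_drs_uri(drs_uri: str) -> ParsedDrsUri:
    """Split a DRS URI into its kind, authority, and identifier."""
    prefix = "drs://"
    if not drs_uri.startswith(prefix):
        raise ValueError(f"not a DRS URI: {drs_uri!r}")
    rest = drs_uri[len(prefix):]
    if ":" in rest:
        # Assumed compact form: namespace and accession separated by a colon.
        namespace, _, accession = rest.partition(":")
        parsed = ParsedDrsUri("compact", namespace, accession)
    else:
        # Hostname form: hostname and identifier separated by the first slash.
        host, _, object_id = rest.partition("/")
        parsed = ParsedDrsUri("hostname", host, object_id)
    if not parsed.authority or not parsed.identifier:
        raise ValueError(f"incomplete DRS URI: {drs_uri!r}")
    return parsed
```

With the namespace form, the client still needs a separate lookup to find which host currently serves that namespace. That lookup is what makes the identifier independent of any specific hostname.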
3. Full standard DRS URLs
Terra is currently in transition from supporting the GA4GH Data Object Service (a precursor to the now-standard GA4GH DRS) to the standard DRS API. Using the full DOS/DRS URIs is not recommended or supported until this transition is complete.
DRS URIs in Terra workspace data tables
Note that workflows that take their input from data tables will access and process the data without intervention, including data identified by a DRS URI.
DRS URI in a data table (example)
Clicking on a DRS URI link in a data table will open the File Details dialog, which provides additional information about the file and options for downloading the file.
Downloading data from Requester Pays buckets
The File Details dialog does not currently support downloading files from Requester Pays buckets. Downloading from the dialog is also unsupported in some other cases; for example, it does not work with some external authentication and authorization services.
For instructions on how to copy data from a Requester Pays bucket using a DRS URI, see Accessing DRS URIs data files.
To learn more about how to organize and access data in the cloud using data tables, see Managing data with tables.
Troubleshooting DRS URI access in Terra
If the data referenced by a DRS URI is access-controlled (i.e., not public), access requires successful authentication and authorization. If your workflow fails immediately, it's usually because the workflow cannot access the data. This is often due to either an expired authorization link or an error in the workflow configuration (e.g., a typo in an attribute name on the configuration form).
1. Make sure your Terra and NIH accounts are linked
To access data provided by external services, you must have an up-to-date link to that service in your Terra user Profile.
To learn more about linking to external services (including step-by-step instructions), see Linking authorization/accessing controlled data on external servers.
2. Verify with the DRS data provider that their DRS service is available and functioning properly.
Additional DRS URIs Resources
For more information about the GA4GH Data Repository Service (DRS)-specific tools in Terra:
- Access to data identified by DRS URIs is provided by a DRS client library (terra-notebook-utils package*)
- API to use with Notebooks
- CLI to use from the Terra terminal
- terra-notebook-utils README
* This package lets you perform helpful operations, such as:
- View details about the data
- Copy/download the data to the Cloud Environment VM or to a Google bucket
DRS in the news
Current DRS Documentation
DRS Repository on GitHub