How to access data with DRS URIs

Allie Cliffe

Learn how to bring data with a Data Repository Services API Uniform Resource Identifier (DRS URI) into your workspace storage (Google bucket) or Cloud Environment persistent disk to use in an interactive analysis (Jupyter notebook, Galaxy, or RStudio).

Overview

Reasons to copy DRS URI data files

  • To run an interactive analysis (Jupyter notebook, Galaxy, or RStudio) on data with a DRS URI, you first need to pull the data into your workspace Cloud Environment persistent disk.
  • To copy primary data into workspace storage.

Why additional commands are needed

Because these files do not have a gs://<file location><file-name> URL, you cannot use gsutil to copy them. DRS URI-specific commands (including copy) are provided by a DRS client library, terra-notebook-utils. The package includes a Python API to use in Jupyter notebooks and a command line interface (CLI) to use from the Terra terminal.
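Since the right tool depends on the URI scheme, it can help to sort identifiers before copying. The following is a minimal sketch that uses only the Python standard library; the helper names `uri_scheme` and `needs_drs_client` and the example identifiers are hypothetical and are not part of terra-notebook-utils.

```python
from urllib.parse import urlparse


def uri_scheme(uri: str) -> str:
    """Return the lowercase scheme of a URI, for example 'drs' or 'gs'."""
    return urlparse(uri).scheme.lower()


def needs_drs_client(uri: str) -> bool:
    """True when the file must be resolved through a DRS client rather than gsutil."""
    return uri_scheme(uri) == "drs"


# Hypothetical identifiers, for illustration only.
uris = ["drs://my-drs-uri", "gs://my-bucket/my-file.txt"]
drs_uris = [u for u in uris if needs_drs_client(u)]
print(drs_uris)
```

Anything left over after this filter already has a gs:// URL and can be handled with gsutil as usual.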

What is in the terra-notebook-utils package?

This package includes commands for viewing details about the data, copying/downloading the data to the Cloud Environment VM or Google bucket, and other helpful operations.

Instructions for viewing and copying/downloading data are in the following sections. For additional helpful information, see the terra-notebook-utils README.

Use a current version of terra-notebook-utils
Because the Terra Cloud Environment is constantly updated, it is very important to use a current version of terra-notebook-utils!

Please use terra-notebook-utils version 0.9.0 or later. Read on for how to install/update terra-notebook-utils.
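One way to confirm that the installed package meets the 0.9.0 minimum is to read its version from the package metadata. This is a sketch under assumptions: `parse_version` and `is_current` are hypothetical helpers written for this example, and the parsing only looks at the first three numeric components of the version string.

```python
from importlib import metadata

MINIMUM = (0, 9, 0)  # minimum version named in this article


def parse_version(text: str) -> tuple:
    """Turn '0.10.1' into (0, 10, 1); ignores any non-numeric suffix in a component."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_current(version_text: str, minimum: tuple = MINIMUM) -> bool:
    """Compare numerically, so that 0.10.x correctly counts as newer than 0.9.x."""
    return parse_version(version_text) >= minimum


try:
    installed = metadata.version("terra-notebook-utils")
    print(installed, "OK" if is_current(installed) else "needs upgrade")
except metadata.PackageNotFoundError:
    print("terra-notebook-utils is not installed")
```

Comparing tuples of integers rather than raw strings avoids the common mistake of treating "0.10.0" as older than "0.9.0".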

How to use DRS URIs in a notebook

The terra-notebook-utils Python API is available in Python notebooks and scripts, and is callable from R notebooks and scripts.

Specifying the destination location with workspace environment variables
When running in notebooks, the current workspace namespace (Terra Billing project) and workspace name are used by default. These specify the destination (i.e. the persistent disk) where the data will be copied.
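To see which workspace a notebook would use by default, you can inspect the environment. This is only a sketch: the variable names `WORKSPACE_NAME` and `WORKSPACE_NAMESPACE` are assumptions not stated in this article, and `workspace_defaults` is a hypothetical helper, so check the terra-notebook-utils README for the names your environment actually sets.

```python
import os


def workspace_defaults(env=None) -> dict:
    """Collect the workspace settings from environment variables, if present."""
    env = os.environ if env is None else env
    return {
        # Assumed variable names; a value of None means the variable is not set.
        "workspace": env.get("WORKSPACE_NAME"),
        "workspace_namespace": env.get("WORKSPACE_NAMESPACE"),
    }


print(workspace_defaults())
```

If either value prints as None, the defaults described above are not available in that session and the workspace has to be supplied another way.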

Step-by-step instructions

1. Install the latest version of terra-notebook-utils.
In a Python Notebook, run the following.

%pip install --upgrade --no-cache-dir terra-notebook-utils

2. Import the terra-notebook-utils drs module.
Note that the table module is optional, yet useful and recommended.

from terra_notebook_utils import drs, table

3. (Optional) View details about the data identified by the DRS URI.

drs.info("drs://my-drs-uri")

4. Copy/download the data.
There are several options available depending on your use-case (i.e. whether to copy to the Cloud Environment VM or a Google bucket and whether to copy a single DRS URI or a list of DRS URIs).

  • To copy a single DRS URI file to the Cloud Environment VM:
    drs.copy("drs://my-drs-url", "local_filepath")

  • To copy a single DRS URI file to a Google bucket:
    drs.copy("drs://my-drs-url", "gs://my-dst-bucket/my-dst-key")

  • To copy a list of DRS URIs to the Cloud Environment VM:
    drs.copy_batch(["drs://my-drs-url1", "drs://my-drs-url2"], "local_directory")

  • To copy a list of DRS URIs to a Google bucket:
    drs.copy_batch(["drs://my-drs-url1", "drs://my-drs-url2"], "gs://my-dst-bucket/prefix")
  • The terra-notebook-utils package also provides a useful function for finding the DRS URI for a given filename within the workspace. This requires the `table` module to be imported.

    To fetch a DRS URI from a Terra data table for a given file name, use:
    drs_url = table.fetch_drs_url("data table name", "file name")
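The single-file and batch calls above can be wrapped in one helper that picks the right function for you. This is a minimal sketch, not part of terra-notebook-utils: `copy_drs` and its `drs_module` parameter are hypothetical, and the helper relies only on the `drs.copy` and `drs.copy_batch` calls shown in this article.

```python
def copy_drs(uris, dst, drs_module=None):
    """Copy one DRS URI with drs.copy, or several with drs.copy_batch."""
    if drs_module is None:
        # Imported lazily so the helper can be defined before the package is installed.
        from terra_notebook_utils import drs as drs_module
    if isinstance(uris, str):
        uris = [uris]
    uris = list(uris)
    if not uris:
        raise ValueError("at least one DRS URI is required")
    bad = [u for u in uris if not u.startswith("drs://")]
    if bad:
        raise ValueError(f"not DRS URIs: {bad}")
    if len(uris) == 1:
        # dst is a file path or a gs:// object key
        drs_module.copy(uris[0], dst)
    else:
        # dst is a local directory or a gs:// prefix
        drs_module.copy_batch(uris, dst)


# Example calls, commented out because they need real DRS URIs and a Terra environment:
# copy_drs("drs://my-drs-url", "local_filepath")
# copy_drs(["drs://my-drs-url1", "drs://my-drs-url2"], "gs://my-dst-bucket/prefix")
```

Note that the meaning of the destination changes with the number of URIs: a single copy takes a file path or object key, while a batch copy takes a directory or bucket prefix.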

How to use DRS URIs in the terminal

The terra-notebook-utils Python CLI is available for use from the Terra Terminal and within shell scripts. 

For instructions on how to access the workspace terminal, see Using the terminal and interactive analysis shell in Terra.

Step-by-step instructions

1. Install the latest version of terra-notebook-utils.
In the Terra Terminal, run

pip install --upgrade --no-cache-dir terra-notebook-utils

2. (Recommended) Set the terra-notebook-utils configuration.
This configuration applies only to the use of the terra-notebook-utils CLI, not to the API.

tnu config set-workspace my-workspace-name
tnu config set-workspace-namespace my-billing-project

Setting the workspace environment variables
When running in the terminal, the workspace namespace (Terra Billing project) and workspace name must be provided in one of three ways: 1) setting a terra-notebook-utils configuration (recommended), 2) using environment variables, or 3) providing command-line options for each.

3. To view the current configuration, run the following.

tnu config print

4. (Optional) View details of the data identified by the DRS URI.

tnu drs info "drs://my-drs-uri"

5. Copy/download the data.
There are several options available depending on your use-case (i.e. whether to copy to the Cloud Environment VM or a Google bucket and whether to copy a single DRS URI or a list of DRS URIs).

  • To copy a single DRS URI file to the Cloud Environment VM:
    tnu drs copy drs://my-drs-url local_filepath

  • To copy a single DRS URI file to a Google bucket:
    tnu drs copy drs://my-drs-url gs://my-dst-bucket/my-dst-key

  • To copy multiple DRS URIs to the Cloud Environment VM:
    tnu drs copy-batch drs://my-drs-url1 drs://my-drs-url2 --dst local_directory

  • To copy multiple DRS URIs to a Google bucket:
    tnu drs copy-batch drs://my-drs-url1 drs://my-drs-url2 --dst gs://my-dst-bucket/prefix

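When the list of DRS URIs comes from a script rather than being typed by hand, the copy-batch command line can be assembled programmatically. The sketch below builds the same command shown above; `copy_batch_command` is a hypothetical helper, and only the `tnu drs copy-batch ... --dst ...` form that appears in this article is used.

```python
import shlex


def copy_batch_command(uris, dst):
    """Build the argv list for `tnu drs copy-batch <uris...> --dst <dst>`."""
    uris = list(uris)
    if not uris:
        raise ValueError("at least one DRS URI is required")
    return ["tnu", "drs", "copy-batch", *uris, "--dst", dst]


cmd = copy_batch_command(
    ["drs://my-drs-url1", "drs://my-drs-url2"],
    "gs://my-dst-bucket/prefix",
)
print(shlex.join(cmd))

# To run it from Python inside the Terra Cloud Environment (requires tnu on the PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the command as a list rather than a single string avoids shell-quoting problems if a destination path contains spaces.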
