Using Requester Pays workspaces/buckets

Anton Kovalsky

This article explains what to expect in Terra when interacting with data in a "Requester Pays" bucket. Requester Pays is an optional Google setting for data stored on Google Cloud. It allows dataset owners to make data broadly accessible without having to pay data transfer fees when someone reads or copies the data from a different region.

If you are interested in having Requester Pays enabled or disabled on your workspace bucket

Please write to support@terra.bio

Details to include

  • The Google Project ID for the workspace (found under Workspace Information on the right side of the Dashboard)
  • The workspace bucket name (found in the Cloud Information dropdown on the right side of the Dashboard, name like fc-random-character-string)
  • Whether you want Requester Pays enabled or disabled.

You cannot run compute in a Requester Pays workspace. However, you can clone the workspace that it's enabled on and run compute in the clone. Workspaces with Requester Pays enabled are typically read-only workspaces.

What is a Requester Pays bucket/workspace?

When accessing data in a bucket with Requester Pays enabled in Google Cloud, data transfer costs are paid by the requester (not the bucket owner).  

Operations with a data transfer cost (requesters pay)

  • Listing files in the "Files" section of the Data tab
  • Previewing files in the Data tab (either from the "Files" section or an entity table)
  • Downloading files in the Data tab (either from the "Files" section or an entity table)
  • Listing notebooks in the Analyses tab
  • Getting a read-only preview of a notebook
  • Launching a notebook for editing (if you have can-compute permission on the workspace)
  • Copying a notebook to another workspace
  • Copying notebooks when cloning a workspace

Non-Requester Pays buckets

In the default setting (without Requester Pays enabled), these costs are charged to the Google Cloud Billing account used to create the bucket (the owner of the data, in other words).

Why use Requester Pays buckets?

In some cases, data custodians will want to host data without being on the hook for every costly action taken in the bucket they are publicly hosting. Enabling Requester Pays means the host of the data only pays for data storage. Other (data transfer) costs are charged to the user accessing the data.

How to access Requester Pays data/resources in Terra

When you try to access data in a Requester Pays bucket or workspace, Terra will alert you with the following pop-up window.

Screenshot of the pop-up window shown when downloading data from a Requester Pays bucket

Copy a file with gcloud storage in a terminal

If you can provide a Google project to bill, you can run a gcloud storage cp command on an object in a Requester Pays bucket from a notebook, in a workflow, or on the command line. 

gcloud storage --billing-project=<project_to_bill> cp <gs://path-to-file-to-download> <gs://destination>

The <project_to_bill> is the Google project ID to bill. This can be a workspace Google project (not the one you are copying from) or an external Google project created on GCP. You can find each workspace's Google project in the right column under Cloud Information on the Dashboard.

Screenshot of workspace dashboard with the cloud information section on the right expanded to show the highlighted Google project ID
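
As a minimal sketch of the copy command above, the script below wraps it in a small function. The project ID, bucket names, and file paths are hypothetical placeholders, and the DRY_RUN switch is an assumption added for illustration so you can inspect the command before anything is billed.

```shell
#!/bin/sh
set -eu

# Hypothetical placeholder values: replace all three with your own.
BILLING_PROJECT="${BILLING_PROJECT:-my-billing-project-id}"
SOURCE="${SOURCE:-gs://example-requester-pays-bucket/path/to/file.txt}"
DESTINATION="${DESTINATION:-gs://fc-example-workspace-bucket/copied/}"

# Copy one object and bill the data transfer to the given project.
#   $1 = Google project ID to bill
#   $2 = source object in the Requester Pays bucket
#   $3 = destination path
copy_with_billing() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Dry run (the default here): only print the command that would run.
    printf '%s\n' "gcloud storage --billing-project=$1 cp $2 $3"
  else
    gcloud storage --billing-project="$1" cp "$2" "$3"
  fi
}

copy_with_billing "$BILLING_PROJECT" "$SOURCE" "$DESTINATION"
```

Run the script as-is to print the command, then run it again with DRY_RUN=0 once the project and paths look right.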

Be careful when using this command! If you are mistaken about the bucket in question being a Requester Pays bucket and use this command, you may inadvertently charge the bucket owner for the data transfer.
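
One way to confirm that a bucket really has Requester Pays enabled before copying is to inspect its metadata. The sketch below assumes you have permission to read the bucket's metadata and that the describe command prints a line such as "requester_pays: true" when the setting is on. The function names and the bucket name in the usage comment are hypothetical.

```shell
#!/bin/sh
set -eu

# Print the Requester Pays field from the bucket's metadata.
#   $1 = bucket URL, for example gs://example-bucket (hypothetical name)
describe_requester_pays() {
  gcloud storage buckets describe "$1" --format="default(requester_pays)"
}

# Succeed only if the describe output reports Requester Pays as enabled.
#   $1 = text printed by describe_requester_pays
is_requester_pays() {
  printf '%s\n' "$1" | grep -qi 'requester_pays: *true'
}

# Example usage (uncomment and replace the bucket name):
# output=$(describe_requester_pays "gs://example-bucket")
# if is_requester_pays "$output"; then
#   echo "Requester Pays is enabled"
# else
#   echo "Requester Pays is not enabled"
# fi
```

If the check reports that Requester Pays is not enabled, leave out the billing project when you copy.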

What happens when you leave a Requester Pays workspace?

When you leave a Requester Pays workspace, the billing workspace you selected is cleared. The next time you enter that workspace, you will be prompted again to select a workspace to bill.

 


Comments

4 comments

  • Comment author
    Joe Brown

    This covers clicking around in the workspace. What about accessing these files from a (non-GATK) workflow?

  • Comment author
    Jason Cerrato

    Hi Joe,

    If you are accessing data from a requester pays bucket using a workflow in Terra, Cromwell (the workflow management system) will automatically bill your billing project for the access to the resources. You can read a little more about this here: https://cromwell.readthedocs.io/en/stable/filesystems/GoogleCloudStorage/#requester-pays

    Kind regards,

    Jason

  • Comment author
    Stephanie Hoyt

    Is it possible to estimate the costs associated with using data in a requester pays bucket before granting the bucket access to the billing project?

  • Comment author
    Allie Cliffe

    Stephanie Hoyt The cost of using requester pays data is for egress (moving the file between storage regions or locations). If you click on a link to the data file from within Terra, you should get a popup that gives the egress costs for that file.

