This article explains what to expect in Terra when interacting with data in a "Requester Pays" bucket. Requester Pays is an optional Google setting for data stored on Google Cloud. It allows dataset owners to make data accessible to everyone without having to pay egress fees when someone reads or copies the data from a different region.
If you are interested in having Requester Pays enabled or disabled on your workspace bucket, please write to firstname.lastname@example.org.
Details to include
- The Google Project ID for the workspace (found under Workspace Information on the right side of the Dashboard)
- The workspace bucket name (found in the Cloud Information dropdown on the right side of the Dashboard, name like fc-random-character-string)
- Whether you want Requester Pays enabled or disabled.
You cannot run compute in a Requester Pays workspace. However, you can clone the workspace that it's enabled on and run compute in the clone. Workspaces with Requester Pays enabled are typically read-only workspaces.
What is a Requester Pays bucket/workspace?
In general, accessing data in a bucket in Google Cloud will generate an associated charge.
Actions that have a cost include
- Performing a request
- Reading data
- Retrieving data
- Storing data
In the default setting (without Requester Pays enabled), these costs are charged to the Google Cloud Billing account used to create the bucket (the owner of the data, in other words). However, in some cases, data custodians will want to host data without being on the hook for every costly action taken in the bucket they are publicly hosting.
When Requester Pays is enabled, the host of the data only pays for data storage. Other costs are charged to the user accessing the data.
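The billing split described above can be summarized in a short Python sketch. The function name who_pays and the action labels are illustrative only and are not part of any Terra or Google API; the sketch simply encodes the rule stated in this article.

```python
# Illustrative only: encodes the billing rule described in this article.
# The action labels mirror the list of billable actions above.
ACTIONS = {"request", "read", "retrieve", "store"}


def who_pays(action: str, requester_pays: bool) -> str:
    """Return 'owner' or 'requester' for a given bucket action."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    # Without Requester Pays, every cost goes to the bucket owner's
    # billing account. With it enabled, the owner pays for storage only.
    if not requester_pays or action == "store":
        return "owner"
    return "requester"
```

For example, who_pays("read", requester_pays=True) returns "requester", while who_pays("store", requester_pays=True) returns "owner".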
How to access Requester Pays data
When you try to access data in a bucket or workspace that has Requester Pays enabled, Terra will alert you with one of the three following pop-up windows. The alert you see depends on which options are available to you.
Option 1. Using a workspace Google project
If you have a workspace to which you can charge any costs you incur, you will see a popup with a drop-down menu allowing you to select a workspace. Costs will be charged to the workspace's Terra Billing project (via the associated Google project).
Option 2. Using a new workspace
If you do not have owner or project owner permissions on a workspace, you will be directed to your workspace page, where you can create a workspace (you will be owner) or select a shared workspace for which you can request owner permission. The new workspace will then appear in the dropdown, and you can charge Requester Pays fees to it.
Option 3. Using gsutil
If you can provide a project to bill, you can run a gsutil cp command on an object in a Requester Pays bucket from a notebook, in a workflow, or on the command line.
gsutil -u <project_to_bill> cp <gs://path/to/file> <destination>
The <project_to_bill> is the project ID of your workspace Google project (not the one you are copying from). You can find it in the right column under Cloud Information on the workspace Dashboard.
A note of caution: If you are mistaken about the bucket in question being a "Requester Pays" bucket and use this command, you may inadvertently charge the bucket owner for the egress.
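If you are working in a notebook, the same copy can be driven from Python. The sketch below is a minimal example under a few assumptions: gsutil is installed and on the PATH, and the helper names build_requester_pays_cp and copy_from_requester_pays are hypothetical, not part of Terra. It only assembles and runs the gsutil -u <project_to_bill> cp command shown above.

```python
import subprocess
from typing import List


def build_requester_pays_cp(project_to_bill: str, source: str, destination: str) -> List[str]:
    """Assemble the gsutil command that bills `project_to_bill` for the copy."""
    if not project_to_bill:
        raise ValueError("a project to bill is required")
    # Assumption for this sketch: the source is an object in a bucket.
    if not source.startswith("gs://"):
        raise ValueError("source must be a gs:// path")
    # -u is a top-level gsutil option, so it goes before the cp subcommand.
    return ["gsutil", "-u", project_to_bill, "cp", source, destination]


def copy_from_requester_pays(project_to_bill: str, source: str, destination: str) -> None:
    """Run the copy; raises CalledProcessError if gsutil exits with an error."""
    cmd = build_requester_pays_cp(project_to_bill, source, destination)
    subprocess.run(cmd, check=True)
```

A call such as copy_from_requester_pays("my-project-id", "gs://path/to/file", ".") would copy the object into the current directory, where "my-project-id" is a placeholder for the project ID of your own workspace Google project. The caution above applies here as well.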
Charges for which you are responsible
The following operations within a Requester Pays workspace generate charges for which you, the requester, are responsible:
- Listing files in the "Files" section of the Data tab
- Previewing files in the Data tab (either from the "Files" section or an entity table)
- Downloading files in the Data tab (either from the "Files" section or an entity table)
- Listing notebooks in the Analyses tab
- Getting a read-only preview of a notebook
- Launching a notebook for editing (if you have can-compute permission on the workspace)
- Copying a notebook to another workspace
- Copying notebooks when cloning a workspace
What happens when you leave a Requester Pays workspace?
Note: When you leave a Requester Pays workspace, the workspace selection you made will be cleared, and the next time you enter that workspace, you will be prompted to select a workspace again.
This covers clicking around in the workspace. What about accessing these files from a (non-GATK) workflow?
If you are accessing data from a Requester Pays bucket using a workflow in Terra, Cromwell (the workflow management system) will automatically bill your billing project for access to the resources. You can read a little more about this here: https://cromwell.readthedocs.io/en/stable/filesystems/GoogleCloudStorage/#requester-pays
Is it possible to estimate the costs associated with using data in a requester pays bucket before granting the bucket access to the billing project?