Learn how to access resources outside the Terra platform UI: using data stored in external buckets, running Google Cloud Platform (GCP) VMs, or using GCP machine-learning tools. This article outlines how to harness Terra's back-end infrastructure while keeping it easy to manage (i.e. without using long strings of random-looking numbers to represent users).
The Terra platform is designed to remove some of the barriers of moving to the cloud: Terra interfaces directly with Google so you don't have to. When interacting with Google Cloud Platform, Terra uses a special kind of Google account, called a service account, that lets your workflows and notebooks access your data in Google Cloud. Every Terra user has one or more of these "pet" service accounts (one for each of their Billing Projects), which are used to access data, including data in external Google buckets, as well as other Google Cloud Platform resources (such as the VMs that power Cloud Environments and workflows).
Service accounts are used when:
- Accessing a non-Terra GCS bucket, BQ dataset, GCR docker image, etc. (note that the resource owner must grant access)
- Running workflows or notebooks (interactive analyses) on virtual machines (VMs)
- Using external open-source resources (such as the open-source TensorFlow machine learning libraries) to analyze data in a workspace bucket
- Using Terra programmatically, since it's better to authenticate programmatically with a service account than with your full user account
In all of these instances, Terra assumes the identity of the service account - rather than your user ID credentials - to call Google APIs.
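To make the idea of "assuming the identity of the service account" concrete, here is a minimal sketch that asks the standard Google Compute Engine metadata server which service account a VM is running as. It assumes the code runs on a Google Cloud VM (for example, inside a Terra Cloud Environment) and that the VM's default service account is the relevant "pet" account; the function names are hypothetical.

```python
import urllib.request

# Standard GCE metadata endpoint for the VM's default service account email.
METADATA_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/email"
)


def build_identity_request() -> urllib.request.Request:
    """Build the metadata request; the Metadata-Flavor header is required."""
    return urllib.request.Request(
        METADATA_URL, headers={"Metadata-Flavor": "Google"}
    )


def current_service_account(timeout: float = 2.0) -> str:
    """Return the service account email the VM runs as.

    Only works on a Google Cloud VM; elsewhere the metadata host does not
    resolve and urlopen raises an OSError subclass (URLError or a timeout).
    """
    with urllib.request.urlopen(build_identity_request(), timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()
```

Calling `current_service_account()` from a notebook cell should return the email address under which Google API calls from that VM are made, which is the address a resource owner would see in access logs.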
Best practices - Step-by-step
The service accounts that Terra uses behind the scenes to interface with GCP have the format PROXY_<long-number>@firecloud.org. Although any user can use these pre-defined groups to grant access to a non-Terra GCS bucket, BQ dataset, GCR docker image, etc., the long string of numbers makes them hard to work with, so they are not recommended. Imagine you're a resource owner trying to identify who has access to the data in your external bucket: it's tough when the list is a bunch of long numbers.
Best practice is to create a Terra user group that includes just yourself (but has a human-friendly name) and use it for interfacing with GCP (e.g. granting access to external buckets).
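The readability problem is easy to see from the resource owner's side. The sketch below splits a list of members into machine-generated proxy addresses and human-readable ones. The regular expression is an assumption based on the PROXY_<long-number>@firecloud.org format described above, and all addresses in the example are hypothetical.

```python
import re

# Assumed pattern, based on the PROXY_<long-number>@firecloud.org format.
PROXY_PATTERN = re.compile(r"^PROXY_\d+@firecloud\.org$")


def split_members(members):
    """Separate machine-generated proxy addresses from human-readable ones."""
    proxies = [m for m in members if PROXY_PATTERN.match(m)]
    readable = [m for m in members if not PROXY_PATTERN.match(m)]
    return proxies, readable


# Hypothetical member list, as a bucket owner might see it.
members = [
    "PROXY_123456789012345678901@firecloud.org",
    "jane-doe-lab@firecloud.org",
    "PROXY_987654321098765432109@firecloud.org",
]
proxies, readable = split_members(members)
print(f"{len(proxies)} opaque entries, {len(readable)} readable: {readable}")
```

An owner reviewing this list can tell who `jane-doe-lab` is at a glance, while the two proxy entries need a separate lookup.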
1. Set up a human-readable Terra managed group
- Adding to billing projects
Outside of Terra
- Granting access to a non-Terra GCS bucket, BQ dataset, GCR docker image, etc.
To learn more about Terra groups, see this article.
Always use Terra groups for accessing external resources, even for one user! With a Terra group, you manage membership within the Terra UI and Terra handles all the non-human-friendly back-end.
User ID: email@example.com
Create your personal Terra group in four steps
- Go to your Groups page ("Main menu" --> "Groups" from the top left of any page in Terra)
- In the "Create a New Group" card, click on the blue "+" icon
- Enter your human-friendly user-ID (can be the same as your Terra login) and click the "Create Group" button
- You can now use your mirrored Terra group (e.g. firstname.lastname@example.org) for accessing external resources
2. Grant permissions to the Terra Group
To grant your Terra group access to a resource (such as a storage bucket or BigQuery dataset), follow the steps below in the GCP console.
If what you see on the console does not look like the screenshots,
- From the GCP console, select the resource to be shared (e.g. a particular bucket in https://console.cloud.google.com/storage/browser)
- Go to Permissions
- View by "Members" and select the "Add" icon
- Add the full name of your Terra group (e.g. email@example.com) as a New Member, then select (1) the resource type (left column, e.g. "Cloud Storage") and (2) the appropriate role:
- "Storage Object Viewer" if you want to read from the bucket
- "Storage Object Creator" if you want to write to the bucket
You will then see your Terra group and its role listed in the Members permissions.
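For readers who script their permissions instead of clicking through the console, the result of the steps above corresponds to an IAM policy binding: a role ID plus a list of members, where groups carry a `group:` prefix. The sketch below only builds that data structure. The helper names and the group address are hypothetical, and actually applying the policy (through the console, a command-line tool, or a client library) is not shown.

```python
import copy

# Console role names from the steps above, mapped to their IAM role IDs.
ROLE_IDS = {
    "Storage Object Viewer": "roles/storage.objectViewer",    # read from the bucket
    "Storage Object Creator": "roles/storage.objectCreator",  # write to the bucket
}


def group_binding(group_email, role_name):
    """Build an IAM-style binding that grants role_name to a group."""
    if role_name not in ROLE_IDS:
        raise ValueError(f"Unsupported role name: {role_name!r}")
    return {"role": ROLE_IDS[role_name], "members": [f"group:{group_email}"]}


def add_binding(policy, binding):
    """Return a copy of policy with binding merged in, without duplicates."""
    updated = copy.deepcopy(policy)
    bindings = updated.setdefault("bindings", [])
    for existing in bindings:
        if existing["role"] == binding["role"]:
            for member in binding["members"]:
                if member not in existing["members"]:
                    existing["members"].append(member)
            return updated
    bindings.append(copy.deepcopy(binding))
    return updated


# Hypothetical Terra group address.
policy = add_binding(
    {"bindings": []},
    group_binding("jane-doe-lab@firecloud.org", "Storage Object Viewer"),
)
print(policy)
```

Keeping the Terra group as the only member in the binding means later changes to who is in the group happen in the Terra UI, with no further edits to the bucket's policy.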