Accessing Advanced GCP features in Terra
Do you want to perform Google Cloud Platform (GCP) operations not currently available in the Terra UI?
- WRITE to BigQuery
- Interact with Cloud Storage buckets other than the workspace bucket
- Run dsub jobs
- Run Cloud Dataflow jobs
- Run Cloud ML engine jobs
You can do many of these things in Terra already! This article explains how to leverage Terra notebooks and workflows to access additional Google Cloud Platform (GCP) features in Terra.
Contents
Getting Started with advanced GCP features
Set up a GCP Billing account
1. Set up a GCP-native project (on GCP console)
2. Create a human-friendly Terra group for external interfacing
3. Add your personal group to your GCP project
Step-by-step guides and template notebooks
- Create an external GCP bucket accessible from Terra workspaces
- Set bucket storage to auto-delete
- Interact with Cloud Storage buckets other than the workspace bucket (example notebook)
- Create and write to a BigQuery dataset
- How to WRITE to BigQuery (example notebook)
Other things you can do using a cloud-native project
- Run dsub jobs
- Run Cloud Dataflow jobs
- Run Cloud ML engine jobs
Getting Started with advanced GCP features
The Terra platform is designed to remove some of the barriers of moving to the cloud: Terra interfaces directly with Google so you don't have to. However, there are many GCP features that have not been included in the platform. Some are on the horizon, others are niche capabilities that may never be integrated in the Terra UI.
Just because they aren't integrated into the UI doesn't mean you cannot use them, however. You can access these advanced features through a GCP project, which you will set up on GCP console and connect to Terra with a human-friendly personal Terra group following the steps below. Once you follow these three setup steps, you'll be able to use the GCP project to leverage advanced GCP features by running notebooks and workflows on Terra.
In order to set up a GCP-native project on GCP console, you need to be an owner or user on a GCP Billing account. To learn how to set up GCP billing and access $300 in free credits from Google, see this article.
1. Set up a GCP-native project (on GCP console)
- From the main menu (three horizontal lines at the top left of the GCP console page) go to the "IAM & Admin" -> "Manage resources" page
- On the Select organization drop-down list at the top of the page, select the organization in which you want to create a project. Free trial users can skip this step, as this list does not appear.
- Select "Create project"
- In the New Project window that appears, enter a project name and select a billing account.
  - If you don't see a Billing account in the drop-down, you can set one up following these instructions.
  - Note that a project name can contain only letters, numbers, single quotes, hyphens, spaces, or exclamation points, and must be between 4 and 30 characters.
- Enter the parent organization or folder in the Location box. That resource will be the hierarchical parent of the new project.
- When you're finished entering new project details, click "Create"
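The naming rule in the steps above can be checked before you fill in the New Project form. The following is a minimal sketch in Python: the function name `is_valid_project_name` is hypothetical, "letters" is assumed to mean unaccented ASCII letters, and the pattern only encodes the rule as stated in this article. The GCP console remains the authority on what it accepts.

```python
import re

# Letters, digits, single quotes, hyphens, spaces, exclamation points; 4 to 30 characters.
# Assumption: "letters" means ASCII letters only.
_PROJECT_NAME_PATTERN = re.compile(r"[A-Za-z0-9'\- !]{4,30}")


def is_valid_project_name(name: str) -> bool:
    """Return True if `name` satisfies the project-name rule quoted in this article."""
    return _PROJECT_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    # Example names are placeholders for illustration.
    for candidate in ["My Terra Project", "abc", "bad_name_with_underscores"]:
        print(candidate, "->", is_valid_project_name(candidate))
```

A name that fails this check (too short, too long, or containing a character such as an underscore) is worth changing before you click "Create".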
2. Create a human-friendly personal Terra group in three steps
Each Terra user has a pre-built "Proxy" group for accessing [...] However, your proxy group is not very human-friendly. If you're [...] is not helpful unless you happen to have a way to figure out what [...] Instead, you can create a Terra group (with a sensible name) as a [...]
- Go to your Groups page ("Main menu" --> "Groups" from the top left of any page in Terra)
- In the "Create a New Group" card, click on the blue "+" icon
- Enter your human-friendly user-ID (can be the same as your Terra login) and click the "Create Group" button
Terra creates a mirrored Google group (your Terra ID plus your built-in proxy) that you can use for interfacing with external resources (i.e., Google Cloud Platform). It is this mirrored group that Terra will use when interfacing with GCP directly.
You'll see the full group name in your list of Groups. In the next step, you'll grant this group permission to access the GCP-native project you created in step 1.
3. Add your Terra group on the GCP project
You will give your personal Terra group Editor permission (for more information about GCP permissions, see this article).
Note that if your Terra group includes additional people, you will want to be careful what permissions you grant the group. This is because 'Editors' can turn on a large number of services, including ones that can be expensive!
- Go to "IAM" -> "Manage Resources" in your new GCP project and select "Add Member"
- Add your human-friendly personal Terra group as a Member in your project permissions
- Give the group Editor permissions
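The same grant can also be made from a terminal with the `gcloud` CLI rather than the console. The sketch below only builds the command so you can inspect it before running it yourself. The function name is hypothetical, and the project ID and group email are placeholders: copy the full group name from your Terra Groups page and the project ID from the GCP console.

```python
import shlex
from typing import List


def editor_binding_command(project_id: str, group_email: str) -> List[str]:
    """Build the gcloud command that grants a Google group the Editor role on a project."""
    return [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        f"--member=group:{group_email}",
        "--role=roles/editor",
    ]


if __name__ == "__main__":
    # Both values are placeholders; substitute your own project ID and group email.
    command = editor_binding_command("my-gcp-project", "my-terra-group@example.org")
    print(shlex.join(command))
```

Printing the command first, instead of executing it, gives you a chance to confirm that the group is the one you intend, which matters because Editor is a broad role.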
Once these three steps are complete, you'll be able to do many advanced GCP tasks. In many cases Terra will interface with GCP on your behalf! Read on for details of how to do specific tasks. We will continue to add to this list.
Step-by-step how-to instructions
Below are a series of features users have asked about that are not (yet!) available in Terra. Expand each section for step-by-step instructions - or a link to a notebook in the public workspace.
Create an external GCP bucket accessible by your Terra workspaces
- Go to GCP Storage Console
- Select your GCP-native project from the dropdown. Click "Create bucket"
External GCP bucket configuration tips
In general, you can use the default values when setting up your external bucket.
For customization details, see the Google documentation.
When you are done, you will see your external bucket in the console!
Set an external GCP bucket to auto-delete
- Go to GCP Storage console at console.cloud.google.com/storage/browser
- Select the bucket you want to set to automatically delete data by clicking the bucket name
- Select the "Lifecycle" tab
- Choose "ADD A RULE"
- Follow the instructions to set up a custom rule
For example, you can set up a rule that deletes a bucket's contents after 1 day.
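A delete-after-N-days rule can also be written as a lifecycle configuration file and applied from the command line. The sketch below builds that configuration in Python; the function name is hypothetical and the bucket name in the comment is a placeholder. It is an alternative to the console steps, not a required part of them.

```python
import json


def delete_after_days_rule(age_days: int) -> dict:
    """Build a lifecycle configuration that deletes objects older than `age_days` days."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    return {
        "rule": [
            {"action": {"type": "Delete"}, "condition": {"age": age_days}}
        ]
    }


if __name__ == "__main__":
    # Print the configuration for a 1-day rule. Saved to a file such as lifecycle.json,
    # it could be applied with: gsutil lifecycle set lifecycle.json gs://YOUR_BUCKET
    print(json.dumps(delete_after_days_rule(1), indent=2))
```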
Interact with Cloud Storage buckets other than the workspace bucket (template notebook)
There are times when you may not want to keep shared data in a workspace bucket (particularly if you're sharing large numbers of large data files with a large group).
To learn more about sharing large numbers of large data files with large groups, see this article.
For an end-to-end example of interacting with an external bucket, see this template notebook.
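As a rough illustration of what reading from an external bucket in a notebook can look like, here is a minimal sketch. It is not the code from the template notebook. It assumes the `google-cloud-storage` Python package is available in your notebook environment, and the function names, bucket, object path, and project ID are all hypothetical placeholders.

```python
from typing import Tuple


def split_gs_uri(uri: str) -> Tuple[str, str]:
    """Split 'gs://bucket/path/to/object' into ('bucket', 'path/to/object')."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    bucket, _, object_path = uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name in {uri!r}")
    return bucket, object_path


def download_object(uri: str, local_path: str, project_id: str) -> None:
    """Download one object from an external bucket to local disk."""
    # Imported here so split_gs_uri works even without the library installed.
    from google.cloud import storage

    bucket_name, object_path = split_gs_uri(uri)
    client = storage.Client(project=project_id)
    client.bucket(bucket_name).blob(object_path).download_to_filename(local_path)


# Example call (placeholders; requires credentials with access to the bucket):
# download_object("gs://my-external-bucket/data/samples.tsv", "samples.tsv", "my-gcp-project")
```

The download only succeeds if the identity running the notebook has been granted access to the external bucket.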
Create a BigQuery dataset (in GCP console)
- Go to BigQuery in the GCP console and select the native GCP project you created above
- Select "Create Dataset"
- In the dataset creation form, choose a unique dataset name and select the Default table expiration.
  In general, you would choose "Never". But if you are testing queries and saving those results as tables, you may generate a lot of tables that you don't want to keep (or pay for). To avoid having to clean up those tables at the end of the day, you can create a BigQuery dataset for test results that auto-deletes its tables after a period of time has elapsed.
- You will see your new BigQuery dataset in the Resources section on the far left
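A dataset with a default table expiration can also be created programmatically. The following is a minimal sketch, assuming the `google-cloud-bigquery` Python package is installed; the function names, project ID, and dataset name are hypothetical placeholders. The client expects the expiration in milliseconds, so a small helper does the conversion.

```python
def days_to_ms(days: float) -> int:
    """Convert a number of days to milliseconds."""
    return int(days * 24 * 60 * 60 * 1000)


def create_test_dataset(project_id: str, dataset_name: str, expire_after_days: float):
    """Create a dataset whose new tables are deleted after `expire_after_days` days."""
    # Imported here so days_to_ms works even without the library installed.
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    dataset = bigquery.Dataset(f"{project_id}.{dataset_name}")
    dataset.default_table_expiration_ms = days_to_ms(expire_after_days)
    # exists_ok=True returns the existing dataset instead of raising if it is already there.
    return client.create_dataset(dataset, exists_ok=True)


# Example call (placeholders; requires credentials for the project):
# create_test_dataset("my-gcp-project", "query_test_results", expire_after_days=1)
```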
How to load data to BigQuery (template notebook)
Note that before you can load data to BigQuery, you must have at least WRITE access permission to an existing BigQuery dataset. If you have set up your own BigQuery dataset (above), you will automatically have those permissions.
See an example notebook in a public Terra workspace.
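For orientation, a load from a notebook might look roughly like the sketch below. It is not the code from the example notebook. It assumes the `google-cloud-bigquery` Python package is installed and that you have WRITE access to the target dataset; the function names and all project, dataset, and table names are hypothetical placeholders.

```python
from typing import Dict, List


def full_table_id(project_id: str, dataset_name: str, table_name: str) -> str:
    """Join the three parts into the 'project.dataset.table' form."""
    parts = [project_id, dataset_name, table_name]
    if not all(parts):
        raise ValueError("project, dataset, and table names must all be non-empty")
    return ".".join(parts)


def load_rows(rows: List[Dict], table_id: str, project_id: str) -> int:
    """Load a list of JSON-style rows into a table and return the number of rows loaded."""
    # Imported here so full_table_id works even without the library installed.
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    # Let BigQuery infer the schema from the rows.
    job_config = bigquery.LoadJobConfig(autodetect=True)
    job = client.load_table_from_json(rows, table_id, job_config=job_config)
    job.result()  # Wait for the load job to finish; raises on failure.
    return job.output_rows


# Example call (placeholders; requires credentials and an existing dataset):
# table = full_table_id("my-gcp-project", "query_test_results", "samples")
# load_rows([{"sample_id": "s1", "depth": 30}], table, "my-gcp-project")
```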
Other things you can do in a GCP Project
- dsub