Accessing data from an external bucket

Anton Kovalsky

To analyze data stored in cloud storage buckets not associated with a Terra workspace, your tools need 1) the data's location in the cloud (a complete file path, or URI) and 2) permission to access the storage bucket. Read this article to learn how to give your tools permission to access data located in an external bucket (i.e., a Google Cloud bucket that is not a workspace bucket). 

To learn more about accessing external resources (such as Google Cloud buckets, or Virtual Machines (VMs) or machine-learning tools external to Terra), see Best practices for accessing external resources.

Best practices overview

If your workflows or notebooks use data files from an external bucket as input, you must give your Terra service accounts permission to access that bucket. There are two ways to do this: (1) give permission to a Terra-managed group or (2) use workspace permissions. Benefits and step-by-step instructions for both options are outlined below. 

Before you start: Only resource owners or admins can grant access permission. If what you see on the console does not look like the screenshots below, or if you get an error message about permission, it is most likely because you are not the owner or admin of the storage bucket or BigQuery dataset. In that case, ask the resource owner or admin to grant permission to your Terra group, following the directions below.

Option 1: Grant permission to a Terra-managed group

You should always use Terra groups for accessing external resources, even for one user!

Why use a managed group

  • Terra groups are more human-friendly than service accounts. They include a user's service accounts as well as the user's account ID but are much easier to identify and troubleshoot than random strings of numbers and letters.
  • You can manage your Terra group within the Terra UI, and Terra handles all the nonhuman-friendly back end.

Step 1. Create your Terra group in four steps 

Terra-managed groups can be for a single individual or a true group. For more information on why to use a managed group for one person, see Best practices for accessing external resources.

  • 1. Go to your Groups page (Main menu > Groups from the top left of any page in Terra)
    Screenshot of steps to find your Groups page from any page on Terra with arrows pointing to the main menu action item in the top left (1) and the groups submenu below the profile section (2)


    2. Click the blue Create a New Group button 
    Screenshot of the Groups page, showing a blue button labeled 'Create a New Group'.

    3. Enter your human-friendly user-ID in the field (can be the same as your Terra login) and click the Create Group button.
    Screenshot of the screen used to create a new Terra group.

    You should see the new group displayed on the Groups page, along with its id (for example, A_researcher@firecloud.org). Use this group id to give yourself access to external resources, such as data in non-Terra Google buckets.
    Screenshot showing the new Terra group on the Groups page.

  • 1. Go to your Groups page (Main menu > Groups from the top left of any page in Terra)
    Screenshot of steps to find your Groups page from any page on Terra with arrows pointing to the main menu action item in the top left (1) and the groups submenu below the profile section (2)
    2. Click the blue Create a New Group button 
    Screenshot of the Groups page, showing a blue button labeled 'Create a New Group'.

    3. Enter your collaborator group name in the field and click the Create Group button

    You should see the new group displayed on the Groups page, along with its id (for example, A_lab@firecloud.org). Use this group id to give the group's members access to external resources, such as data in non-Terra Google buckets.

    Screenshot showing a new Terra group on the Groups page.
    4. Click on the group, then click Add user to add users. To remove a user, click on the group, click the circular icon with three dots next to the user's name, and select Remove user.

Step 2. Grant permission to the Group

2.1. From the Google Cloud console, select the resource to be shared (e.g., a particular bucket in https://console.cloud.google.com/storage/browser).

Screenshot of storage browser on GCP console with an example bucket called ac-storage-bucket highlighted

2.2. Go to Permissions.
Screenshot of the details page on the GCP console for an example bucket called ac-external-bucket. The permissions tab is highlighted

2.3. View by Principals and select the Add icon.
Screenshot of the permissions page on the GCP console for an example bucket called ac-external-bucket. The 'view by principals' tab is highlighted and an arrow is pointing to the add button

2.4. Add your Terra group's email id (e.g., j_doe_at_someplace_org@firecloud.org) as a New Member and select the resource type (left column, e.g., Cloud Storage) and the appropriate roles.

Most common roles

  1. Storage Object Viewer to read from the bucket.
  2. Storage Object Creator to write to the bucket.

Screenshot of the 'Add principals' screen and roles options for an example bucket called 'ac-external-bucket'. A Terra group id, 'j_does_at_someplace_org@firecloud.org' has been added to the new principals field and 'Cloud storage' + 'Storage object creator' +'Storage object viewer' are highlighted in the 'select a role' field
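
The grant made in step 2.4 can also be pictured as an IAM policy binding. The sketch below is only an illustration, not part of the console workflow: it edits a plain dictionary shaped like the JSON used for Google Cloud IAM policies (a `bindings` list, each entry with a `role` and `members`). The helper name `add_group_binding` and the group email are hypothetical, and the sketch assumes the Terra group is added as a `group:` principal. `roles/storage.objectViewer` and `roles/storage.objectCreator` are the role IDs behind Storage Object Viewer and Storage Object Creator.

```python
from copy import deepcopy

# Role IDs for the two common roles listed above.
STORAGE_OBJECT_VIEWER = "roles/storage.objectViewer"
STORAGE_OBJECT_CREATOR = "roles/storage.objectCreator"


def add_group_binding(policy, group_email, role):
    """Return a copy of an IAM-style policy dict with `group_email` granted `role`.

    Hypothetical helper for illustration only. `policy` is a plain dict:
    {"bindings": [{"role": "...", "members": ["group:..."]}]}
    """
    updated = deepcopy(policy)
    member = f"group:{group_email}"  # assumption: the Terra group is a group principal
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        if binding["role"] == role:
            # Role already present: add the member only if it is missing.
            if member not in binding["members"]:
                binding["members"].append(member)
            return updated
    bindings.append({"role": role, "members": [member]})
    return updated


# Example with a made-up Terra group email.
group = "j_doe_at_someplace_org@firecloud.org"
policy = {"bindings": []}
policy = add_group_binding(policy, group, STORAGE_OBJECT_VIEWER)
policy = add_group_binding(policy, group, STORAGE_OBJECT_CREATOR)
print(policy)
```

The function only builds the policy data. Applying it to a real bucket still requires owner or admin rights on that bucket, as noted in "Before you start".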

What to expect

You will see your Terra group and role in the Member's Permissions.

Screenshot showing the principals with access to an example bucket on the Google Cloud Console.

Option 2: Sync external bucket permissions to workspace permissions (advanced)

Each workspace has behind-the-scenes groups for Owners, Writers, and Readers.

These groups are automatically updated when a workspace is shared. For example, if a user is added to the workspace with Owner level permissions, they will be in the Owner group. You can take advantage of these predefined Terra managed groups to give permissions to external resources, such as an external private bucket.

Using this option, the external bucket mirrors the workspace sharing: whenever the workspace is shared with a new user, that user also gets access to the external bucket. See step-by-step instructions below to set up those permissions. 

Step 1. Find workspace permission groups

1.1. Go to the Swagger endpoint for the listResourcePoliciesV2 API: https://sam.dsde-prod.broadinstitute.org/#/Resources/listResourcePoliciesV2.

1.2. First, go to the top of that page and click the Authorize button.

Screenshot of the Swagger page with an arrow pointing to the green 'Authorise Swagger' button at the top right

1.3. Click Authorize in the window that appears. You will be redirected to a Terra login screen, where you can choose to log in with a Google or Microsoft id. Log in using whichever credentials you use to log into Terra. 


1.4. Then scroll down to the listResourcePoliciesV2 endpoint and click the Try it out button. 

Screenshot of the listResourcePoliciesV2 Swagger endpoint with the 'Try it out' button highlighted in orange.

Don't refresh the page, or your authorization will be lost!

1.5. Type "workspace" into the resourceTypeName field.
Screenshot of the parameters part of the Swagger page with 'workspace' in the 'Type of resource' field and the Google bucket d5eb5311-1cba-4c8f-84c5-27de52d2efbf in the 'id of resource' field

1.6. Paste your workspace's ID into the resourceId field.
The workspace ID is the workspace's bucket name minus the fc- prefix. For example, if your workspace's bucket name is fc-d5eb5311-1cba-4c8f-84c5-27de52d2efbf, your workspace ID is d5eb5311-1cba-4c8f-84c5-27de52d2efbf.

The bucket name can be found in the Cloud Information section on the right-hand side of the workspace's Dashboard. Click on the clipboard icon next to the bucket name to copy it. Remove the fc- from the beginning when you paste it into the resourceId field of the listResourcePoliciesV2 endpoint. 

Screenshot of the Cloud Information section from an example workspace's Dashboard. Orange boxes highlight the workspace's bucket name and the clipboard icon used to copy the bucket name.
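
The rule in step 1.6 (bucket name minus the fc- prefix) can be written as a small helper. This is a minimal sketch; the function name `workspace_id_from_bucket` is hypothetical and not part of any Terra tool.

```python
def workspace_id_from_bucket(bucket_name):
    """Return the workspace ID for a workspace bucket name by dropping "fc-".

    Hypothetical helper: it applies only the rule described in step 1.6.
    """
    prefix = "fc-"
    name = bucket_name.strip()  # tolerate stray whitespace from copy/paste
    if not name.startswith(prefix):
        raise ValueError(f"expected a bucket name starting with {prefix!r}: {name!r}")
    return name[len(prefix):]


workspace_id = workspace_id_from_bucket("fc-d5eb5311-1cba-4c8f-84c5-27de52d2efbf")
print(workspace_id)  # prints d5eb5311-1cba-4c8f-84c5-27de52d2efbf
```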

1.7. Click Execute.

Once the job has finished running, you will see a response body listing the policies (access and computing permissions) for the workspace.
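
For readers who prefer a script to the Swagger page, the request sent by the Execute button might be built as below. Treat this as a sketch under assumptions: the host comes from the Swagger URL in step 1.1, but the path `/api/resources/v2/{resourceTypeName}/{resourceId}/policies` is assumed and should be checked against the request URL that the Swagger page displays. The function name and the placeholder token are hypothetical. The code only constructs the request and does not send it.

```python
import urllib.request
from urllib.parse import quote

# Host taken from the Swagger URL in step 1.1.
SAM_BASE_URL = "https://sam.dsde-prod.broadinstitute.org"


def build_list_policies_request(workspace_id, access_token):
    """Build (but do not send) a GET request for a workspace's policies.

    ASSUMPTION: the endpoint path below is inferred, not taken from this
    article; confirm it on the Swagger page before relying on it.
    """
    url = (
        f"{SAM_BASE_URL}/api/resources/v2/workspace/"
        f"{quote(workspace_id, safe='')}/policies"
    )
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json",
        },
        method="GET",
    )


request = build_list_policies_request(
    "d5eb5311-1cba-4c8f-84c5-27de52d2efbf",
    "PLACEHOLDER_TOKEN",  # hypothetical; a real token must belong to your Terra login
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; that call is left out here because it needs a valid access token.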

1.8. Within the response body, find the email corresponding to each workspace role.

The response body section lists the "emails" that Terra assigns to each workspace role: workspace owner, workspace writer, and workspace reader. Find and copy these emails -- you will add them to your external Google bucket in the next step.
Workspace Owner: policyName: “owner” - take the “email” value
Screenshot of code 200 response body with email 'policy-a65f75a7-a65a-e6a6832a452@firecloud.org' and 'roles [owner]' circled

Workspace Writer: policyName: “writer” - take the “email” value
Screenshot of code 200 response body with email 'policy-a65f75a7-a65a-e6a6832a452@firecloud.org' and 'policyName writer' circled

Workspace Reader: policyName: “reader” - take the “email” value
Screenshot of code 200 response body with email 'policy-a65f75a7-a65a-e6a6832a452@firecloud.org' and 'roles [reader]' circled
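
If you copy the response body into a script, the three emails can be picked out by policyName. The sketch below assumes the body is a JSON array whose entries each carry a "policyName" and an "email" field, as the steps above suggest. The sample body, its emails, and the function name are made up for illustration.

```python
import json

WORKSPACE_ROLES = ("owner", "writer", "reader")


def policy_emails_by_role(response_body):
    """Map each workspace role to the email of the policy with that name.

    ASSUMPTION: `response_body` is a JSON array of objects that each have
    "policyName" and "email" keys. Other policies are ignored.
    """
    emails = {}
    for entry in json.loads(response_body):
        name = entry.get("policyName")
        if name in WORKSPACE_ROLES:
            emails[name] = entry["email"]
    return emails


# Made-up response body; real emails will differ.
sample_body = """
[
  {"policyName": "owner",  "email": "policy-owner-example@firecloud.org"},
  {"policyName": "writer", "email": "policy-writer-example@firecloud.org"},
  {"policyName": "reader", "email": "policy-reader-example@firecloud.org"},
  {"policyName": "some-other-policy", "email": "policy-other-example@firecloud.org"}
]
"""

emails = policy_emails_by_role(sample_body)
for role in WORKSPACE_ROLES:
    print(role, emails[role])
```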

Step 2. Grant permission to workspace groups

2.1. From the Google Cloud console, select the resource to be shared (e.g., a particular bucket in https://console.cloud.google.com/storage/browser).
Screenshot of storage browser in GCP console with 'external-bucket' selected and circled for emphasis

2.2. Go to Permissions.
Screenshot of Storage bucket details for 'external-bucket' on GCP with the Permissions tab

2.3. View by Members and select the Add icon.
Screenshot of Storage bucket permissions details for 'external-bucket' on GCP with the members tab circled and an orange arrow pointing to the 'Add' button and icon

2.4. Add the three workspace groups (i.e., workspace Owner, Writer and Reader from part 1.8 of Step 1 above) as New Members, selecting the resource type (left column - i.e., "Cloud Storage") and the appropriate roles.

Selecting the right roles

If you only want members of this workspace to be able to read from the bucket:
Add the three groups as Storage Object Viewer to the external bucket.

If you want readers to read from the bucket, and writers and owners to both read from and write to it:
1) Add all three workspace groups as Storage Object Viewer to the external bucket.
2) Add the Owner and Writer groups as Storage Object Admin or Storage Object Creator.

To find out which permission might be correct, please see Google's documentation on permissions.
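
The two role choices described above can be summarized as data. The sketch below is illustrative only: the function name and the example emails are hypothetical, and it assumes the workspace group emails are added as `group:` principals. It produces IAM-style bindings for either the read-only setup or the read-and-write setup, using the role IDs `roles/storage.objectViewer`, `roles/storage.objectCreator`, and `roles/storage.objectAdmin`.

```python
VIEWER = "roles/storage.objectViewer"
CREATOR = "roles/storage.objectCreator"
ADMIN = "roles/storage.objectAdmin"


def bindings_for_workspace_groups(group_emails, allow_writes=False, write_role=CREATOR):
    """Return IAM-style bindings mirroring workspace roles onto a bucket.

    `group_emails` maps "owner", "writer", and "reader" to the emails found
    in step 1.8. Hypothetical helper for illustration only.
    """
    def members(*roles):
        return [f"group:{group_emails[role]}" for role in roles]

    # Everyone in the workspace can read.
    bindings = [{"role": VIEWER, "members": members("owner", "writer", "reader")}]
    if allow_writes:
        # Only owners and writers get the write-capable role.
        bindings.append({"role": write_role, "members": members("owner", "writer")})
    return bindings


# Made-up emails standing in for the values copied in step 1.8.
groups = {
    "owner": "policy-owner-example@firecloud.org",
    "writer": "policy-writer-example@firecloud.org",
    "reader": "policy-reader-example@firecloud.org",
}
read_only = bindings_for_workspace_groups(groups)
read_write = bindings_for_workspace_groups(groups, allow_writes=True)
print(read_only)
print(read_write)
```

Passing `write_role=ADMIN` selects Storage Object Admin instead of Storage Object Creator for owners and writers.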

Screenshot of the 'Add members' popup with 'GROUP_d5eb5311-1cba-4c8f-84c5-27de52d2efbf-OWNER@firecloud.org' in the 'New members' field and 'Cloud storage' + 'Storage object creator' circled in the 'Select a role' dropdown.

Storage object roles/permissions

Storage Object Viewer: Grants access to view objects and their metadata, excluding ACLs. Can also list the objects in a bucket.

Storage Object Admin: Grants full control over objects, including listing, creating, viewing, and deleting objects.

Storage Object Creator: Allows users to create objects. Does not give permission to view, delete, or replace objects.

What to expect

Once you've given one of the workspace groups permission to access the bucket, you will see the group's email and its assigned role in the Members Permissions table on the Google Cloud Console.
Screenshot of Members permissions in GCP console with 'policy-d5eb5311-1cba-4c8f-84c5-27de52d2efbf-OWNER@firecloud.org circled under the 'Member' column
