Accessing data from an external bucket

Anton Kovalsky

To launch an analysis on data stored in cloud storage buckets, your tools need 1) the data's location in the cloud (i.e. the complete file path) and 2) permission to access the storage bucket. This article covers how to make sure you have the right permissions to access data located in an external bucket (i.e. a GCP bucket that is not a workspace bucket).

To learn best practices for accessing external resources (like GCP buckets, VMs or machine-learning tools external to Terra), see this article.

Overview

If your workflows or notebooks use data files from an external bucket as input, you must give your Terra service accounts permission to access the bucket. There are several options for granting permission to Terra:

- Grant permission to a Terra managed group
- Make external permissions match workspace permissions (advanced)
   - Newer workspaces
   - Older workspaces
- Grant permission to a Terra proxy (not recommended) 

The benefits of each option, along with step-by-step instructions, are outlined below.

Before you start: Only resource owners or admins can grant access permission

  If what you see on the console does not look like the screenshots, or if you get an error message about permission, it is most likely because you are not the storage bucket or BigQuery dataset owner or admin. You will need to ask the resource owner or admin to grant permission to your Terra group. 

Grant permission to a Terra managed group

You should always use Terra groups for accessing external resources, even for one user! Why?

  • Terra groups include a user's service accounts as well as the user's account ID, and are more human-friendly than the service accounts themselves
  • You can manage your Terra group within the Terra UI, and Terra handles all the non-human-friendly back-end details

Below are instructions for 1) creating a group (either a personal Terra group or a group of collaborators) and 2) granting a Terra group permission to access an external bucket.

1. Create your personal Terra group in four steps 

Example: Terra group for single user
(User ID:  j_doe@someplace.org)

 

- Create a Terra Group: j_doe_at_someplace_org
- Don't add anyone else to this group
- Make grants to j_doe_at_someplace_org@firecloud.org

  1. Go to your Groups page ("Main menu" --> "Groups" from the top left of any page in Terra)
    Create-Terra-Group_Step-1_Screen_shot.png

  2. In the "Create a New Group" card, click on the blue "+" icon 
    Create-Terra-group_Step-2_Screen_shot.png

  3. Enter your human-friendly user-ID (can be the same as your Terra login) and click the "Create Group" button
    Create-Terra-Group_Step-3_Scren_shot.png

  4. You can now use your mirrored Terra group (e.g. j_doe_at_someplace_org@firecloud.org) for accessing external resources
    Create-Terra-Group_Step-4_Screen_shot.png
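The naming pattern in the example above can be sketched in a few lines of Python. This is only an illustration of the convention shown in the example (replace "@" with "_at_" and dots in the domain with underscores); Terra does not require this pattern, and the helper names `terra_group_name` and `terra_group_email` are hypothetical.

```python
def terra_group_name(user_id: str) -> str:
    """Suggest a group name from a Terra login, following the example's convention.

    'j_doe@someplace.org' -> 'j_doe_at_someplace_org'
    """
    local, sep, domain = user_id.strip().partition("@")
    if not sep or not local or not domain:
        raise ValueError(f"Expected an email-style user ID, got: {user_id!r}")
    # Dots in the domain become underscores, as in the example above.
    return f"{local}_at_{domain.replace('.', '_')}"


def terra_group_email(group_name: str) -> str:
    """Return the address used when granting permissions to a Terra group."""
    return f"{group_name}@firecloud.org"


if __name__ == "__main__":
    name = terra_group_name("j_doe@someplace.org")
    print(name)
    print(terra_group_email(name))
```

The second value is the address you would enter as the new member when granting access to the external bucket.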

or

1. Create a Terra group of collaborators in five steps

Example: Terra group for collaborators

 

- Create a Terra Group: my_lab_at_someplace_org
- Add the Terra IDs of all collaborators to this group
- Make grants to my_lab_at_someplace_org@firecloud.org

  1. Go to your Groups page ("Main menu" --> "Groups" from the top left of any page in Terra)
    Create-Terra-Group_Step-1_Screen_shot.png

  2. In the "Create a New Group" card, click on the blue "+" icon 
    Create-Terra-group_Step-2_Screen_shot.png

  3. Enter your collaborator group name and click the "Create Group" button
    Create-Terra-Collaborative-Group_Step-3_Scren_shot.png

  4. Add or remove members inside the Terra UI
    Create-Terra-Collaborative-Group_Step-3a_Scren_shot.png

  5. You can now use this group (e.g. my_lab_at_someplace_org@firecloud.org) for granting permissions for external resources

then

2. Grant permission to the Terra group

  1. From the GCP console select the resource to be shared (i.e. a particular bucket in https://console.cloud.google.com/storage/browser)
    Grant-access-to-external-resource_Step-1_Screen_shot.png

  2. Go to Permissions
    Grant-access-to-external-resources_Step-2_Screen_shot.png
  3. View by "Members" and select the "Add" icon
    Grant-access-to-external-resources_Step-3_Screen_shot.png

  4. Add the full name of your Terra group (i.e. j_doe_at_someplace_org@firecloud.org) as a New Member and select the resource type (left column - i.e. "Cloud Storage") and the appropriate roles.
    1. "Storage Object Viewer" if you want to read from the bucket
    2. "Storage Object Creator" if you want to write to the bucket
      Grant-access-to-external-resources_Step-4_Screen_shot.png

You will see your Terra group and role in the Members Permissions
Grant-access-to-external-resources_Step-5_Screen_shot.png
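For readers who want to see what the console steps above change behind the scenes, here is a minimal sketch of the resulting IAM policy binding, written as plain Python dictionaries. It assumes the standard IAM policy layout (a list of bindings, each with a role and a list of members) and the role IDs `roles/storage.objectViewer` and `roles/storage.objectCreator`, which correspond to "Storage Object Viewer" and "Storage Object Creator". The function name `add_group_binding` is hypothetical, and the sketch does not call any Google API.

```python
import copy


def add_group_binding(policy: dict, group_email: str, role: str) -> dict:
    """Return a copy of an IAM policy dict with the group added under the role."""
    updated = copy.deepcopy(policy)
    member = f"group:{group_email}"  # groups are written with a "group:" prefix
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        if binding.get("role") == role:
            members = binding.setdefault("members", [])
            if member not in members:  # avoid duplicate entries
                members.append(member)
            break
    else:
        # No binding for this role yet, so create one.
        bindings.append({"role": role, "members": [member]})
    return updated


if __name__ == "__main__":
    policy = add_group_binding(
        {}, "j_doe_at_someplace_org@firecloud.org", "roles/storage.objectViewer"
    )
    print(policy)
```

If you prefer the command line to the console, a grant of this kind can typically be made with `gsutil iam ch group:j_doe_at_someplace_org@firecloud.org:objectViewer gs://YOUR_BUCKET`, where `YOUR_BUCKET` is a placeholder for the external bucket name.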

 

Make external bucket permissions match workspace permissions (advanced)

Each workspace has behind-the-scenes groups for Owners, Writers, and Readers.

These groups are automatically updated when a workspace is shared - for instance if a user is added to the workspace with Owner level permissions, they will be in the Owner group. You can take advantage of these predefined Terra managed groups to give permissions to external resources, such as an external private bucket.

The advantage of this option is that the external bucket will mirror the workspace sharing: whenever the workspace is shared with a new user, that user also gets access to the external bucket. See step-by-step instructions below to set up those permissions.

The steps depend on whether the workspace is older or newer. If you follow the process for “Older workspaces” below and it doesn’t work, try the “Newer workspaces” guide.

Newer workspaces (after mid-December 2018)

1. Find workspace permission groups (newer workspaces)

You will need to go to the Swagger page and run the following API call:
https://sam.dsde-prod.broadinstitute.org/#/Resources/listResourcePolicies

1.1. First, go to the top of that page and click the "Authorize" button:

Access-buckets-workspace-permissions_Authorize-Swagger_Screen_shot.png

1.2. Check all of the boxes under "Scopes" (1), and click Authorize (2):

Access-buckets-workspace-permissions_Available-authorizations_Screen_shot.png

 

1.3. Then scroll down to the GET resource policies API and click the "Try it out" button
Note that if you refresh the page instead of scrolling to the right place, your authorization will be lost!

Access-buckets-with-workspace-permissions_Swagger-Try-it-out_Screen_shot.png

1.4. Fill out the resourceTypeName field with "workspace"

Access-bucket-with-workspace-permissions_ResourceType_Screen_shot.png

1.5. Fill out the resourceid field with the unique workspace ID
The workspace ID is the workspace bucket name minus the "fc-" prefix (e.g. for the bucket fc-d5eb5311-1cba-4c8f-84c5-27de52d2efbf, the workspace ID is d5eb5311-1cba-4c8f-84c5-27de52d2efbf).

The bucket name can be found in the workspace Dashboard tab on the right. You can use the clipboard icon to copy the bucket name (see screenshot below), and remove the “fc-” from the beginning to get the workspace ID.

Accessing-external-resources_Google-bucket-on-dashboard_Screen_shot.png
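As a small illustration of the rule above, the following Python sketch strips the "fc-" prefix from a copied bucket name to get the workspace ID. The function name `workspace_id_from_bucket` is hypothetical; tolerating a leading "gs://" is an added convenience, not something the Terra UI requires.

```python
def workspace_id_from_bucket(bucket_name: str) -> str:
    """Derive the workspace ID by removing the 'fc-' prefix from the bucket name."""
    name = bucket_name.strip()
    # Accept a full gs:// path in case it was copied that way.
    if name.startswith("gs://"):
        name = name[len("gs://"):]
    name = name.rstrip("/")
    prefix = "fc-"
    if not name.startswith(prefix):
        raise ValueError(f"Not a workspace bucket name (no 'fc-' prefix): {bucket_name!r}")
    return name[len(prefix):]


if __name__ == "__main__":
    print(workspace_id_from_bucket("fc-d5eb5311-1cba-4c8f-84c5-27de52d2efbf"))
```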

1.6. Click “Execute”

1.7. Get the email corresponding to the right workspace role

The response section lists the "emails" that Terra assigns to each workspace role. You want to get the “email” for the appropriate workspace role. This is what you will then grant authorization to in the next step.
Workspace Owner: policyName: “owner” - take the “email” value
Access-bucket-with-workspace-permissions_Policy-owner-email_Screen_shot.png

Workspace Writer: policyName: “writer” - take the “email” value
Access-buckets-with-workspace-permissions_Policy-Writer-email_Screen_shot.png

Workspace Reader: policyName: “reader” - take the “email” value
Access-buckets-with-workspace-permissions_Policy-Reader-email_Screen_shot.png
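If you copy the response body out of Swagger, picking out the three emails by hand can be error-prone. The sketch below assumes the response is a JSON list in which each entry carries a "policyName" and an "email" field, as described above; the exact response shape may differ, and the sample emails are placeholders, not real Terra addresses. The function name `emails_by_policy` is hypothetical.

```python
import json


def emails_by_policy(response_text: str) -> dict:
    """Map each policyName in the response to its email value."""
    policies = json.loads(response_text)
    return {
        entry["policyName"]: entry["email"]
        for entry in policies
        if "policyName" in entry and "email" in entry
    }


if __name__ == "__main__":
    # Placeholder response for illustration only.
    sample = json.dumps([
        {"policyName": "owner", "email": "owner-policy@example.org"},
        {"policyName": "writer", "email": "writer-policy@example.org"},
        {"policyName": "reader", "email": "reader-policy@example.org"},
    ])
    emails = emails_by_policy(sample)
    for role in ("owner", "writer", "reader"):
        print(role, emails[role])
```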

2. Grant permission to the workspace groups (newer workspaces)

2.1. From the GCP console select the resource to be shared (i.e. a particular bucket in https://console.cloud.google.com/storage/browser)

Grant-access-to-external-resource_Step-1_Screen_shot.png

2.2. Go to Permissions

Grant-access-to-external-resources_Step-2_Screen_shot.png

2.3. View by "Members" and select the "Add" icon

Grant-access-to-external-resources_Step-3_Screen_shot.png

2.4. Add the three workspace groups (i.e. workspace Owner, Writer and Reader from step 1 above) as New Members, selecting the resource type (left column - i.e. "Cloud Storage") and the appropriate roles.

Selecting the right roles

If you only want members of this workspace to be able to read from the bucket
  • Add the three groups as Storage Object Viewer to that external bucket

If you want readers to be able to read from the bucket, and writers and owners to be able to both read from and write to the bucket

  • Add all three workspace groups as Storage Object Viewer to the external bucket (all members need to be able to read from the bucket)
  • Add the Owner and Writer groups as Storage Object Admin or Storage Object Creator

To find out which permission might be correct, please see the documentation here.

Grant-access-to-external-resources_Step-4_Workspace-roles_Screen_shot_copy.png

Storage object roles/permissions

Storage Object Viewer: Grants access to view objects and their metadata, excluding ACLs. Can also list the objects in a bucket.

Storage Object Admin: Grants full control over objects, including listing, creating, viewing, and deleting objects.

Storage Object Creator: Allows users to create objects. Does not give permission to view, delete, or replace objects.

You will see your workspace group and role in the Members Permissions
Grant-access-to-external-resources_Step-5_Mirror-workspace-roles_Screen_shot.png
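The role-selection guidance above can be summarized as a short Python sketch that lists which role each workspace group should receive. It assumes the role IDs `roles/storage.objectViewer`, `roles/storage.objectCreator`, and `roles/storage.objectAdmin` for the three roles named above. The function name `plan_grants` and the sample emails are hypothetical, and the sketch only produces a list to follow in the console; it does not change any permissions.

```python
VIEWER = "roles/storage.objectViewer"
CREATOR = "roles/storage.objectCreator"
ADMIN = "roles/storage.objectAdmin"


def plan_grants(group_emails: dict, allow_writes: bool = False,
                write_role: str = CREATOR) -> list:
    """Return (group email, role) pairs for the owner, writer and reader groups."""
    if write_role not in (CREATOR, ADMIN):
        raise ValueError(f"Unexpected write role: {write_role!r}")
    # All three groups need to be able to read from the bucket.
    grants = [(group_emails[role], VIEWER) for role in ("owner", "writer", "reader")]
    if allow_writes:
        # Only owners and writers also get a write-capable role.
        grants += [(group_emails[role], write_role) for role in ("owner", "writer")]
    return grants


if __name__ == "__main__":
    # Placeholder emails; use the values from the API response in step 1.
    emails = {
        "owner": "owner-policy@example.org",
        "writer": "writer-policy@example.org",
        "reader": "reader-policy@example.org",
    }
    for email, role in plan_grants(emails, allow_writes=True):
        print(email, role)
```

Storage Object Creator is used as the default write role here because it is the narrower of the two; pass `write_role=ADMIN` if owners and writers also need to delete or replace objects.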

 

Older workspaces (before mid-December 2018)

1. Find workspace permission groups (older workspaces)

Workspace permission group formatting

  Owner: GROUP_d5eb5311-1cba-4c8f-84c5-27de52d2efbf-OWNER@firecloud.org
  Writer: GROUP_d5eb5311-1cba-4c8f-84c5-27de52d2efbf-WRITER@firecloud.org
  Reader: GROUP_d5eb5311-1cba-4c8f-84c5-27de52d2efbf-READER@firecloud.org

1.1. Prepend GROUP_

1.2. Insert the unique workspace ID
The workspace ID is the workspace bucket name minus the "fc-" prefix (e.g. for the bucket fc-d5eb5311-1cba-4c8f-84c5-27de52d2efbf, the workspace ID is d5eb5311-1cba-4c8f-84c5-27de52d2efbf).

The bucket name can be found in the workspace Dashboard tab on the right. You can use the clipboard icon to copy the bucket name (see screenshot below), and remove the “fc-” from the beginning to get the workspace ID.

Accessing-external-resources_Google-bucket-on-dashboard_Screen_shot.png

1.3. Add the appropriate ending
    Owners group: add -OWNER@firecloud.org to the end  
    Writers group: add -WRITER@firecloud.org to the end
    Readers group: add -READER@firecloud.org to the end

Note that you will need all three workspace groups in the next step.
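The three steps above (prepend GROUP_, insert the workspace ID, add the ending) can be sketched in Python as follows. The function name `older_workspace_groups` is hypothetical; the output format follows the examples in this section and applies only to older workspaces.

```python
def older_workspace_groups(workspace_id: str) -> dict:
    """Build the Owner, Writer and Reader group emails for an older workspace."""
    workspace_id = workspace_id.strip()
    if workspace_id.startswith("fc-"):
        # The workspace ID is the bucket name without the "fc-" prefix.
        workspace_id = workspace_id[len("fc-"):]
    return {
        role: f"GROUP_{workspace_id}-{role}@firecloud.org"
        for role in ("OWNER", "WRITER", "READER")
    }


if __name__ == "__main__":
    groups = older_workspace_groups("fc-d5eb5311-1cba-4c8f-84c5-27de52d2efbf")
    for role, email in groups.items():
        print(role, email)
```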

2. Grant permission to the workspace groups (older workspaces)

2.1. From the GCP console select the resource to be shared (i.e. a particular bucket in https://console.cloud.google.com/storage/browser)
Grant-access-to-external-resource_Step-1_Screen_shot.png

2.2. Go to Permissions
Grant-access-to-external-resources_Step-2_Screen_shot.png

2.3. View by "Members" and select the "Add" icon
Grant-access-to-external-resources_Step-3_Screen_shot.png

2.4. Add the three workspace groups (i.e. GROUP_<long string>-WRITER@firecloud.org) as New Members, selecting the resource type (left column - i.e. "Cloud Storage") and the appropriate roles.

Selecting the right roles

If you only want members of this workspace to be able to read from the bucket
  • Add the three groups as Storage Object Viewer to that external bucket

If you want readers to be able to read from the bucket, and writers and owners to be able to both read from and write to the bucket

  • Add all three workspace groups as Storage Object Viewer to the external bucket (all members need to be able to read from the bucket)
  • Add the Owner and Writer groups as Storage Object Admin or Storage Object Creator

To find out which permission might be correct, please see the documentation here.

Grant-access-to-external-resources_Step-4_Workspace-roles_Screen_shot_copy.png

Storage object roles/permissions

Storage Object Viewer: Grants access to view objects and their metadata, excluding ACLs. Can also list the objects in a bucket.

Storage Object Admin: Grants full control over objects, including listing, creating, viewing, and deleting objects.

Storage Object Creator: Allows users to create objects. Does not give permission to view, delete, or replace objects.

 

If you get an error like this (screenshot below), it is likely a newer workspace. Please follow the “Newer workspaces” instructions above.

Accessing-External-Buckets_Error-message-no-permissions_Screen_shot.png

You will see your workspace group and role in the Members Permissions
Grant-access-to-external-resources_Step-5_Workspace-roles_Screen_shot_copy.png

Grant permission to a Terra proxy (not recommended)

You can use your pre-defined proxy to grant the appropriate type of permission for your desired use. The downside of this approach is that the proxy (found in your profile page) is not human-friendly. It will be difficult to identify which Terra user is associated with the proxy when looking at the list of who has access permission for an external bucket. 

For this reason, we recommend using a personal Terra user group with a meaningful name instead of your proxy. 

1. Find your Proxy group

Go to "Profiles" from the main navigation menu (three lines at the top of any page on Terra).

Access-profile-page_Screen_shot.png

You'll see your proxy group listed near the bottom:

Advanced-GCP-features_Add-proxy-group-Step2-Find-proxy.png

2. Grant permission to your proxy group

To do this, you'll just need to add your proxy group email to the access list of your external bucket:

TPHPDc2aofOdVnJ697NHbA.png
