Accessing data from an external bucket

Anton Kovalsky

To analyze data stored in cloud storage buckets not associated with a Terra workspace, your tools need 1) the data's location in the cloud (a complete file path, or URI) and 2) permission to access the storage bucket. Learn how to get the right permission to access data located in an external bucket (i.e., a Google Cloud bucket that is not a workspace bucket). 

To learn more about accessing external resources (such as Google Cloud buckets, or Virtual Machines (VMs) or machine-learning tools external to Terra), see Best practices for accessing external resources.

Best practices overview

If your workflows or notebooks use data files from an external bucket as input, you must give your Terra service accounts permission to access that bucket. Two ways to do this are to give permission to a Terra-managed group or to use workspace permissions. Benefits and step-by-step instructions for both options are outlined below. 

Before you start: Only resource owners or admins can grant access permission

If what you see in the console does not look like the screenshots below, or if you get an error message about permissions, you are most likely not the owner or admin of the storage bucket or BigQuery dataset. Ask the resource owner or admin to grant permission to your Terra group, following the directions below.

Option 1: Grant permission to a Terra-managed group

You should always use Terra groups for accessing external resources, even for one user!

Why use a managed group

  • Terra groups are more human-friendly than service accounts. They include a user's service accounts as well as the user's account ID but are much easier to identify and troubleshoot than random strings of numbers and letters.
  • You can manage your Terra group within the Terra UI, and Terra handles all the nonhuman-friendly back end.

Step-by-step instructions

  1. Create a group - including a personal Terra group as well as a group of collaborators
  2. Grant permission for an external bucket to a Terra-managed group 

Step 1. Create your Terra group in four steps 

Terra-managed groups can be for a single individual or a true group. For more information on why to use a managed group for one person, see Best practices for accessing external resources.

  • Example: Terra group for single user (User ID:  j_doe@someplace.org)
    - Create a Terra Group: j_doe_at_someplace_org
    - Don't add anyone else to this group
    - Make grants to j_doe_at_someplace_org@firecloud.org

    1. Go to your Groups page (Main menu > Groups from the top left of any page in Terra)
    Screenshot of steps to find your Groups page from any page on Terra with arrows pointing to the main menu action item in the top left (1) and the groups submenu below the profile section (2)


    2. In the Create a New Group card, click on the blue + icon 
    Screenshot of the Groups page with an arrow pointing to the card labeled create a group

    3. Enter your human-friendly user-ID in the field (can be the same as your Terra login) and click the Create Group button
    Screenshot of the Create a New Group popup with j_doe_at_someplace_org in the enter a unique name field

    Now, use your mirrored Terra group for accessing external resources (i.e.,  j_doe_at_someplace_org@firecloud.org)
    Create-Terra-Group_Step-4_Screen_shot.png

  • Example: Terra group for collaborators
    - Create a Terra Group: my_lab_at_someplace_org
    - Add the Terra IDs of all collaborators to this group
    - Make grants to my_lab_at_someplace_org@firecloud.org

    1. Go to your Groups page (Main menu > Groups from the top left of any page in Terra)
    Screenshot of steps to find your Groups page from any page on Terra with arrows pointing to the main menu action item in the top left (1) and the groups submenu below the profile section (2)

    2. In the Create a New Group card, click on the blue + icon 
    Screenshot of Groups page with an arrow pointing to the card labeled create a group

    3. Enter your collaborator group name in the field and click the Create Group button
    Screenshot of the Create a New Group popup with my_lab_at_someplace_org in the enter a unique name field

    4. Add or remove members in the Group management page in Terra.

    Now, use this group (i.e.,  my_lab_at_someplace_org@firecloud.org) for granting permissions for external resources.
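
The naming pattern in the two examples above can be sketched in Python. This is only an illustration: the helper names are hypothetical, and the rule of replacing "@" with "_at_" and "." with "_" is inferred from the j_doe example rather than enforced by Terra (any unique group name works).

```python
FIRECLOUD_DOMAIN = "firecloud.org"  # domain of the group emails shown in this article


def suggested_group_name(user_id: str) -> str:
    """Turn a login such as j_doe@someplace.org into j_doe_at_someplace_org.

    Assumption: this mirrors the example above; Terra does not require it.
    """
    return user_id.strip().replace("@", "_at_").replace(".", "_")


def group_email(group_name: str) -> str:
    """Return the address to use when granting permissions to a Terra group."""
    return f"{group_name}@{FIRECLOUD_DOMAIN}"


if __name__ == "__main__":
    name = suggested_group_name("j_doe@someplace.org")
    print(name)
    print(group_email(name))
    print(group_email("my_lab_at_someplace_org"))
```

The address returned by group_email is the one to enter when granting permissions in the next step.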

Step 2. Grant permission to the Group

2.1. From the Google Cloud console, select the resource to be shared (i.e., a particular bucket in https://console.cloud.google.com/storage/browser).

Screenshot example

Screenshot of storage browser on GCP console with ac-storage-bucket highlighted

2.2. Go to Permissions.
Screenshot of ac-external-bucket details page on GCP console with permissions tab highlighted

2.3. View by Principals and select the Add icon.
Screenshot of ac-external-bucket permissions page on GCP console with view by principals tab highlighted and an arrow pointing to the add button

2.4. Add the full name of your Terra group (i.e., j_doe_at_someplace_org@firecloud.org) as a New Member and select the resource type (left column - i.e., Cloud Storage) and the appropriate roles.

Most common roles

  1. Storage Object Viewer to read from the bucket.
  2. Storage Object Creator to write to the bucket.

Example screenshot
Screenshot of Add principals and roles for 'ac-external-bucket' resource page with 'j_does_at_someplace_org' in the new principals field and 'Cloud storage' + 'Storage object creator' + 'Storage object viewer' highlighted in the 'select a role' field

What to expect

You will see your Terra group and role in the Member's Permissions.

Example screenshot
Access-external-bucket_See-Terra-group_Screen_shot.png
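
The same grant can also be scripted instead of clicked through in the console. The sketch below makes several assumptions: the google-cloud-storage Python client is installed, you are authenticated as an owner or admin of the bucket, and the Terra group is addressed as a Google group (the "group:" prefix). The function names are hypothetical, and the bucket name ac-external-bucket is borrowed from the screenshots.

```python
ROLE_IDS = {
    "Storage Object Viewer": "roles/storage.objectViewer",    # read from the bucket
    "Storage Object Creator": "roles/storage.objectCreator",  # write to the bucket
}


def make_binding(group_email: str, role_name: str) -> dict:
    """Build one IAM binding for a Terra group (assumed to be a Google group)."""
    return {"role": ROLE_IDS[role_name], "members": {f"group:{group_email}"}}


def grant_to_group(bucket_name: str, group_email: str, role_names: list) -> None:
    """Append bindings to the bucket's IAM policy. Requires bucket admin rights."""
    from google.cloud import storage  # imported here so make_binding works without it

    bucket = storage.Client().bucket(bucket_name)
    policy = bucket.get_iam_policy(requested_policy_version=3)
    for role_name in role_names:
        policy.bindings.append(make_binding(group_email, role_name))
    bucket.set_iam_policy(policy)


if __name__ == "__main__":
    # Dry run: print the bindings without contacting Google Cloud.
    for role in ROLE_IDS:
        print(make_binding("j_doe_at_someplace_org@firecloud.org", role))
    # To apply the grant for real (illustrative bucket name):
    # grant_to_group("ac-external-bucket",
    #                "j_doe_at_someplace_org@firecloud.org",
    #                ["Storage Object Viewer"])
```

Keeping the binding construction separate from the network call makes it possible to review exactly what will be granted before changing the bucket policy.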

Option 2: Sync external bucket permissions to workspace permissions (advanced)

Each workspace has behind-the-scenes groups for Owners, Writers, and Readers.

These groups are automatically updated when a workspace is shared. For example, if a user is added to the workspace with Owner-level permissions, they are added to the Owner group. You can take advantage of these predefined Terra-managed groups to give permissions to external resources, such as an external private bucket.

Using this option, the external bucket mirrors the workspace sharing: whenever the workspace is shared with a new user, that user also gets access to the external bucket. See step-by-step instructions below to set up those permissions. 

The steps to do this depend on the age of the workspace.  If you follow the process for “Older Workspaces” below and it doesn’t work, try the “Newer Workspaces” guide.  

  • Step 1. Find workspace permission groups (newer workspaces)

    Go to the Swagger page and run the following API call:
    https://sam.dsde-prod.broadinstitute.org/#/Resources/listResourcePolicies

    1.1. First, go to the top of that page and click the Authorize button.

    Screenshot of the Swagger page with an arrow pointing to the green 'Authorise Swagger' button at the top right

    1.2. Check all of the boxes under Scopes (1), and click Authorize (2).
    Screenshot of the Available authorizations popup with an arrow and the number one pointing to all three scopes checked and an arrow and the number two pointing to the green authorize button at the bottom right

     

    1.3. Then scroll down to the GET resource policies API and click the Try it out button.
    Screenshot of Swagger page with an arrow pointing to the grey 'Try it out!' button at the top right.

    If you refresh the page instead of scrolling to the right place, your authorization will be lost!

    1.4. Fill out the resourceTypeName field with "workspace".
    Screenshot of the parameters part of the Swagger page with 'workspace' in the 'Type of resource' field and the Google bucket d5eb5311-1cba-4c8f-84c5-27de52d2efbf in the 'id of resource' field

    1.5. Fill out the resourceid field with the unique workspace ID.
    The workspace ID is the workspace bucket name minus the “fc-” prefix (e.g., the bucket fc-d5eb5311-1cba-4c8f-84c5-27de52d2efbf has the workspace ID d5eb5311-1cba-4c8f-84c5-27de52d2efbf).

    The bucket name can be found in the workspace Dashboard tab on the right. You can use the clipboard icon to copy the bucket name (see screenshot below), and remove the “fc-” from the beginning to get the workspace ID.

    Screenshot of Google bucket ID on the workspace dashboard highlighting the clipboard icon to the right of the bucket ID

    1.6. Click Execute.

    1.7. Get the email corresponding to the right workspace role.

    The response section lists the "emails" that Terra assigns to each workspace role. You want to get the “email” for the appropriate workspace role. You will grant authorization to that "email" in the next step.
    Workspace Owner: policyName: “owner” - take the “email” value
    Screenshot of code 200 response body with email 'policy-a65f75a7-a65a-e6a6832a452@firecloud.org' and 'roles [owner]' circled

    Workspace Writer: policyName: “writer” - take the “email” value
    Screenshot of code 200 response body with email 'policy-a65f75a7-a65a-e6a6832a452@firecloud.org' and 'policyName writer' circled

    Workspace Reader: policyName: “reader” - take the “email” value
    Screenshot of code 200 response body with email 'policy-a65f75a7-a65a-e6a6832a452@firecloud.org' and 'roles [reader]' circled
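
The bookkeeping in Step 1 (stripping "fc-" from the bucket name and picking the email for each role) can be sketched in Python. The helper names are hypothetical. The response shape is an assumption based on the fields named above: a JSON list of objects that each carry "policyName" and "email". The sample emails are made up for illustration.

```python
import json


def workspace_id_from_bucket(bucket_name: str) -> str:
    """Strip the leading "fc-" from a workspace bucket name."""
    prefix = "fc-"
    if not bucket_name.startswith(prefix):
        raise ValueError(f"expected a name starting with {prefix!r}: {bucket_name!r}")
    return bucket_name[len(prefix):]


def emails_by_policy(response_body: str) -> dict:
    """Map policyName -> email from a copied response body.

    Assumption: the body is a JSON list of objects with "policyName"
    and "email" keys; real responses contain additional fields.
    """
    return {p["policyName"]: p["email"] for p in json.loads(response_body)}


if __name__ == "__main__":
    print(workspace_id_from_bucket("fc-d5eb5311-1cba-4c8f-84c5-27de52d2efbf"))

    # Made-up sample body; paste the real response from the Swagger page instead.
    sample = json.dumps([
        {"policyName": "owner", "email": "policy-aaaa@firecloud.org"},
        {"policyName": "writer", "email": "policy-bbbb@firecloud.org"},
        {"policyName": "reader", "email": "policy-cccc@firecloud.org"},
    ])
    emails = emails_by_policy(sample)
    for role in ("owner", "writer", "reader"):
        print(role, emails[role])
```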

    Step 2. Grant permission to workspace groups (newer workspaces)

    2.1. From the Google Cloud console, select the resource to be shared (i.e., a particular bucket in https://console.cloud.google.com/storage/browser).
    Screenshot of storage browser in GCP console with 'external-bucket' selected and circled for emphasis

    2.2. Go to Permissions.
    Screenshot of Storage bucket details for 'external-bucket' on GCP with the Permissions tab

    2.3. View by Members and select the Add icon.
    Screenshot of Storage bucket permissions details for 'external-bucket' on GCP with the members tab circled and an orange arrow pointing to the 'Add' button and icon

    2.4. Add the three workspace groups (i.e., workspace Owner, Writer and Reader from step 1 above) as New Members, selecting the resource type (left column - i.e., "Cloud Storage") and the appropriate roles.

    Selecting the right roles

    If you only want members of this workspace to be able to read from the bucket:
    Add the three groups as Storage Object Viewer to the external bucket.

    If you want readers to read from the bucket, and writers and owners to both read from and write to the bucket:
    1) Add all three workspace groups as Storage Object Viewer to the external bucket (if all members need to read from the bucket)
    2) Add the Owner and Writer groups as Storage Object Admin or Storage Object Creator.

    To find out which permission might be correct, please see the documentation here.

    Screenshot of the 'Add members' popup with 'GROUP_d5eb5311-1cba-4c8f-84c5-27de52d2efbf-OWNER@firecloud.org' in the 'New members' field and 'Cloud storage' + 'Storage object creator' circled in the 'Select a role' dropdown.

    Storage object roles/permissions

    Storage Object Viewer: Grants access to view objects and their metadata, excluding ACLs. Can also list the objects in a bucket.

    Storage Object Admin: Grants full control over objects, including listing, creating, viewing, and deleting objects.

    Storage Object Creator: Allows users to create objects. Does not give permission to view, delete, or replace objects.

    You will see your workspace group and role in the Members Permissions.
    Screenshot of Members permissions in GCP console with 'policy-d5eb5311-1cba-4c8f-84c5-27de52d2efbf-OWNER@firecloud.org circled under the 'Member' column
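
The role choices described above can be written down as a small planning function. This is a sketch, not Terra or Google tooling: plan_roles is a hypothetical helper, the emails in the usage example are placeholders, and only the role IDs (roles/storage.objectViewer, roles/storage.objectCreator, roles/storage.objectAdmin) are Google Cloud's own identifiers for the three roles listed.

```python
VIEWER = "roles/storage.objectViewer"    # Storage Object Viewer
CREATOR = "roles/storage.objectCreator"  # Storage Object Creator
ADMIN = "roles/storage.objectAdmin"      # Storage Object Admin


def plan_roles(group_emails: dict, writers_can_write: bool,
               write_role: str = CREATOR) -> dict:
    """Return {group email: [role ids]} for the owner, writer and reader groups."""
    if write_role not in (CREATOR, ADMIN):
        raise ValueError("write_role must be the Creator or Admin role id")
    plan = {}
    for workspace_role in ("owner", "writer", "reader"):
        roles = [VIEWER]  # every workspace group can read
        if writers_can_write and workspace_role in ("owner", "writer"):
            roles.append(write_role)
        plan[group_emails[workspace_role]] = roles
    return plan


if __name__ == "__main__":
    # Placeholder emails; use the ones found in Step 1.
    groups = {
        "owner": "owner-group@firecloud.org",
        "writer": "writer-group@firecloud.org",
        "reader": "reader-group@firecloud.org",
    }
    for email, roles in plan_roles(groups, writers_can_write=True).items():
        print(email, roles)
```

With writers_can_write=False, all three groups receive only Storage Object Viewer, which matches the read-only case.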

  • Step 1. Find workspace permission groups (older workspaces)

    Workspace permission group formatting

    Owner:  GROUP_d5eb5311-1cba-4c8f-84c5-27de52d2efbf-OWNER@firecloud.org
    Writer: GROUP_d5eb5311-1cba-4c8f-84c5-27de52d2efbf-WRITER@firecloud.org
    Reader: GROUP_d5eb5311-1cba-4c8f-84c5-27de52d2efbf-READER@firecloud.org

    1.1. Prepend GROUP_

    1.2. Insert the unique workspace ID.
    The workspace ID is the workspace bucket name minus the “fc-” prefix (e.g., d5eb5311-1cba-4c8f-84c5-27de52d2efbf).

    The bucket name can be found in the workspace Dashboard tab on the right. You can use the clipboard icon to copy the bucket name (see screenshot below), and remove the “fc-” from the beginning to get the workspace ID.

    Accessing-external-resources_Google-bucket-on-dashboard_Screen_shot.png

    1.3. Add the appropriate ending.
        Owners group: add -OWNER@firecloud.org to the end  
        Writers group: add -WRITER@firecloud.org to the end
        Readers group: add -READER@firecloud.org to the end

    Note: You will need all three workspace groups in the next step.
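
The three steps above amount to simple string assembly, sketched here in Python. The function name is hypothetical; the format follows the examples at the top of this step.

```python
def legacy_group_emails(bucket_name: str) -> dict:
    """Build the Owner, Writer and Reader group emails for an older workspace.

    Accepts either the bucket name (with "fc-") or the bare workspace ID.
    """
    prefix = "fc-"
    workspace_id = bucket_name[len(prefix):] if bucket_name.startswith(prefix) else bucket_name
    return {
        role: f"GROUP_{workspace_id}-{role}@firecloud.org"
        for role in ("OWNER", "WRITER", "READER")
    }


if __name__ == "__main__":
    emails = legacy_group_emails("fc-d5eb5311-1cba-4c8f-84c5-27de52d2efbf")
    for role, email in emails.items():
        print(role, email)
```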

    Step 2. Grant permission to workspace groups (older workspaces)

    2.1. From the Google Cloud console, select the resource to be shared (i.e., a particular bucket in https://console.cloud.google.com/storage/browser).
    Grant-access-to-external-resource_Step-1_Screen_shot.png

    2.2. Go to Permissions.
    Grant-access-to-external-resources_Step-2_Screen_shot.png

    2.3. View by Members and select the Add icon.
    Grant-access-to-external-resources_Step-3_Screen_shot.png

    2.4. Add the three workspace groups (i.e., GROUP_<long string>-WRITER@firecloud.org) as New Members, selecting the resource type (left column - i.e., "Cloud Storage") and the appropriate roles.

    Selecting the right roles

    If you only want members of this workspace to be able to read from the bucket:
    Add the three groups as Storage Object Viewer to the external bucket.

    If you want readers to read from the bucket, and writers and owners to both read from and write to the bucket:
    1) Add all three workspace groups as Storage Object Viewer to the external bucket (if all members need to read from the bucket)
    2) Add the Owner and Writer groups as Storage Object Admin or Storage Object Creator.

    To find out which permission might be correct, please see the documentation here.

    Grant-access-to-external-resources_Step-4_Workspace-roles_Screen_shot_copy.png

    Storage object roles/permissions

    Storage Object Viewer: Grants access to view objects and their metadata, excluding ACLs. Can also list the objects in a bucket.

    Storage Object Admin: Grants full control over objects, including listing, creating, viewing, and deleting objects.

    Storage Object Creator: Allows users to create objects. Does not give permission to view, delete, or replace objects.

     

    If you get an error like this (screenshot below), it is likely a newer workspace. Follow the instructions for newer workspaces above.

    Accessing-External-Buckets_Error-message-no-permissions_Screen_shot.png

    You will see your workspace group and role in the Members Permissions.
    Grant-access-to-external-resources_Step-5_Workspace-roles_Screen_shot_copy.png

