Managing data privacy and access with Authorization Domains

Anton Kovalsky

Because research is frequently collaborative, you need to keep sensitive genomic data secure while still making it easy to share with collaborators. Terra was designed to help you balance these competing requirements with consent-specific Authorization Domains (ADs). AD protection follows a workspace when it is copied, so only authorized users can access both primary and generated data. Read on to see how.

Securing controlled-access data in a cloud-native environment

Researchers need authorization to work with sensitive (i.e., controlled) data. On Terra, access to primary data on external platforms is controlled via linked authorization (see Linking authorization/accessing controlled data on external servers to learn more). Additionally, you can protect workspace data - data in the workspace bucket plus data generated in an analysis - by limiting access to workspaces where sensitive data are stored and analyzed. Terra uses Authorization Domains - a built-in function - to streamline and centralize this security process.

O2a_May30_2019.png

Traditional approach to data security

Let’s say the data are private to your lab and stored in a workspace bucket. You know you are a member of your lab (indicated by the blue color in the diagram above), so you can access your lab’s data (also blue) in the workspace.

You clone the original workspace, run an analysis, and generate derived data. This is possible because your role allows you to copy that workspace and access the data in the original workspace bucket.

If a new coworker asks you to share your workspace with them, you would (traditionally) be responsible for checking that this new coworker is officially a part of your lab. Because you own the cloned workspace, there are no restrictions on how or with whom you share it. You’d have to keep track yourself of which of your fellow scientists have up-to-date authorization.

In this model, the burden of defining and enforcing security lies with individual researchers.

Improving data security with authorization domains

Enter Authorization Domains, which are like a badge associated with a workspace that allows access only to people with the same badge. When you clone a workspace that has an Authorization Domain, the badge stays with the new copy: anyone who wants to access the copy has to have the badge.

You no longer need to worry about accidentally sharing sensitive data because if you try to share the cloned workspace with a user who doesn’t have the right badge, that researcher won’t be able to enter.

O2b_May30_2019.png

How authorization domains prevent accidental data sharing

Let’s revisit our example from before, but with the addition of Authorization Domains. The PI sets up the original workspace, where the sensitive data are stored, with an Authorization Domain to control access to the data (far left in the diagram above).

You are in your lab’s Authorization Domain, so you have access to the original workspace and primary data (assuming it has been shared with you). You clone the original workspace, do your analysis and generate some derived data (middle of diagram above).

Note that your clone does not include the primary data, which is in the original workspace bucket. You may not think about the security implications of sharing your clone, especially since only those with authorization (and permissions in the original workspace) can access data in the original bucket. However, access to the generated data in the clone is also restricted.

Since all ADs are inherited, the copied workspace also has the lab Authorization Domain. When you try to share the workspace with a new coworker, Terra will verify that your coworker is in the Authorization Domain before allowing access to the workspace (far right in diagram above).
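To make the inheritance rule concrete, here is a minimal Python sketch of the scenario above. It is a toy model, not Terra's implementation: the Workspace class, its fields, the group name "lab_group", and the email addresses are all hypothetical, and it assumes the cloner is the only person with a role on a fresh copy.

```python
from dataclasses import dataclass, field


@dataclass
class Workspace:
    """Toy model of a workspace; not Terra's real data structure."""
    name: str
    auth_domain: frozenset[str]                         # groups required to enter
    shared_with: set[str] = field(default_factory=set)  # users given a role

    def clone(self, new_name: str, owner: str) -> "Workspace":
        # The copy keeps the Authorization Domain; only the cloner has a role at first.
        return Workspace(new_name, self.auth_domain, {owner})

    def can_enter(self, user: str, groups: dict[str, set[str]]) -> bool:
        # Entry needs BOTH a share AND membership in every Authorization Domain group.
        in_domain = all(user in groups.get(g, set()) for g in self.auth_domain)
        return user in self.shared_with and in_domain


groups = {"lab_group": {"pi@lab.example", "you@lab.example"}}
original = Workspace("original", frozenset({"lab_group"}),
                     {"pi@lab.example", "you@lab.example"})
clone_ws = original.clone("my-analysis", owner="you@lab.example")
clone_ws.shared_with.add("newcomer@lab.example")   # shared, but not yet in the group

print(clone_ws.can_enter("you@lab.example", groups))       # True
print(clone_ws.can_enter("newcomer@lab.example", groups))  # False
```

In this sketch, sharing alone never grants entry. The new coworker gets in only once someone also adds them to the group.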

Removing the burden of enforcing data access from the individual

In this way, the Authorization Domain keeps track of access so you don't have to. And it's straightforward to adjust group membership (who is in the Authorization Domain) as lab members change. Once membership is updated, it affects access to every AD-protected workspace right away.  
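One way to picture why a membership change takes effect everywhere at once: if each workspace stores only the name of its group and membership is looked up at access time, a single update to the group reaches every protected workspace. The sketch below assumes that design for illustration only; the dictionaries, workspace names, and emails are hypothetical, and sharing permissions are left out for brevity.

```python
# Hypothetical in-memory model: workspaces store group NAMES, not member lists.
group_members = {"sample_group": {"alice@lab.example", "bob@lab.example"}}
workspace_domains = {
    "primary-data": ["sample_group"],
    "alice-analysis": ["sample_group"],   # a clone of primary-data
}


def accessible_workspaces(user):
    """Return workspaces whose every AD group currently contains `user`."""
    return sorted(
        ws for ws, domain in workspace_domains.items()
        if all(user in group_members[g] for g in domain)
    )


before = accessible_workspaces("bob@lab.example")
group_members["sample_group"].discard("bob@lab.example")   # Bob leaves the lab
after = accessible_workspaces("bob@lab.example")

print(before)  # ['alice-analysis', 'primary-data']
print(after)   # []
```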

Authorization Domains versus managed groups

Managed groups are a way to grant a single type of permission for a resource to a set of individuals:
   - Share a workspace with a group
   - Grant can-compute permission to a group
   - Include a group on a Terra Billing project, etc. 

Authorization Domains are managed groups with strictly defined and enforced workspace permissions:
   - Authorization domains restrict workspace access to the individuals in the group.
   - ADs are assigned to workspaces when they are created and follow all workspace copies.

All Authorization Domains are managed groups, but not all managed groups are Authorization Domains.

Some Authorization Domain examples

Example 1: One lab, one research project/group

Step 1: The PI or PM creates one lab-wide Authorization Domain and adds the researchers in the lab who are consented to use the data (i.e., those listed as dbGaP downloaders under the PI) to the AD group.

AD-Use-Case_Single-lab-single-project_Step1.png

Step 2: The Authorization Domain group is included when creating any workspaces for that project. The workspace, and every copy of it, is protected by the Authorization Domain.

AD-Use-Case_Single-lab-single-project_Step2.png

Result: Only researchers consented to use the data can access, copy, or work in the workspaces (this overrides workspace permissions).

AD-Use-Case_Single-lab-single-project_Step3.png

Example 2: One lab, multiple projects/access groups

Step 1: The PI or PM creates several Authorization Domain groups, one for each data consent group, and adds each researcher to the group for every consent group they belong to.

AD-Use-Case_Single-lab-many-projects_Step1.png

Step 2: The PI creates a primary workspace for each project and includes the appropriate Authorization Domain. Note that a workspace can be protected by more than one AD, depending on the data (e.g., if a workspace combines data from two consent groups, it will have two Authorization Domains).

AD-Use-Case_Single-lab-many-projects_Step2.png

Result: A researcher must be included in all of the workspace's ADs to access a protected workspace.

AD-Use-Case_Single-lab-many-projects_Step3.png

Example 3: Cross-institution collaborations

Step 1: A PI at one institution creates a project-specific Authorization Domain for one consent group. Collaborators at both institutions are added to the Authorization Domain for the data they are consented to use.

AD-Use-Case_Cross-Institute-Collaborations_Step1.png


Step 2: The collaborating institution creates a second Authorization Domain.

AD-Use-Case_Cross-Institution-Collaboration_Step2.png


Result: Collaborators can access only the workspaces with data they are consented to use, regardless of which institution created the workspace or which institution they belong to.

AD-Use-Case_Cross-Institution-Collaboration_Step3.png

How to set up and use Authorization Domains

G11a_Apr19_2019.gif

(steps 1 and 2 shown above)

Step 1: Set up a group (i.e. give users their badges)

Before you can assign an Authorization Domain to a workspace, you will need to set up an Authorization Domain group. There are two ways to set up and manage groups, depending on whether you use a third-party group or a user-defined group.

  • Third-party groups (TCGA, TARGET, GTEx)

    For third-party groups, access depends on external permissions. Currently, Terra supports third-party groups including TCGA Controlled-Access, GTEx, and TARGET. To gain access, you must link your Terra account to your eRA Commons or NIH account on your Profile page.

    Click here to learn how to link data authorization to your Terra account.

    Terra then checks for the user ID of the linked account in the dbGaP access list to complete the authorization.

  • User-defined groups

    User-defined groups are created and managed within Terra. Groups are straightforward to set up, and are perfect to use when you want to share data with a set group of people (within your lab, for example).


    G11a_Apr19_2019.gif

    Your PI can create a user-defined group by going to the Groups page, found in the main navigation menu under your username. Follow the prompts to create a group, e.g. “sample_group”, and add each member of your lab to the group, thereby giving them the “sample_group” badge. The PI (or anyone they give Owner access to the group) is then responsible for giving and revoking these badges.

Step 2: Create workspace and assign the workspace Authorization Domain 

When creating a workspace, you'll start from this form (screenshot below).

Authorization-domain_Set-domain-when-creating-workspace_Screen_shot.png

You can select one or more groups for the Authorization Domain in the dropdown. (If you don’t see your group in the list, you may need to create it. See Step 1 above).

An Authorization Domain can only be set when creating the workspace, and once set, it cannot be removed from the workspace.

It will be copied over to any cloned version of the workspace, protecting any derived data.

When an Authorization Domain includes multiple groups

When multiple groups are included in the Authorization Domain, the system requires the user to be a member of all groups in order to access the workspace. This is because there are strict guidelines for third-party dbGaP-registered datasets (such as TCGA and TARGET).

Example case: An Authorization Domain with multiple groups
Consider a workspace whose Authorization Domain contains both the TCGA and TARGET groups. If a user is invited to the workspace, the system checks both the TCGA and the TARGET access lists for their account before allowing access.
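The all-groups rule can be expressed as a simple set difference. The following Python sketch is illustrative only: missing_groups is a hypothetical helper, and the short labels "TCGA", "TARGET", and "GTEx" stand in for whatever the real group names are.

```python
def missing_groups(user_groups: set[str], auth_domain: set[str]) -> set[str]:
    """Return the Authorization Domain groups the user does NOT belong to.

    An empty result means the user satisfies the whole Authorization Domain.
    """
    return auth_domain - user_groups


domain = {"TCGA", "TARGET"}

print(missing_groups({"TCGA"}, domain))                    # {'TARGET'}
print(missing_groups({"TCGA", "TARGET", "GTEx"}, domain))  # set()
```

Returning the missing groups, rather than just True or False, makes it clear which badge a user still needs before they can enter.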

Importing data from a workspace with Authorization Domain protection

The Authorization Domains of the destination workspace (where the data is going to) must include all the Authorization Domains of the source workspace (where data is coming from). It is fine if the destination workspace is more restrictive about access to data, but it cannot be less.

Example case: Importing data from a workspace with an Authorization Domain
For example, if the destination workspace has TCGA-dbGap-Authorized and Tiffs-Test-Group groups in the Authorization Domain, you can import data from workspaces whose Authorization Domain is set to TCGA-dbGap-Authorized only, Tiffs-Test-Group only, both groups, or no groups. If the source workspace had additional groups, you would not be able to import from it. In this example, Terra informs you there are six workspaces that are unavailable because of this.
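The import rule amounts to a subset check: every group in the source workspace's Authorization Domain must also appear in the destination's. Here is a small Python sketch of the example above, with the caveat that can_import, the workspace labels, and the extra group "Another-Group" are hypothetical names used only for illustration.

```python
def can_import(source_domain: set[str], destination_domain: set[str]) -> bool:
    """True when the destination is at least as restrictive as the source."""
    return source_domain <= destination_domain   # subset test


destination = {"TCGA-dbGap-Authorized", "Tiffs-Test-Group"}

# Hypothetical source workspaces and their Authorization Domains.
sources = {
    "tcga-only": {"TCGA-dbGap-Authorized"},
    "tiffs-only": {"Tiffs-Test-Group"},
    "both-groups": {"TCGA-dbGap-Authorized", "Tiffs-Test-Group"},
    "no-groups": set(),
    "extra-group": {"TCGA-dbGap-Authorized", "Another-Group"},
}

importable = {name: can_import(domain, destination) for name, domain in sources.items()}
for name, allowed in importable.items():
    print(f"{name}: {'allowed' if allowed else 'blocked'}")
```

Only "extra-group" is blocked here, because the destination workspace lacks one of its groups and would therefore be less restrictive than the source.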

Step 3: Share the workspace - step-by-step instructions

To complete the process, you can now share the workspace, either with the group you used in the Authorization Domain, or with one or more individuals.

To share with a group, start typing the name into the Sharing dialog and choose from the autocomplete options.
G11b_Apr19_2019.gif

What happens when someone not in the AD tries to access the workspace? If you share with individuals or a group not in the Authorization Domain, they will see the workspace greyed out in their workspace list. When they click it, Terra will send an email requesting access to all owners of the groups in the Authorization Domain. Once the user has the proper badge(s), they can enter the workspace to see the protected data.

GTEx, TARGET, or TCGA workspace? If you receive an error message that you aren't a member of the Authorization Domain for a GTEx, TARGET, or TCGA workspace, this generally means the authorization behind your NIH/dbGaP link isn't active. Access to the AD is automated based on authorization from dbGaP, which is updated every six hours on Terra.

Additional resources

To learn more about linking to external servers, see Linking authorization/accessing controlled data on external servers.
