Overview: Managing access to controlled data with Authorization Domains

Anton Kovalsky

To protect access to controlled data, Authorization Domains offer additional control and protection beyond workspace permissions. Authorization Domain protection follows a workspace when it's copied, restricting access to both primary and generated data to authorized users only. Read on to see how.

Securing data in a cloud-native environment

To keep controlled data secure but still easy to share with collaborators, Terra has several built-in security features that limit access to a workspace.

  • Accessing data stored on external systems requires linking to existing authorization 

    This system ensures that only people who already have access to the primary data (via traditional authorization mechanisms) can access the controlled data for analysis in Terra. 

  • Accessing data and tools in a workspace requires the right workspace permission

    Workspace permissions let owners precisely control who can access which resources in a workspace. For many users, workspace permissions are sufficient for all their needs. Note that you can share with a managed group to streamline the process, granting a single type of permission on a resource to a whole set of individuals at once. Sharing this way reduces the possibility of human error, such as forgetting to add or remove a person from a workspace when they join or leave the group.

  • Accessing controlled data can require membership in an (optional) Authorization Domain

    Authorization Domains (ADs) are an additional layer of protection for controlled data stored in a workspace bucket. ADs prevent access to both the primary data and any generated data unless you are a member of the AD.

By centralizing the process of managing access, these features can make working in a public cloud infrastructure safer than working in a traditional local analysis platform, where security is only as good as the weakest (human) link. 

How these security features work

Terra's security features work together to protect the data you store and the work you do in a Terra workspace. Some are automatic (you must link your authorization before you can analyze controlled data stored on an external server), and some you (or your PI) need to implement (assigning the right workspace permissions or assigning an Authorization Domain to a workspace).

To learn more about how to access controlled data stored on external servers, see Linking authorization/accessing controlled data on external servers.

To learn more about best practices for sharing data and tools in a workspace, see Managing access to shared resources (data and tools).

To learn more about the pros and cons of the additional security of an Authorization Domain, read on!

Workspace permissions versus Authorization Domains

Authorization Domains are managed groups with strictly defined and enforced workspace permissions. They are most useful when you need to control not only who has access to the primary data stored in a workspace, but also who has access to all generated data, including data generated by a colleague (or a colleague of a colleague...).

An Authorization Domain is like a badge attached to a workspace (and to any copies of the workspace or of the data in its storage). Because only people with the same badge can gain access, Authorization Domains prevent derived data from being shared accidentally.

Authorization Domain characteristics

  • Assigned to workspaces when they are created and follow all workspace copies
  • Restrict workspace access to the individuals in the group. Note, however, that they do not grant access to members of the group: members still need workspace permission.
  • Protect access to all data generated on Terra from the original data.

Do you need an Authorization Domain?

Workspace permissions are sufficient for many use cases. To decide whether to assign an Authorization Domain for additional protection, ask yourself the following questions.

1. Is primary protected data stored in the workspace bucket?

Authorization Domains are inherited, so they are most effective when applied to workspaces that contain primary, controlled data. 

2. Do you need to control what collaborators do with their own clones of the workspace?

You can control who has access to data and tools in a workspace with workspace permissions. However, collaborators with "can share" permission on the workspace may make their own copies of the original, which they can then share.

While it is easy to control the permissions of the primary workspace, it can be hard to keep track of access once collaborators make their own shareable copies. The owner of the original workspace has no control over whether those copies are shared with people outside the original group. Authorization Domains can be useful in this case.

3. Is there a well-defined group of people who should have access?

A consent group approved by an IRB or other agency to use controlled data stored in a workspace bucket is a good example. An Authorization Domain that includes only the consented users protects the primary data in the workspace bucket as well as any generated data, no matter who generates it or in which workspace.

Another example is when you want to let people share a workspace with colleagues, but only with a well-defined set of colleagues (such as those at a given institution). Note that you can change the individuals in an AD group, but anyone outside the AD cannot access the workspace, even if it is shared with them.

4. Do you anticipate sharing with people who are not on the Authorization Domain?

To ensure protection of controlled data, Authorization Domains are a permanent fixture of the workspaces that use them. There is no easy way to add collaborators to a workspace with an AD unless you also add them to the AD.

Before you assign an Authorization Domain

Authorization Domains are permanent! If you later need to share with people who are not on the AD, you'll need to create new workspaces and manually copy over any TSVs, data, or notebooks you need, because clones of an AD-protected workspace automatically inherit the AD. Note that you will not be able to copy the workspace bucket contents to the new workspace, because the bucket contents are protected by the AD!

Are workspace permissions enough protection?

When a workspace is cloned, the metadata in the Data tab is copied, but any files referenced by that metadata stay in their original location. So cloning, even by someone with "reader" access to the workspace, does not copy the original files out of the original workspace. The clone gets a replica of the data table with the metadata, but the newly cloned workspace's own bucket starts out empty. Note, however, that with workspace permissions alone, any derived results generated in the new workspace would be shared along with it.

If workspace permissions are enough for your needs, see Managing access to shared resources (data and tools) for details about how to use and set them up.

Granting access to an AD-protected workspace involves two separate steps

Accessing an AD-protected workspace is a two-step process, and both steps are required.

Step 1: Add user to the Authorization Domain group list

If the Authorization Domain is a third-party group (e.g., GTEx, TARGET, or TCGA), you will need to link your Terra account to your NIH or dbGaP authorization.

If the Authorization Domain is a Terra user group, you will need to be added to the group.

Step 2: Add workspace permission

Workspace owners add the user to the workspace permission list with the "share" option, assigning a role such as "reader", "writer", or "owner".

What happens if you have workspace permission but are not on the AD?

Having an Authorization Domain on a workspace DOES supersede the workspace permission list. Even though it appears possible to add anyone to the permission list of a workspace, if the workspace is protected by an AD that doesn't include the user in question, that user won't be able to see the workspace. They may get an email notification that a workspace has been shared with them, but if they follow the link in that notification, they'll land on an error message stating they don't have access.
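The two-step rule, and the way the AD overrides the permission list, can be sketched as a single access check. This is a minimal Python 3.9+ model under stated assumptions: a workspace is described only by its set of AD group names and a permission list. The `Workspace` class, the `can_access` function, the role strings, and the email addresses are hypothetical illustrations, not part of Terra's API.

```python
from dataclasses import dataclass, field


@dataclass
class Workspace:
    """Hypothetical stand-in for a Terra workspace (not a Terra API object)."""
    name: str
    auth_domains: frozenset[str]  # names of the AD groups protecting the workspace
    acl: dict[str, str] = field(default_factory=dict)  # user -> role, e.g. "READER"


def can_access(user: str, ws: Workspace, group_members: dict[str, set[str]]) -> bool:
    """Return True only if the user passes BOTH access steps."""
    # Step 1: the user must belong to every Authorization Domain on the workspace.
    # all() over an empty set is True, so an unprotected workspace skips this step.
    in_every_ad = all(user in group_members.get(ad, set()) for ad in ws.auth_domains)
    # Step 2: the user must be on the workspace permission list with some role.
    on_permission_list = user in ws.acl
    return in_every_ad and on_permission_list


# Example: Bob was shared on the workspace but is not in the AD group.
groups = {"lab-consented": {"alice@example.org"}}
ws = Workspace(
    name="project-ws",
    auth_domains=frozenset({"lab-consented"}),
    acl={"alice@example.org": "READER", "bob@example.org": "READER"},
)
print(can_access("alice@example.org", ws, groups))  # True
print(can_access("bob@example.org", ws, groups))    # False: listed, but not in the AD
```

Because `all()` over an empty set is `True`, a workspace with no Authorization Domain falls back to workspace permissions alone.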

Authorization Domain examples and explanations

Below are examples of how you might use ADs to protect controlled data for different collaboration scenarios.

  • Example 1: Single lab, single project

    Step 1: The PI or PM creates one lab-wide Authorization Domain and adds the lab researchers who are consented to use the data (i.e., those listed as dbGaP downloaders under the PI) to the AD group.

    [Figure: Single lab, single project, step 1]

    Step 2: The Authorization Domain group is included when creating any workspace for that project. The workspace, and all copies of that workspace, are protected by the Authorization Domain.

    [Figure: Single lab, single project, step 2]

    Result: Only researchers consented to use the data can access, copy, or work in the workspaces (this overrides workspace permissions).

    [Figure: Single lab, single project, result]

  • Example 2: Single lab, many projects

    Step 1: The PI or PM creates several Authorization Domain groups, one for each data consent group, and adds each researcher to every consent group that includes them.

    [Figure: Single lab, many projects, step 1]

    Step 2: The PI creates a primary workspace for each project and includes the appropriate Authorization Domain. Note that a workspace can be protected by more than one AD, depending on the data (e.g., a workspace that combines data from two consent groups will have two Authorization Domains).

    [Figure: Single lab, many projects, step 2]

    Result: A researcher must be a member of every AD on a workspace to access it.

    [Figure: Single lab, many projects, result]

  • Example 3: Cross-institution collaboration

    Step 1: The PI at one institution creates a project-specific Authorization Domain for one consent group. Collaborators at both institutions are added to the Authorization Domain for the data they are consented to use.

    [Figure: Cross-institution collaboration, step 1]

    Step 2: The collaborating institution creates a second Authorization Domain.

    [Figure: Cross-institution collaboration, step 2]

    Result: Collaborators can access only the workspaces with data they are consented to use, regardless of which institution created the workspace or which institution they are at.

    [Figure: Cross-institution collaboration, result]
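The rule behind these examples (a researcher must belong to every AD on a workspace, and the creating institution plays no role) can be sketched as a subset check. This is a minimal Python 3.9+ illustration: the workspace and group names and the `satisfies_ads` and `ad_eligible_workspaces` functions are hypothetical, and the sketch covers only the AD step, not workspace permissions.

```python
# Hypothetical AD requirements per workspace (all names are illustrative only).
WORKSPACE_ADS: dict[str, frozenset[str]] = {
    "inst-a-project": frozenset({"consent-group-1"}),
    "inst-b-project": frozenset({"consent-group-2"}),
    "combined-analysis": frozenset({"consent-group-1", "consent-group-2"}),
}


def satisfies_ads(user_groups: set[str], required_ads: frozenset[str]) -> bool:
    # A user passes the AD check only if they belong to every required group.
    return required_ads <= user_groups


def ad_eligible_workspaces(user_groups: set[str]) -> list[str]:
    # Which institution created a workspace is irrelevant; only membership counts.
    # Workspace permissions are still needed on top of this check.
    return sorted(
        ws for ws, ads in WORKSPACE_ADS.items() if satisfies_ads(user_groups, ads)
    )


print(ad_eligible_workspaces({"consent-group-1"}))
# ['inst-a-project']
print(ad_eligible_workspaces({"consent-group-1", "consent-group-2"}))
# ['combined-analysis', 'inst-a-project', 'inst-b-project']
```

A researcher in only one consent group cannot see the combined workspace, which mirrors the "member of every AD" result in the second example.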

Compare: Traditional approach to data security

Let’s say everyone in your lab is consented on primary data stored in a workspace bucket.

If a new coworker asks you to share your data with them, you would (traditionally) be responsible for checking that this new coworker is officially consented to access the data before giving them access to the workspace. You would have to keep track yourself of which of your fellow scientists have up-to-date authorization.

After checking their authorization, you could give your colleague the right permission (reader, writer, or owner, plus the "can share" and "can compute" options) to access the data in the original workspace.

This model works well enough if owners only have to keep track of their own workspaces. However, if you give a colleague "can share" permission, there is a risk that they will share with someone who is not authorized to access the data. This is where Authorization Domains can help!

Use managed groups to standardize access and reduce human error

You can share workspaces with a managed group, instead of with individuals, to reduce the chance that you will miss updating permissions on one workspace out of many when someone joins or leaves a group. However, the burden of defining and enforcing security with workspace permissions ultimately lies with individual researchers.

Improving data security with authorization domains

If you assign the original workspace (the one with the primary data) an Authorization Domain that includes only the people consented to use that data, you don't need to worry about accidentally sharing sensitive data, including generated data. If anyone tries to share a cloned workspace with a user who doesn't have the right badge, that user won't be able to view the workspace.

[Diagram: sharing a cloned, AD-protected workspace]

Authorization domains prevent accidental data sharing

Consider a scenario where you are in your lab’s Authorization Domain, so you have access to a workspace containing primary data (files in the workspace bucket). You clone the original workspace, do your analysis and generate some derived data.

Your clone doesn't include the primary data unless you actively copied it, and you can still do your analysis by pointing your clone's Data tab metadata (or Jupyter notebook cells) to the original workspace's bucket. Because the original files are still in the original workspace, you may not think about the security implications of sharing your clone. But any outputs you've generated (aka "derived data") may now be in your cloned workspace's bucket. WITHOUT an Authorization Domain, adding a user to your clone's permission list may grant them access to that derived data.

Because ADs are inherited, the cloned workspace also has the lab Authorization Domain. When you try to share it with a new coworker, Terra verifies that your coworker is in the Authorization Domain before allowing access to the workspace (far right in the diagram above).

Removing the burden of enforcing data access from the individual

In this way, the Authorization Domain keeps track of access so you don't have to. It's straightforward to adjust group membership (who is in the Authorization Domain) as lab members or consent groups change. Once membership is updated, it affects access to every AD-protected workspace right away.  
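The inheritance and membership behavior described in the last two sections can be sketched in a few lines of Python 3.9+. This is a hypothetical model, not Terra's implementation: the `Workspace` class, the `clone` and `users_with_access` functions, and the email addresses are illustrative assumptions. It shows that a clone keeps the source's ADs, that sharing the clone with a non-member grants nothing, and that adding that person to the AD group takes effect without editing the workspace.

```python
from dataclasses import dataclass, field


@dataclass
class Workspace:
    """Hypothetical stand-in for a Terra workspace (not a Terra API object)."""
    name: str
    auth_domains: frozenset[str]
    acl: dict[str, str] = field(default_factory=dict)  # user -> role


def clone(src: Workspace, new_name: str, owner: str) -> Workspace:
    # The clone carries over every AD of the source; this model offers no way to drop one.
    return Workspace(new_name, src.auth_domains, {owner: "OWNER"})


def users_with_access(ws: Workspace, group_members: dict[str, set[str]]) -> set[str]:
    # Membership is looked up at check time, so a group change applies to every
    # protected workspace immediately, with no per-workspace edits.
    return {
        user for user in ws.acl
        if all(user in group_members.get(ad, set()) for ad in ws.auth_domains)
    }


groups = {"lab-ad": {"you@lab.example"}}
original = Workspace("original", frozenset({"lab-ad"}), {"you@lab.example": "READER"})
my_clone = clone(original, "my-analysis", "you@lab.example")
my_clone.acl["coworker@lab.example"] = "READER"  # shared, but coworker is not in the AD

print(users_with_access(my_clone, groups))  # {'you@lab.example'}
groups["lab-ad"].add("coworker@lab.example")  # coworker joins the AD group
print(sorted(users_with_access(my_clone, groups)))
# ['coworker@lab.example', 'you@lab.example']
```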

 

Importing data from a workspace with Authorization Domain protection

The Authorization Domains of the destination workspace (where the data is going) must include all the Authorization Domains of the source workspace (where the data is coming from). The destination workspace can be more restrictive about access to data than the source, but not less.

Example case: Importing data from a workspace with an Authorization Domain

For example, if the destination workspace has the TCGA-dbGap-Authorized and Tiffs-Test-Group groups in its Authorization Domain, you can import data from workspaces whose Authorization Domain is set to TCGA-dbGap-Authorized only, Tiffs-Test-Group only, both groups, or no groups. If the source workspace's Authorization Domain included any other group, you would not be able to import from it. Terra will tell you if workspaces are unavailable for this reason.
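The import rule amounts to a subset test on the two sets of AD group names. Below is a minimal Python 3.9+ sketch, assuming each workspace's Authorization Domain is represented as a set of group names. `can_import` is a hypothetical helper and "Some-Other-Group" is an invented name; the other two group names come from the example above.

```python
def can_import(source_ads: set[str], dest_ads: set[str]) -> bool:
    """Hypothetical check: the destination must include every AD of the source."""
    # Extra ADs on the destination only make it more restrictive, which is allowed.
    return source_ads <= dest_ads


dest = {"TCGA-dbGap-Authorized", "Tiffs-Test-Group"}
for source in [
    {"TCGA-dbGap-Authorized"},
    {"Tiffs-Test-Group"},
    {"TCGA-dbGap-Authorized", "Tiffs-Test-Group"},
    set(),  # source workspace with no Authorization Domain
    {"TCGA-dbGap-Authorized", "Some-Other-Group"},  # hypothetical extra group
]:
    print(sorted(source), can_import(source, dest))
```

The first four sources pass; the last one fails because its extra group is missing from the destination.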

Additional resources

To learn more about linking to external servers, see this article.

To learn how to create and share Authorization Domains, see this article.
