Often used for protecting access to controlled data, Authorization Domains offer additional control and protection beyond workspace permissions. Authorization Domain protection follows a workspace when it's copied, restricting access to both primary and generated data to authorized users only. Read on to see how.
Securing controlled-access data in a cloud-native environment
To keep controlled data secure, but still easy to share with collaborators, Terra has several built-in security features that limit access in a workspace.
- Accessing data stored on external systems requires linking to existing authorization
This system ensures that only people who already have access to the primary data (via traditional authorization mechanisms) can access the controlled data for analysis in Terra.
- Accessing data and tools in a workspace requires appropriate workspace permission
Workspace permissions enable very precise control of who can access what resources in a workspace. For many users, workspace permissions are sufficient for all their needs. Note that you can share with a managed group to streamline the process by granting a single type of permission for a resource to a set of individuals. Sharing this way can eliminate the possibility for human error - such as forgetting to remove or add a person to a workspace when they leave or join the group.
- Accessing controlled data can require inclusion on an (optional) Authorization Domain
Authorization Domains (ADs) are an additional layer of protection for controlled data stored in a workspace bucket. ADs prevent access to the primary data as well as generated data unless you are on the AD.
By centralizing the process of managing access, these features can make working in a public cloud infrastructure safer than working in a traditional local analysis platform, where security is only as good as the weakest (human) link.
How these security features work
Terra's security features work together to protect the data you store and work you do in a Terra workspace. Some are automatic (you have to link your authorization to be able to analyze controlled data stored on an external server) and some you (or your PI) will need to implement (assigning the right workspace permissions or assigning an Authorization Domain to a workspace).
To learn more about how to access controlled data stored on external servers, see Linking authorization/accessing controlled data on external servers.
To learn more about best practices for sharing data and tools in a workspace, see Managing access to shared resources (data and tools).
To learn more about the pros and cons of the additional security of an Authorization Domain, read on!
Protecting access: Workspace permissions versus Authorization Domains
Authorization Domains are managed groups with strictly defined and enforced workspace permissions that enforce controlled access to data stored in a workspace bucket. They are most useful when you need to control not only who has access to the primary data stored in a workspace but also who has access to all generated data, including data generated by a colleague (or a colleague of a colleague...).
- Authorization domains restrict workspace access to the individuals in the group.
- ADs protect access to all data generated on Terra from the original data.
- ADs are assigned to workspaces when they are created and follow all workspace copies.
Workspace permissions are sufficient for many use cases. To decide if you should assign an Authorization Domain for additional protection, ask yourself the following.
1. Is primary protected data stored in the workspace bucket?
Authorization Domains are inherited, so they are most effective when applied to workspaces that contain primary, controlled data.
2. Do you need to control what collaborators do with their own clones of the workspace?
You can control who has access to data and tools in a workspace with workspace permissions. However, collaborators with "can share" permission on the workspace may make their own copies of the original that they can then share.
While it is easy to control the permissions of the primary workspace, it can be hard to keep track of access once collaborators make their own shareable copies. The owner of the original workspace has no control over them sharing with others outside the original group. Authorization Domains can be useful in this case.
3. Is there a well-defined group of people who should have access?
A consent group approved by an IRB or other agency to use controlled data stored in a workspace bucket is a good example of this. An Authorization Domain that includes only the consented users can protect primary data in the workspace bucket as well as any generated data, no matter who generates it or in which workspace it's generated.
Another example would be if you want to let people share a workspace with colleagues, but only a certain well-defined set of colleagues (like in a given institution). Note that you can change the individuals in an AD group, but you cannot share with anyone outside the AD.
4. Do you anticipate sharing with people who are not on the Authorization Domain?
To ensure protection of controlled data, Authorization Domains are a permanent fixture for workspaces that use them. There is no easy way to add collaborators to workspaces with an AD, unless you also add the collaborators to the AD.
Before you assign an Authorization Domain
Authorization Domains are permanent! To share down the line with people not on the AD, you'll need to create new versions of the workspaces and manually copy over any TSVs/data/notebooks you need, since clones of a workspace protected by the AD automatically inherit the AD. Note that you will not be able to copy the workspace bucket contents to the new workspace, as the bucket contents are protected by the AD!
Are workspace permissions enough protection?
Cloning copies only metadata (links to data in a workspace bucket, not the data files themselves). So if a collaborator shares a clone with someone who shouldn't get the data, that person still cannot access the original files in the original workspace. Note, however, that with workspace permissions alone, derived results in the new workspace would be shared.
If workspace permissions are enough for your needs, see Managing access to shared resources (data and tools) for more details about how to use and set them up.
Some Authorization Domain examples
Below are examples of how you might use ADs to protect controlled data for different collaboration scenarios.
Step 1: The PI or PM creates one lab-wide Authorization Domain, and adds researchers in the lab consented to use the data (i.e. those listed as dbGaP downloaders under the PI) to the AD group.
Step 2: The Authorization Domain group is included when creating any workspaces for that project. The workspace, and all copies of that workspace, are protected by the Authorization Domain.
Result: Only researchers consented to use the data can access, copy, or work in the workspaces (this overrides workspace permissions).
Step 1: PI or PM creates several Authorization Domain groups, one for each data consent group, and adds researchers to all of the data consent groups that include them.
Step 2: PI creates a primary workspace for each project, and includes the appropriate Authorization Domain. Note that a workspace could be protected by more than one AD, depending on the data (e.g. if a workspace combines data from two consent groups, it will have two Authorization Domains).
Result: A researcher must be included in all the workspace ADs to access a protected workspace.
Step 1: PI at one institution creates a project-specific Authorization Domain for one consent group. Collaborators in both institutions are added to the Authorization Domain for the data they are consented to use.
Step 2: Collaborating institution creates a second Authorization Domain.
Result: Collaborators can access only the workspaces with data they are consented to use, regardless of what institution created the workspace or what institution they are at.
Traditional approach to data security
Let’s say everyone in your lab is consented on primary data stored in a workspace bucket.
If a new coworker asks you to share your data with them, you would (traditionally) be responsible for checking that this new coworker is officially consented to access the data before allowing them to access the workspace. You’d have to keep track for yourself of which of your fellow scientists have up-to-date authorization.
After checking their authorization, you could give your colleague the right permission (read, write or edit permission, as well as "can-share" and "can compute" roles for the workspace) and allow them access to the data in the original workspace.
This model works well enough if owners only have to keep track of their own workspaces. However, there is a danger, if you allow "can-share" permission to a colleague, that they can share with someone who is not authorized to access the data. This is where Authorization Domains can help!
Use managed groups to standardize access and eliminate some human errors
You can share workspaces with a managed group, instead of individuals, to reduce the possibility that you will miss updating permissions on one workspace out of many when someone joins or leaves the group. However, the burden of defining and enforcing security with workspace permissions ultimately lies with individual researchers.
Improving data security with authorization domains
Authorization Domains are like a badge associated with a workspace that allows access only to people with the same badge. They prevent accidentally sharing derived data because ADs stay with all copies of the original workspace: anyone who wants to access the copy has to be in the AD.
If an Authorization Domain that includes only those consented to use the primary data is assigned to the original workspace with the primary data, you don't need to worry about accidentally sharing sensitive data. If anyone tries to share the cloned workspace with a user who doesn’t have the right badge, that user won’t be able to enter.
Authorization domains prevent accidental data sharing
Let’s revisit our example from before, but with the addition of Authorization Domains. The PI sets up a workspace to store the primary data with an Authorization Domain to control access to the data (far left in diagram above).
You are in your lab’s Authorization Domain, so you have access to the original workspace and primary data (assuming it has been shared with you). You clone the original workspace, do your analysis and generate some derived data (middle of diagram above).
Note that your clone does not include the primary data, which is in the original workspace bucket. You may not think about the security implications of sharing your clone, especially since only those with authorization (and permissions in the original workspace) can access data in the original bucket. However, the generated data in the clone are also restricted-access data.
Since all ADs are inherited, the copied workspace also has the lab Authorization Domain. When you try to share the workspace with a new coworker, Terra will verify that your coworker is in the Authorization Domain before allowing access to the workspace (far right in diagram above).
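The inheritance rule described above can be sketched in a few lines of Python. This is a toy in-memory model, not Terra's API: the `Workspace` class and the `clone` and `passes_ad_check` functions are hypothetical names chosen for illustration, and the sketch covers only the AD check (workspace permissions are a separate requirement).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workspace:
    """Hypothetical stand-in for a workspace; not a Terra object."""
    name: str
    auth_domain: frozenset  # names of the groups in the Authorization Domain


def clone(source: Workspace, new_name: str) -> Workspace:
    """Copy a workspace. The clone keeps every AD group of the source."""
    return Workspace(new_name, source.auth_domain)


def passes_ad_check(user_groups, workspace: Workspace) -> bool:
    """True only if the user belongs to every group in the workspace AD."""
    return workspace.auth_domain <= set(user_groups)


original = Workspace("primary-data", frozenset({"lab_ad"}))
my_clone = clone(original, "my-analysis")
clone_of_clone = clone(my_clone, "coworker-analysis")

print(clone_of_clone.auth_domain == original.auth_domain)  # True
print(passes_ad_check({"lab_ad"}, clone_of_clone))         # True
print(passes_ad_check(set(), clone_of_clone))              # False
```

Because the AD travels with every copy, the check gives the same answer for the original, the clone, and any clone of the clone.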
Removing the burden of enforcing data access from the individual
In this way, the Authorization Domain keeps track of access so you don't have to. It's straightforward to adjust group membership (who is in the Authorization Domain) as lab members change. Once membership is updated, it affects access to every AD-protected workspace right away.
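One way to picture why a membership change takes effect everywhere at once: workspaces refer to the group by name rather than storing their own list of people. The sketch below is a simplified model under that assumption; the dictionaries, the email addresses, and the `workspaces_passing_ad_check` helper are all hypothetical and do not reflect how Terra stores groups internally.

```python
# Hypothetical in-memory model; not Terra's data model or API.
groups = {"lab_ad": {"pi@example.org", "alice@example.org"}}

# Each workspace lists the *names* of its AD groups, never the members.
workspaces = {
    "primary-data": ["lab_ad"],
    "alice-analysis": ["lab_ad"],
}


def workspaces_passing_ad_check(user, groups, workspaces):
    """Return workspaces whose every AD group currently contains `user`."""
    return sorted(
        name
        for name, ad_groups in workspaces.items()
        if all(user in groups[g] for g in ad_groups)
    )


print(workspaces_passing_ad_check("bob@example.org", groups, workspaces))
# []

# A new lab member joins: one update to the group covers every workspace.
groups["lab_ad"].add("bob@example.org")
print(workspaces_passing_ad_check("bob@example.org", groups, workspaces))
# ['alice-analysis', 'primary-data']

# A lab member leaves: one removal revokes the AD check everywhere.
groups["lab_ad"].discard("alice@example.org")
print(workspaces_passing_ad_check("alice@example.org", groups, workspaces))
# []
```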
How to set up and use an Authorization Domain
(steps 1 and 2 shown above)
Step 1: Set up a group (i.e. give users their badges)
Before you can assign an Authorization Domain to a workspace, you will need to set up an Authorization Domain group. There are two ways to set up and manage groups, depending on whether you use a third-party or user-defined group.
Third-party groups (TCGA, TARGET, GTEx)
For third-party groups, access depends on external permissions. Currently Terra supports third-party groups including TCGA Controlled-Access, GTEx, and TARGET. To gain access, you must link your Terra account to your eRA Commons or NIH account on your Profile page.
Click here to learn how to link data authorization to your Terra account.
Terra then checks for the user ID of the linked account in the dbGaP access list to complete the authorization.
User-defined groups
User-defined groups are created and managed within Terra. Groups are straightforward to set up, and are perfect to use when you want to share data with a set group of people (within your lab, for example).
Your PI can create a (user-defined) group by going to the Groups page in the main menu navigation under your username. Follow the prompts to create a group, e.g. “sample_group”, and add each member of your lab to the group, thereby giving them the “sample_group” badges. The PI (or anyone they give Owner access to the group) is then responsible for giving and revoking these badges.
Step 2: Create workspace and assign the workspace Authorization Domain
When creating a workspace, you'll start from this form (screenshot below).
You can select one or more groups for the Authorization Domain in the dropdown. (If you don’t see your group in the list, you may need to create it. See Step 1 above).
An Authorization Domain can only be set when creating the workspace, and once set, it cannot be removed from the workspace.
It will be copied over to any cloned version of the workspace, protecting any derived data.
When an Authorization Domain includes multiple groups
When multiple groups are included in the Authorization Domain, the system requires the user to be a member of all groups in order to access the workspace. This is because there are strict guidelines with third-party dbGaP registered datasets (TCGA and TARGET).
Example case: Multiple Authorization Domains
Consider a workspace whose Authorization Domain contains both the TCGA and TARGET groups. If a user is invited to the workspace, the system checks both the TCGA and the TARGET access lists for their account before allowing access.
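The "member of all groups" rule amounts to a set difference: any AD group the user is not in blocks access. The following is a minimal sketch of that logic only; `missing_groups` is a hypothetical helper, and the group names "TCGA" and "TARGET" are shorthand placeholders rather than the exact group names Terra uses.

```python
def missing_groups(user_groups, ad_groups):
    """Return the AD groups the user does not belong to (empty means pass)."""
    return sorted(set(ad_groups) - set(user_groups))


workspace_ad = {"TCGA", "TARGET"}  # placeholder group names

print(missing_groups({"TCGA"}, workspace_ad))            # ['TARGET']
print(missing_groups({"TCGA", "TARGET"}, workspace_ad))  # []
```

A user who is on only one of the two access lists is still blocked, because the returned list is not empty.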
Importing data from a workspace with Authorization Domain protection
The Authorization Domains of the destination workspace (where the data is going to) must include all the Authorization Domains of the source workspace (where data is coming from). It is fine if the destination workspace is more restrictive about access to data, but it cannot be less.
Example case: Importing data from a workspace with an Authorization Domain
For example, if the destination workspace has TCGA-dbGap-Authorized and Tiffs-Test-Group groups in the Authorization Domain, you can import data from workspaces whose Authorization Domain is set to TCGA-dbGap-Authorized only, Tiffs-Test-Group only, both groups, or no groups. If the source workspace had additional groups, you would not be able to import from it. In this example, Terra informs you there are six workspaces that are unavailable because of this.
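The import rule can be read as a subset test: every group in the source AD must also be in the destination AD. Here is a short sketch of that test using the group names from the example above; `can_import` is a hypothetical function written for illustration, and "Some-Other-Group" is a made-up name standing in for any additional group.

```python
def can_import(source_ad, destination_ad):
    """True if the destination AD includes every group in the source AD."""
    return set(source_ad) <= set(destination_ad)


destination = {"TCGA-dbGap-Authorized", "Tiffs-Test-Group"}

print(can_import({"TCGA-dbGap-Authorized"}, destination))  # True
print(can_import({"Tiffs-Test-Group"}, destination))       # True
print(can_import(destination, destination))                # True
print(can_import(set(), destination))                      # True (no AD)

# A source with a group the destination lacks is less restricted by the
# destination than by the source, so the import is refused.
print(can_import({"TCGA-dbGap-Authorized", "Some-Other-Group"}, destination))  # False
```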
Step 3: Share the workspace - step-by-step instructions
To complete the process, you will now share the workspace, either with the group you used in the Authorization Domain, or with one or more individuals.
Make sure your colleagues can access AD-protected workspaces
To access a workspace protected with an Authorization Domain, a colleague needs both of the following:
1. to be included in all the workspace ADs
2. to have reader, writer or editor permission on the workspace.
Both are required! Just because someone is on an AD does not mean a workspace is automatically shared with them!
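The two requirements above are independent checks, and failing either one blocks access. As a rough sketch (not Terra's implementation), the hypothetical `access_status` function below takes the user's groups, the workspace ADs, and the role the workspace was shared with (`None` if it was never shared with that user) and reports which requirement, if any, is unmet.

```python
def access_status(user_groups, workspace_ads, role):
    """Report whether a user can access an AD-protected workspace.

    `role` is the workspace permission granted to the user, or None
    if the workspace has not been shared with them.
    """
    missing = sorted(set(workspace_ads) - set(user_groups))
    if missing:
        return "blocked: not in " + ", ".join(missing)
    if role is None:
        return "blocked: workspace not shared with this user"
    return "ok: " + role


ads = {"lab_ad"}  # hypothetical group name

print(access_status({"lab_ad"}, ads, "reader"))  # ok: reader
print(access_status({"lab_ad"}, ads, None))      # blocked: workspace not shared with this user
print(access_status(set(), ads, "writer"))       # blocked: not in lab_ad
```

The second call shows the case the callout warns about: the colleague is on the AD, but nobody has shared the workspace with them yet.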
To share with a group, start typing the name into the Sharing dialog and choose from the autocomplete options.
What happens when someone not in the AD tries to access the workspace?
If you share with individuals or a group not in the Authorization Domain, they will see the workspace greyed out in their workspace list. When they click it, Terra will send an email requesting access to all owners of the groups in the Authorization Domain. Once the user has the proper badge(s), they can enter the workspace to see the protected data.
GTEx, TARGET, or TCGA workspace? If you receive an error message that you aren't a member of the authorization domain for a GTEx, TARGET, or TCGA workspace, this generally means your authorization in your NIH/dbGaP link isn't active. Access to the AD is automated based on authorization from dbGaP, which is updated every six hours on Terra.
To learn more about linking to external servers, see this article.