Because research is frequently collaborative, you need to be able to keep sensitive genomic data secure, but still easy to share with collaborators. Terra was designed to help you balance these competing requirements with consent-specific Authorization Domains (ADs). AD protection follows a workspace when it's copied and allows only authorized access to both primary and generated data. Read on to see how.
Securing controlled-access data in a cloud-native environment
- Traditional approach to data security
- Improving data security with authorization domains
Some Authorization Domain examples
- Example 1: One lab, one research project/group
- Example 2: One lab, multiple projects/access groups
- Example 3: Cross-institution collaborations
A step-by-step guide to setting up and using Authorization Domains
Securing controlled-access data in a cloud-native environment
Researchers need authorization to work with sensitive (i.e. controlled) data. On Terra, access to primary data on external platforms is controlled via linked authorization (see this article to learn more). Additionally, you can protect data you bring to a workspace bucket or generate in an analysis by limiting access to project workspaces where sensitive data are stored and analyzed. Terra uses Authorization Domains - a built-in function - to streamline and centralize this security process.
Traditional approach to data security
Let’s say the data are private to your lab and stored in a workspace bucket. You know you are a member of your lab (indicated by the blue color in the diagram above), so you can access your lab’s data (also blue) in the workspace.
You clone the original workspace, run an analysis, and generate derived data. This is possible because your role allows you to copy that workspace and access the data in the original workspace bucket.
If a new coworker asks you to share your workspace with them, you would (traditionally) be responsible for checking that this new coworker is officially a part of your lab. As the owner of the cloned workspace, you face no restrictions on how or with whom you share it. You’d have to keep track yourself of which of your fellow scientists have up-to-date authorization.
In this model, the burden of defining and enforcing security lies with individual researchers.
Improving data security with authorization domains
Enter Authorization Domains, which are like a badge associated with a workspace that allows access only to people with the same badge. When you clone a workspace that has an Authorization Domain, the badge stays with the new copy: anyone who wants to access the copy has to have the badge.
You no longer need to worry about accidentally sharing sensitive data because if you try to share the cloned workspace with a user who doesn’t have the right badge, that researcher won’t be able to enter.
How authorization domains prevent accidental data sharing
Let’s revisit our example from before, but with the addition of Authorization Domains. The PI sets up the original workspace, where the sensitive data are stored, with an Authorization Domain to control access to the data (far left in diagram above).
You are in your lab’s Authorization Domain, so you have access to the original workspace and primary data (assuming it has been shared with you). You clone the original workspace, do your analysis and generate some derived data (middle of diagram above).
Note that your clone does not include the primary data, which is in the original workspace bucket. You may not think about the security implications of sharing your clone, especially since only those with authorization (and permissions in the original workspace) can access data in the original bucket. However, the generated data in the clone are also access-restricted.
Since all ADs are inherited, the copied workspace also has the lab Authorization Domain. When you try to share the workspace with a new coworker, Terra will verify that your coworker is in the Authorization Domain before allowing access to the workspace (far right in diagram above).
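The inheritance and sharing check described above can be sketched as a small Python model. This is an illustration only, not Terra's implementation or API: the Workspace class, the can_enter function, the group name, and the email addresses are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workspace:
    """Hypothetical model of a workspace; not a Terra API object."""
    name: str
    auth_domain: frozenset                 # names of the groups protecting the workspace
    shared_with: frozenset = frozenset()   # emails on the sharing list

    def clone(self, new_name, owner):
        # The copy inherits the Authorization Domain; only the cloner is on its sharing list.
        return Workspace(new_name, self.auth_domain, frozenset({owner}))

    def share(self, email):
        # Sharing adds a user to the sharing list but does not touch the Authorization Domain.
        return Workspace(self.name, self.auth_domain, self.shared_with | {email})


def can_enter(email, workspace, groups):
    """A user needs a share AND membership in every Authorization Domain group."""
    has_badge = all(email in groups.get(g, set()) for g in workspace.auth_domain)
    return email in workspace.shared_with and has_badge


groups = {"lab_group": {"pi@example.org", "you@example.org"}}
original = Workspace("original", frozenset({"lab_group"}),
                     frozenset({"pi@example.org", "you@example.org"}))

copy = original.clone("my-analysis", owner="you@example.org")
copy = copy.share("newcoworker@example.org")

print(copy.auth_domain == original.auth_domain)           # True
print(can_enter("you@example.org", copy, groups))          # True
print(can_enter("newcoworker@example.org", copy, groups))  # False
```

In this sketch the new coworker is on the sharing list but still cannot enter, because they are not yet a member of lab_group.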
Removing the burden of enforcing data access from the individual
In this way, the Authorization Domain keeps track of access so you don't have to. And it's straightforward to adjust group membership (who is in the Authorization Domain) as lab members change. Once membership is updated, it affects access to every AD-protected workspace right away.
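One way to picture why a membership change takes effect everywhere at once: if access is evaluated against the group's current membership rather than a per-workspace copy, a single update covers the original workspace and all of its clones. The Python sketch below models that idea under this assumption. ManagedGroup and readable_workspaces are hypothetical names, and the model deliberately ignores sharing permissions.

```python
class ManagedGroup:
    """Hypothetical stand-in for a group used as an Authorization Domain."""

    def __init__(self, name, members=()):
        self.name = name
        self.members = set(members)


def readable_workspaces(email, workspaces):
    """Names of workspaces whose every Authorization Domain group currently contains email."""
    return sorted(
        name for name, auth_domain in workspaces.items()
        if all(email in group.members for group in auth_domain)
    )


lab = ManagedGroup("lab_group", {"pi@example.org", "postdoc@example.org"})

# The original workspace and two clones all reference the same group object.
workspaces = {"original": [lab], "clone-1": [lab], "clone-2": [lab]}

before = readable_workspaces("postdoc@example.org", workspaces)
lab.members.discard("postdoc@example.org")  # the postdoc leaves the lab
after = readable_workspaces("postdoc@example.org", workspaces)

print(before)  # ['clone-1', 'clone-2', 'original']
print(after)   # []
```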
Some Authorization Domain examples
Example 1: One lab, one research project/group
Step 2: The Authorization Domain group is included when creating any workspaces for that project. The workspace and all copies of it are protected by the Authorization Domain.
Result: Only researchers consented to use the data can access, copy, or work in the workspaces (this overrides workspace permissions).
Example 2: One lab, multiple projects/access groups
Step 1: PI or PM creates several Authorization Domain groups, one for each data consent group, and adds researchers to all of the data consent groups that include them.
Step 2: PI creates a primary workspace for each project, and includes the appropriate Authorization Domain. Note that a workspace could be protected by more than one AD, depending on the data (e.g. if a workspace combines data from two consent groups, it will have two Authorization Domains).
Result: A researcher must be included in all the workspace ADs to access a protected workspace.
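The "all ADs" rule in this example amounts to a subset check: the workspace's Authorization Domain groups must all appear among the researcher's groups. A minimal Python sketch, with hypothetical group and workspace names (this is not a Terra API call):

```python
def accessible_workspaces(researcher_groups, workspaces):
    """Names of workspaces whose Authorization Domain groups are all held by the researcher."""
    held = set(researcher_groups)
    return sorted(name for name, ads in workspaces.items() if set(ads) <= held)


# Hypothetical setup: one workspace per consent group, plus one combining both.
workspaces = {
    "project-a": {"consent_group_1"},
    "project-b": {"consent_group_2"},
    "combined": {"consent_group_1", "consent_group_2"},
}

print(accessible_workspaces({"consent_group_1"}, workspaces))
# ['project-a']
print(accessible_workspaces({"consent_group_1", "consent_group_2"}, workspaces))
# ['combined', 'project-a', 'project-b']
```

A researcher in only one consent group is kept out of the combined workspace, even though they can use that group's own project workspace.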
Example 3: Cross-institution collaborations
Step 2: Collaborating institution creates a second Authorization Domain
Result: Collaborators can access only the workspaces with data they are consented to use, regardless of what institution created the workspace or what institution they are at.
A step-by-step guide to setting up and using Authorization Domains
(steps 1 and 2 shown above)
Step 1: Set up a group (i.e. give users their badges)
All Authorization Domains are managed groups, but not all managed groups are Authorization Domains.
Managed groups are a way to grant a single type of permission for a resource to a set of individuals:
An Authorization Domain is a managed group with strictly defined and enforced workspace permissions:
Before you can assign an Authorization Domain to a workspace, you will need to set up an Authorization Domain group. There are two ways to set up and manage groups, depending on whether you use a third-party or user-defined group.
Third-party groups (TCGA, TARGET, GTEx)
For third-party groups, access depends on external permissions. Currently Terra supports third-party groups including TCGA Controlled-Access, GTEx, and TARGET. To gain access, you must link your Terra account to your eRA Commons or NIH account on your Profile page.
Terra then checks for the user ID of the linked account in the dbGaP access list to complete the authorization.
Your PI can create a (user-defined) group by going to the Groups page in the main menu navigation under your username. Follow the prompts to create a group, e.g. “sample_group”, and add each member of your lab to the group, thereby giving them the “sample_group” badges. The PI (or anyone they give Owner access to the group) is then responsible for giving and revoking these badges.
Step 2: Create workspace and assign the workspace Authorization Domain
When creating a workspace, you'll start from this form (screenshot below).
You can select one or more groups for the Authorization Domain in the dropdown. (If you don’t see your group in the list, you may need to create it. See Step 1 above).
An Authorization Domain can only be set when creating the workspace, and once set, it cannot be removed from the workspace.
It will be copied over to any cloned version of the workspace, protecting any derived data.
When an Authorization Domain includes multiple groups
Consider a workspace whose Authorization Domain contains both the TCGA and TARGET groups. If a user is invited to the workspace, the system checks both the TCGA and the TARGET access lists for their accounts before allowing access.
Importing data from a workspace with Authorization Domain protection
You can import data only from a source workspace whose Authorization Domain groups are all included in the destination workspace's Authorization Domain. For example, if the destination workspace has TCGA-dbGap-Authorized and Tiffs-Test-Group groups in the Authorization Domain, you can import data from workspaces whose Authorization Domain is set to TCGA-dbGap-Authorized only, Tiffs-Test-Group only, both groups, or no groups. If the source workspace had additional groups, you would not be able to import from it. In this example, Terra informs you there are six workspaces that are unavailable because of this.
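Read as a rule, the import example says that the source workspace's Authorization Domain must be a subset of the destination's. The Python sketch below checks the cases from the example. The can_import function is a hypothetical illustration, not part of Terra, and "Another-Group" is an invented name standing in for any additional group.

```python
def can_import(source_ad, destination_ad):
    """Import is allowed when the source has no group that the destination lacks."""
    return set(source_ad) <= set(destination_ad)


destination = {"TCGA-dbGap-Authorized", "Tiffs-Test-Group"}

print(can_import({"TCGA-dbGap-Authorized"}, destination))                      # True
print(can_import({"Tiffs-Test-Group"}, destination))                           # True
print(can_import({"TCGA-dbGap-Authorized", "Tiffs-Test-Group"}, destination))  # True
print(can_import(set(), destination))                                          # True
# "Another-Group" is hypothetical: an extra group the destination does not have.
print(can_import({"TCGA-dbGap-Authorized", "Another-Group"}, destination))     # False
```

The check is directional: data may move into a workspace that is at least as restricted as its source, never into one that is less restricted.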
Step 3: Share the workspace - step-by-step instructions
To complete the process, you can now share the workspace, either with the group you used in the Authorization Domain, or with one or more individuals.
To share with a group, start typing the name into the Sharing dialog and choose from the autocomplete options:
If you share with individuals or a group not in the Authorization Domain, they will see the workspace greyed out in their workspace list. When they click it, Terra will send an email to all owners of the groups in the Authorization Domain requesting access. Once the user has the proper badge(s), they can enter the workspace to see the protected data.
If you receive an error message that you aren't a member of the Authorization Domain for a GTEx, TARGET, or TCGA workspace, this generally means the authorization on your linked NIH/dbGaP account isn't active. Access to these ADs is granted automatically based on authorization from dbGaP, which is updated every six hours on Terra.
To learn more about linking to external servers, see this article.