Because research is frequently collaborative, keeping sensitive data private and under control is a significant concern in genetic analysis. How do you keep data secure while still making it easy to share? Terra was designed with these competing requirements in mind. Read on to see how.
Securing controlled-access data in a cloud-native environment
Conventionally, a researcher needs authorization to work with sensitive data (let’s say it’s private to your lab). On Terra, you would have a workspace in which you did your analysis and where any data generated would be stored. Let's assume that the derived data must still be kept under controlled access.
Traditional approach to data security
You know you are a member of your lab (indicated by the blue color in the diagram above), so you can access your lab’s data (blue again) in the workspace. You clone the original workspace to run an analysis and generate derived data. This is possible because your role gives you permission to copy that workspace.
If a new coworker asks you to share your workspace with them, you would be responsible for checking that this new coworker is officially a part of your lab rather than an imposter! (Unlikely, but still.) You are the owner of the cloned workspace, with no restrictions on how or with whom you share it. You’d have to keep track yourself of which of your fellow scientists do and do not have up-to-date authorization.
Assuring security with authorization domains
Enter Authorization Domains, which are like a badge that allows access to workspaces for people with the same badge. When you clone a workspace with an Authorization Domain, the badge stays with the new copy, and anyone who wants to access the copy has to have the badge. You no longer need to worry about accidentally sharing sensitive data because if you try to share the cloned workspace with a user who doesn’t have the right badge, that researcher won’t be able to enter.
Let’s revisit our example from before, but with the addition of Authorization Domains. You are in your lab’s Authorization Domain, and you’re working with a workspace that also has the lab’s Authorization Domain. You do your analysis, generate some derived data, and create a clone. Because Authorization Domains are always inherited, the cloned workspace also has the lab’s Authorization Domain. When you try to share the workspace with your new coworker, the Authorization Domain checks whether your coworker is a member or an imposter! (Or, more likely, someone who still needs to ask to be added to the Authorization Domain.) Either way, the responsibility for keeping track of access rests with the Authorization Domain, not with you.
Note that the membership lists of the groups in an Authorization Domain can be updated and modified.
A step-by-step guide to setting up and using Authorization Domains
(steps 1 and 2 shown above)
Step 1: Set the workspace Authorization Domain
When creating a workspace, you can select one or more groups to set as the Authorization Domain. (If you don’t see your group in the list, you may need to create it; see Step 2.) An Authorization Domain can only be set when creating the workspace, and once set, it cannot be removed from the workspace. It is copied over to every cloned version of the workspace so that any derived data stays protected.
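The behavior described in this step (set once at creation, never removed, carried over to clones) can be pictured with a small model. This is only an illustrative sketch, not Terra's implementation: the `Workspace` class, its `clone` method, and the workspace names are hypothetical.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Workspace:
    """Hypothetical model of a workspace; not a Terra API."""
    name: str
    auth_domain: frozenset = frozenset()  # group names chosen at creation

    def clone(self, new_name: str) -> "Workspace":
        # The clone carries the same Authorization Domain as the original.
        return Workspace(new_name, self.auth_domain)


original = Workspace("lab-analysis", frozenset({"sample_group"}))
copy = original.clone("lab-analysis-copy")
print(copy.auth_domain == original.auth_domain)  # True

try:
    copy.auth_domain = frozenset()  # attempt to strip the Authorization Domain
except FrozenInstanceError:
    print("Authorization Domain cannot be removed")
```

Modeling the Authorization Domain as an immutable set mirrors the rule that it cannot be removed after the workspace exists.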
When multiple groups are included in the Authorization Domain, the user must be a member of all of the groups in order to access the workspace. This requirement reflects the strict access guidelines for third-party, dbGaP-registered datasets (TCGA and Target).
For example, say there is a workspace whose Authorization Domain contains the TCGA and Target groups. If a user is invited to the workspace, the system checks both the TCGA and the Target access lists for that user’s account before allowing access.
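The "member of all groups" rule amounts to a subset check. The sketch below is illustrative only; the function name `has_all_badges` is hypothetical, and the group names are simply the ones from the example above.

```python
def has_all_badges(user_groups, auth_domain):
    """Return True only if the user belongs to every group in the Authorization Domain."""
    return set(auth_domain) <= set(user_groups)


auth_domain = {"TCGA", "Target"}

print(has_all_badges({"TCGA", "Target"}, auth_domain))  # True
print(has_all_badges({"TCGA"}, auth_domain))            # False: Target is missing
```

Holding every badge is necessary but not sufficient in this model: the workspace still has to be shared with the user, as described in Step 3.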
To import data from another workspace, the Authorization Domains of the destination workspace (where the data is going to) must include (at minimum) all the Authorization Domains of the source workspace (where data is coming from). In other words, it is fine if the destination workspace is more restrictive about access to data, but it cannot be less.
For example, if the destination workspace has the TCGA-dbGap-Authorized and Tiffs-Test-Group groups in its Authorization Domain, you can import data from workspaces whose Authorization Domain is set to TCGA-dbGap-Authorized only (rows 2, 3, and 5-10), Tiffs-Test-Group only (row 4), both groups (row 1), or no groups. If the source workspace had additional groups, you would not be able to import from it. In this example, Terra informs you that six workspaces are unavailable for this reason.
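The import rule can be read as "the source's Authorization Domain must be a subset of the destination's". A minimal sketch under that reading follows; `can_import` and the group name `Some-Other-Group` are hypothetical, while the other two group names come from the example above.

```python
def can_import(source_domain, destination_domain):
    """Allow the import only if the destination is at least as restrictive as the source."""
    return set(source_domain) <= set(destination_domain)


destination = {"TCGA-dbGap-Authorized", "Tiffs-Test-Group"}

print(can_import({"TCGA-dbGap-Authorized"}, destination))  # True
print(can_import({"Tiffs-Test-Group"}, destination))       # True
print(can_import(destination, destination))                # True: both groups
print(can_import(set(), destination))                      # True: no groups
# A source with a group the destination lacks is rejected.
print(can_import({"TCGA-dbGap-Authorized", "Some-Other-Group"}, destination))  # False
```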
Step 2: Give users their badges
There are two ways to get your badge, depending on whether you use a third-party or a user-defined group. The difference between the two is how membership in the group is managed.
For third-party groups, external permissions are checked in order to give you access. Currently, Terra supports two third-party groups, TCGA Controlled Access and Target. To gain access, you must link your Terra account to your eRA Commons or NIH account on your Profile page. Terra then checks for the user ID of the linked account in the dbGaP whitelist to complete the authorization.
User-defined groups are created and managed within Terra. Groups are simple to set up, and are perfect to use when you want to share data with a set group of people (within your lab, for example). Your PI can create a user-defined group by going to the Groups page found in the menu under your username, following the prompts to create a group (e.g., “sample_group”), and adding each member of your lab to the group, thereby giving them the “sample_group” badge. The PI (or anyone they give Owner access to the group) is then responsible for giving and revoking these badges.
Step 3: Share the workspace
To complete the process, you can now share the workspace, either with the group you used in the Authorization Domain, or with an individual.
To share with a group, start typing the name into the Sharing dialog and choose from the autocomplete options:
If you share with an individual or a group that is not in the Authorization Domain, they will see the workspace greyed out in their workspace list. When they click it, Terra facilitates a request process that sends an email to all owners of the groups in the Authorization Domain. Once the user has the proper badge(s), they can enter the workspace to see the protected data.
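To make the greyed-out case concrete, the sketch below works out which badges a user would still need before they could enter a shared workspace. It is an illustration only, not Terra's logic; `missing_badges` and `is_greyed_out` are hypothetical names.

```python
def missing_badges(user_groups, auth_domain):
    """List the Authorization Domain groups the user does not yet belong to."""
    return sorted(set(auth_domain) - set(user_groups))


def is_greyed_out(user_groups, auth_domain):
    """A shared workspace stays greyed out while any badge is missing."""
    return bool(missing_badges(user_groups, auth_domain))


auth_domain = {"TCGA", "Target"}

print(missing_badges({"TCGA"}, auth_domain))            # ['Target']
print(is_greyed_out({"TCGA"}, auth_domain))             # True
print(is_greyed_out({"TCGA", "Target"}, auth_domain))   # False
```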
You can also create new workspaces within existing Authorization Domains, so that users already in that Authorization Domain have the proper permission to enter the workspace as soon as it is shared with them: