Learn how to share data across teams or handle access to datasets that aren't hosted in Terra but must be accessible to a group of Terra users. Read on for recommendations depending on the size of your dataset, whether it is controlled- or open-access, and who needs access.
First step: Manage access to shared assets with groups
The first step is to determine with whom you need to share. If the access list is large or often-changing, you can streamline permissions across resources by using managed groups.
Advantages to using Terra managed groups
- When you assign resource permissions and access to the group, you make changes only to the group, rather than updating each individual's permissions for each resource.
- Admins can add or remove people from the group at any time; all workspaces and resources shared with a group will adjust appropriately.
- When you remove people from the group, they no longer have access to shared resources.
To learn more, see Managing shared resources with groups and permissions.
Note: You can also share external buckets with Groups created in Terra (since Terra is built on the Google Cloud infrastructure).
How to create a managed group (step-by-step instructions)
1. Go to your Groups page from the main navigation menu at the top left (Your Name > Groups).
2. Add or delete members, and assign admin roles so that others in the group can add or remove people.
3. Once it's set up, you can share workspaces, external buckets, and other resources with the group in Terra.
What to expect
Group members will have immediate access to the workspace, including data in workspace cloud storage. If a person is removed from the group, as when someone leaves a lab, they cannot access group data any longer.
Option 1: Sharing smaller datasets, fewer people (workspace storage)
Terra is designed to protect your data, and you can take advantage of the platform's built-in security by storing shared data in workspace storage. Access, once you set it up, applies across the entire platform.
Access data from anywhere in Terra
For example, if data are in workspace storage and you have reader, writer, or owner permissions to that workspace, you can access the data from any of your workspaces in Terra. Note that for restricted-access data, you will also need to be in all of the required Authorization Domain(s).
How to share data in a workspace bucket
Sharing the workspace where the data are stored with other Terra users is the only way to share data stored in workspace buckets in Terra.
In Terra, it's not possible to make a workspace bucket public for people outside the workspace or outside the Terra platform. However, note that Terra admins can make a bucket public to registered users for access on Terra.
How to make data accessible to your collaborators (step by step)
1. Store data in workspace storage (i.e., Google bucket - see Moving data between local storage and the workspace bucket for more details).
2. Share the workspace with collaborators or a group.
3. Grant permission (reader, writer, or owner) to individual users - or the group.
Who can access shared data in a workspace bucket...and who pays?
Workspace owners control access to data. Anyone with reader, writer or owner permission for a workspace will be able to access the data in the associated bucket from any of their Terra workspaces. All full paths to data files - as in data tables and workflow configurations - will be seamless (data will appear to be local). You can use gsutil in a terminal or a notebook to copy data from the original bucket to a different workspace bucket. Generated data will be stored by default in the workspace where you run the analysis.
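As a rough illustration of the gsutil copy mentioned above, the sketch below only builds and prints the command rather than running it. The bucket names are hypothetical placeholders, and the helper `build_gsutil_copy` is an illustrative name, not part of Terra or gsutil. To run the printed command, you would need gsutil installed and read access to the source workspace.

```python
import shlex


def build_gsutil_copy(source_uri, dest_uri, recursive=False):
    """Return the argument list for copying between two buckets with gsutil."""
    for uri in (source_uri, dest_uri):
        if not uri.startswith("gs://"):
            raise ValueError(f"expected a gs:// URI, got {uri!r}")
    cmd = ["gsutil", "cp"]
    if recursive:
        # -r copies a whole "directory" (prefix) instead of a single object
        cmd.append("-r")
    cmd += [source_uri, dest_uri]
    return cmd


# Hypothetical bucket names; replace with the real source and destination buckets.
command = build_gsutil_copy(
    "gs://fc-original-workspace-bucket/sample1.cram",
    "gs://fc-my-workspace-bucket/inputs/sample1.cram",
)
print(shlex.join(command))
```

In a notebook, the resulting list could be passed to `subprocess.run(command, check=True)`, or the printed line could be pasted into a terminal.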
Storage costs for shared data in workspace storage (i.e., Google bucket)
The Google Billing Account associated with the Terra Billing Project of the workspace holding the data pays for data storage. It also pays data transfer charges if someone else downloads the data to their own bucket or local machine.
To avoid data transfer charges, use an external requester pays bucket. For more detail, see Configure Google Cloud Storage to prevent data transfer charges.
Example: Ensuring access to shared data in workspace storage
To access shared data, a user must be included in the permissions of the workspace where the primary data are stored. See the scenario below to understand where problems can arise when a user doesn't have access to the original workspace.
A Terra user ID is used to track access.
This can impact anyone who has multiple user IDs, e.g., a personal gmail as well as an institutional account.
- User A creates a workspace (Workspace 1) and stores data to be shared with the group in the Workspace 1 bucket.
- User A gives their research group (Group X) writer permission for Workspace 1. User B is in the research group and has access to Workspace 1 and the shared data.
- User B makes a copy of Workspace 1 (Workspace 2), which references the data in Workspace 1 in its data table. However, the shared input data are not in the Workspace 2 bucket.
- User B runs the workflows cloned from Workspace 1 in Workspace 2. They run successfully because User B has access to the input data in Workspace 1. The outputs are stored in the Workspace 2 bucket.
- User B shares Workspace 2 with User C, who is not in Group X (i.e., does not have access to Workspace 1 with the original shared data).
- User C tries running the workflows in Workspace 2, but the workflows fail because User C doesn't have access to Workspace 1 where the data are kept.
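The scenario above can be traced with a small toy model. This is a simplified sketch, not how Terra implements permissions: it ignores the difference between reader, writer, and owner roles and only asks whether a user can reach both the workspace and the workspace where the input data are stored. All names in the dictionaries mirror the example; the functions are hypothetical.

```python
# Group membership, as in the scenario (User C is not in Group X).
groups = {"Group X": {"User A", "User B"}}

# Each workspace is shared with individual users and/or groups.
workspace_shared_with = {
    "Workspace 1": {"User A", "Group X"},
    "Workspace 2": {"User B", "User C"},
}

# Where the input data for each workspace are actually stored.
# Workspace 2 is a clone, so its data table still points at Workspace 1.
input_data_location = {
    "Workspace 1": "Workspace 1",
    "Workspace 2": "Workspace 1",
}


def can_access(user, workspace):
    """True if the user is listed directly or through a group."""
    for entry in workspace_shared_with[workspace]:
        if entry == user or user in groups.get(entry, set()):
            return True
    return False


def can_run_workflow(user, workspace):
    """A run needs access to the workspace AND to where its inputs live."""
    return can_access(user, workspace) and can_access(
        user, input_data_location[workspace]
    )


for user in ("User B", "User C"):
    print(user, "can run workflows in Workspace 2:", can_run_workflow(user, "Workspace 2"))
```

User C can open Workspace 2 but fails the second check, which is why the workflows fail even though nothing in Workspace 2 looks wrong.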
Protecting controlled-access data (Authorization Domains)
Relying upon workspace-sharing permissions alone can lead to unauthorized access! This can happen when someone with access makes a copy and shares with someone not authorized for the primary data.
For additional protection around restricted-access data, you can store shared data assets in the dedicated bucket of a workspace under an Authorization Domain. To enable members of a group to access the data, you must first include the group in the Authorization Domain, then share the workspace.
Note: The Authorization Domain requirement is inherited by all clones of the original workspace. This ensures that data in a workspace bucket under an Authorization Domain remain under the same restrictions, as the workspace is shared and copied.
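To make the inheritance rule concrete, here is a minimal conceptual sketch. It assumes a simplified model in which access requires both being on the workspace's sharing list and belonging to every group in its Authorization Domain. The `Workspace` class, the `has_access` function, and the group name "Approved-Study-Users" are hypothetical and do not correspond to Terra's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class Workspace:
    name: str
    auth_domain: frozenset = frozenset()            # groups a user must belong to
    shared_with: set = field(default_factory=set)   # users the workspace is shared with

    def clone(self, new_name, owner):
        # The clone inherits the Authorization Domain; the sharing list starts fresh.
        return Workspace(new_name, self.auth_domain, {owner})


def has_access(user, workspace, group_members):
    """Sharing alone is not enough: the user must be in every domain group."""
    in_every_domain_group = all(
        user in group_members.get(group, set()) for group in workspace.auth_domain
    )
    return user in workspace.shared_with and in_every_domain_group


group_members = {"Approved-Study-Users": {"User A", "User B"}}
original = Workspace("Workspace 1", frozenset({"Approved-Study-Users"}), {"User A", "User B"})

# User B clones the workspace and shares the clone with User C.
cloned = original.clone("Workspace 2", owner="User B")
cloned.shared_with.add("User C")

print(has_access("User B", cloned, group_members))
print(has_access("User C", cloned, group_members))
```

In this model, sharing the clone with User C does not grant access, because the clone still carries the original Authorization Domain and User C is not in the required group.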
To learn more about protecting controlled-access data, see Managing data privacy and access with Authorization Domains.
Limitations of sharing data in workspace storage
Cloning a workspace doesn’t copy data to the new workspace bucket
Though the data tables in the cloned workspace look populated, they will still point to the original workspace location of the files. Users of cloned workspaces can run an analysis on the shared data only if they have (at minimum) reader permission on the original workspace where the data are stored.
Troubleshooting workflows with shared data can be challenging
Because workspace bucket names are random strings, it can be hard to identify which workspace includes the actual data files from the full path (in a data table or workflow configuration). For example, could you determine which workspace contains the following file: fc-secure-7124e053-c020-4a76-a372-f1bb9272a32d/sample1.cram?
Without an easy way to identify the original workspace, it can be challenging to troubleshoot workflows that are failing because of permissions issues (e.g., if the permissions in the two workspaces are different).
Permissions could be different even for the same user if they use one login (credential) in one workspace, and another login (with a second user ID and credentials) in the second. The user may be the same, but the lack of the proper credentials in the second workspace could mean workflows that use data in the first for input would fail, apparently for no reason.
We recommend documenting shared data locations
Include the name or link for the workspace where shared data are stored in the dashboard to ensure that any cloned workspaces can trace back to the original.
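One lightweight way to keep such documentation usable is a small lookup from bucket name to source workspace. The sketch below is only an illustration: the registry is something you would maintain yourself, and the workspace name "my-billing-project/Shared-Data-Workspace" is a hypothetical placeholder.

```python
def bucket_from_path(path):
    """Extract the bucket name from 'gs://bucket/object' or 'bucket/object'."""
    if path.startswith("gs://"):
        path = path[len("gs://"):]
    bucket, _, _ = path.partition("/")
    if not bucket:
        raise ValueError(f"no bucket name found in {path!r}")
    return bucket


# Hand-maintained registry: bucket name -> workspace that owns it.
# The workspace name is a made-up placeholder for illustration.
SHARED_DATA_WORKSPACES = {
    "fc-secure-7124e053-c020-4a76-a372-f1bb9272a32d": "my-billing-project/Shared-Data-Workspace",
}


def source_workspace(path):
    """Look up which documented workspace a file path belongs to."""
    return SHARED_DATA_WORKSPACES.get(bucket_from_path(path), "unknown workspace")


print(source_workspace("fc-secure-7124e053-c020-4a76-a372-f1bb9272a32d/sample1.cram"))
```

When a workflow fails on an input path, a lookup like this points to the workspace whose permissions need checking.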
Users cannot make a workspace bucket public without a request to the Terra team
This means all collaborators must have reader, writer, or owner permissions on the workspace where the data are stored for access. Period.
Option 2: Sharing large datasets, multiple studies (external buckets)
When to use external cloud storage
- To share data resources that others can use without having to copy large data files to hundreds of workspaces while avoiding hosting (i.e., paying for) large data files, use an external, requester-pays bucket for the data.
- For large datasets used across multiple studies, we recommend using external Google buckets (these may be requester pays buckets).
How to allow access to external buckets
You control access to the buckets by assigning individual or group permissions. However, for external buckets, you assign permissions in the Google Cloud console instead of directly in Terra.
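If you prefer the command line to the console, the same kind of grant can be expressed with `gsutil iam ch`. The sketch below only builds and prints the command. The bucket name and group email are hypothetical, and the assumption that a Terra managed group is addressed by an email of the form GROUP_NAME@firecloud.org should be checked against your group's details on the Groups page before use.

```python
import shlex


def build_grant_command(bucket, group_email, role="objectViewer"):
    """Build a gsutil command that grants a Google group a role on a bucket."""
    # gsutil iam ch takes bindings of the form group:EMAIL:ROLE
    return ["gsutil", "iam", "ch", f"group:{group_email}:{role}", f"gs://{bucket}"]


# Hypothetical values for illustration only.
command = build_grant_command(
    bucket="my-shared-dataset",
    group_email="my-lab-group@firecloud.org",  # assumed email format for a Terra group
)
print(shlex.join(command))
```

The default `objectViewer` role is read-only, which suits collaborators who should use the data without being able to modify or delete it.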
Using external (requester pays) buckets
- Keeps you from losing shared data if someone in the group inadvertently deletes the workspace.
- Helps keep track of data because you can name the external Google Cloud bucket storing the data, rather than using the random-string names of workspace buckets.
- Makes the data accessible while minimizing cost. The host pays only storage costs (people who want to download pay for that themselves).
A downside to using external buckets is that this approach may circumvent Terra's built-in security around accessing data, and put responsibility for security solely on the data owner.
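For orientation, the requester pays arrangement involves two gsutil commands: the bucket owner turns the setting on, and anyone downloading names a project to bill with the top-level `-u` option. The sketch below builds and prints both commands without running them. The bucket, object, and project names are hypothetical.

```python
import shlex


def enable_requester_pays(bucket):
    """Command the bucket owner runs once to turn on requester pays."""
    return ["gsutil", "requesterpays", "set", "on", f"gs://{bucket}"]


def download_with_billing(bucket, object_name, destination, user_project):
    """Command a collaborator runs; -u names the project billed for the transfer."""
    return [
        "gsutil", "-u", user_project,
        "cp", f"gs://{bucket}/{object_name}", destination,
    ]


# Hypothetical names for illustration only.
print(shlex.join(enable_requester_pays("my-shared-dataset")))
print(shlex.join(download_with_billing(
    "my-shared-dataset", "sample1.cram", ".", user_project="collaborator-billing-project"
)))
```

Without a billing project, requests against a requester pays bucket are generally rejected, so collaborators need one they are allowed to charge.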
For help with setting up external and requester pays workspace buckets, please contact support (Support > Contact Us in the main menu dropdown at the top left of any page in Terra).
Additional resources
- Working with large amounts of data? See Configure GCS to prevent data transfer charges.
- You'll find other useful articles in the Data Submitters Resources section of Terra Support.