Recommendations for sharing a workspace with a broad audience

John Bates
Content for this article was contributed by John Bates and Matt Bookman from Verily Life Sciences, and Allie Cliffe from Terra User Education at the Broad Institute.

Overview

Terra workspaces are a way to collaborate during an analysis or study, and they can also showcase your work and serve as a starting place for other research. You've completed your analyses, polished your notebooks, made informative plots and figures, and written a compelling workspace description. Now all that remains is to make the workspace available to a wider audience (or to share it in support of a publication).

Reasons to share your workspace broadly:

  • To get feedback from collaborators or the larger Terra community.
  • To share "getting started" material for working with a new dataset.
  • To share a newly-developed workflow or analysis method with the community.
  • To support a research article or other publication.

In this article, we describe a few ways you can share your Terra workspace, with recommendations and caveats for each approach.

Sharing your Terra Workspace

A Terra workspace can include data (stored in a dedicated workspace bucket) and metadata, analysis tools (e.g., workflows from either Dockstore or the Broad Methods Repository, and Jupyter notebooks stored on Terra), and documentation. Before you share a workspace, consider exactly what will be shared along with it.

We recommend aligning the workspace access with the dependent data access:

  • A public workspace with public data
  • A controlled access workspace with controlled access data

There is a more complex case of creating a public workspace on controlled-access data. This scenario is not addressed in this document, but we look forward to exploring it in a future article.

Public Workspace with Public Data

If the data underlying your workspace is publicly available without requiring special authorization, then a public workspace will reach the widest audience. Here, the workspace and associated data are accessible to all Terra users. This can be the case if an analysis is based on datasets that don't require authorization for access, like IGSR (1000 Genomes) or GenBank. Alternatively, you might include a sample set of your own data that is not access controlled.

Some great examples of this are the featured workspaces that showcase Terra's capabilities as a research platform. Many of these use publicly available data or open-access sample data.  To learn more, see the Feature Your Workspace support article. If your workspace meets or can be adapted to meet the criteria outlined in the linked article, please consider featuring your workspace!

In a public workspace, all Terra users can:

  • Read data in the workspace bucket and data tables
  • Read notebooks, including notebook outputs, in preview mode
  • Read all workflows, regardless of whether or not they're publicly available elsewhere
  • Clone the workspace and run analyses in their own copy on the primary data

See the "How to make a workspace public?" support article for details.

Recommendations and Caveats

  • If the data you control are in a Cloud Storage bucket outside of Terra, make sure the files in the bucket are world-readable.
  • Make sure you're comfortable sharing all workspace elements described above with all Terra users! This includes data you control, data generated by your notebooks and workflows, as well as all workflows and notebooks.
  • Make sure your workflows are publicly available on Dockstore or the Broad Methods Repository; otherwise, people won't be able to run them directly in their clones of your workspace. However, if you have intentionally made your workflows private, keep in mind that anyone who has access to the workspace will be able to see (and copy) the WDL in the workflow configuration form.
  • Consider supplying precomputed results (or downsampled data) for long-running workflows and notebook-based analyses so other users don't have to run the full analyses to visualize and interact with their results.
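
The first recommendation above (world-readable files in a Cloud Storage bucket outside of Terra) can be sketched as follows. This is a minimal sketch, not the only way to do it: it assumes the google-cloud-storage Python client library is installed, that you have permission to change the bucket's IAM policy, and that the bucket does not enforce public access prevention. The helper names are hypothetical.

```python
VIEWER_ROLE = "roles/storage.objectViewer"
PUBLIC_MEMBER = "allUsers"  # IAM principal meaning "anyone on the internet"


def public_read_binding():
    """Build an IAM binding that lets everyone read objects in a bucket."""
    return {"role": VIEWER_ROLE, "members": {PUBLIC_MEMBER}}


def make_bucket_world_readable(bucket_name):
    """Add the public read binding to the bucket's IAM policy."""
    # Imported lazily so the helper above works without the client library.
    from google.cloud import storage  # pip install google-cloud-storage

    bucket = storage.Client().bucket(bucket_name)
    policy = bucket.get_iam_policy(requested_policy_version=3)
    policy.bindings.append(public_read_binding())
    bucket.set_iam_policy(policy)


# Hypothetical usage (bucket name is a placeholder):
# make_bucket_world_readable("my-external-data-bucket")
```

Granting the role at the bucket level makes every object in the bucket readable, so keep anything you don't intend to share in a different bucket.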

Controlled Access Workspace with Controlled Access Data

You might want to share your workspace with other researchers or the community, but aren't able to share all aspects of the workspace publicly.  The data might be controlled access, or the analyses in the workflows or notebooks might be proprietary.  In cases like these, you'll want to create a controlled access workspace, and share it "read only" with a group of authorized users.

In a controlled-access workspace, only authorized users can:

  • Read data in the workspace bucket and workspace tables.
  • Read notebooks, including notebook outputs, in preview mode.
  • Read all workflows, regardless of whether or not they're publicly available elsewhere.
  • Clone the workspace and run analyses in their own copy on the primary data.

To share a controlled access workspace, we recommend you use an authorization domain. Authorization domain protection follows your workspace even when it's shared or cloned, helping to prevent unauthorized access to your workspace and data.

Recommendations and Caveats

  • If the data you control are in a Terra workspace without an authorization domain, we recommend creating a new workspace with an authorization domain and moving your notebooks, workflows, and data to the new workspace. Remember: when you clone a workspace, data in the workspace bucket are not copied to the new workspace; you'll need to copy them manually to the new workspace bucket. Workspace data table metadata are copied to the new workspace.
  • If the data you control are in external Google Cloud Platform resources like Cloud Storage buckets or BigQuery datasets, we recommend restricting access to these resources to members of the Terra group associated with your authorization domain.
  • Make sure you're comfortable sharing all workspace elements as described above with all users you've shared the workspace with! This includes data you control, data generated by your notebooks and workflows, and all workflows and notebooks. Note that even if your workflow is not publicly available on Dockstore or the Broad Methods Repository, anyone who has access to the workspace will be able to see (and copy) the WDL in the workflow configuration form.
  • Consider supplying precomputed results (or downsampled data) for long-running workflows and notebook-based analyses so other users don't have to run the full analyses to visualize and interact with their results.
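
Two of the recommendations above involve hands-on steps: copying files to the new workspace bucket, and limiting an external bucket to the Terra group behind your authorization domain. The sketch below illustrates both under stated assumptions: the google-cloud-storage Python client library is installed, you can read the source bucket and write to the destination, and the Terra group is addressable as GROUP_NAME@firecloud.org (confirm the group's email on the Groups page in Terra). All function names are hypothetical.

```python
def terra_group_member(group_name, domain="firecloud.org"):
    """Format a Terra group as an IAM member string.

    The default domain is an assumption; check your group's actual email.
    """
    return f"group:{group_name}@{domain}"


def group_read_binding(group_name):
    """Build an IAM binding that lets the group read objects in a bucket."""
    return {
        "role": "roles/storage.objectViewer",
        "members": {terra_group_member(group_name)},
    }


def grant_group_read(bucket_name, group_name):
    """Add the group's read binding to an external bucket's IAM policy."""
    from google.cloud import storage  # pip install google-cloud-storage

    bucket = storage.Client().bucket(bucket_name)
    policy = bucket.get_iam_policy(requested_policy_version=3)
    policy.bindings.append(group_read_binding(group_name))
    bucket.set_iam_policy(policy)


def copy_bucket_contents(source_name, destination_name):
    """Copy every object from one bucket to another, keeping object names."""
    from google.cloud import storage

    client = storage.Client()
    source = client.bucket(source_name)
    destination = client.bucket(destination_name)
    copied = 0
    for blob in client.list_blobs(source_name):
        source.copy_blob(blob, destination, blob.name)
        copied += 1
    return copied
```

Adding the group binding does not remove access that other principals already have, so review the existing policy as well. For large buckets, a bulk tool such as gsutil -m rsync -r between the two gs:// paths may be more practical than copying object by object.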

Examples

  • Genetic Association of Albuminuria with Cardiometabolic Disease and Blood Pressure (Workspace, Paper)
  • Cumulus: a cloud-based data analysis framework for large-scale single-cell and single-nucleus RNA-seq (Workspace, Paper)

Other Workspace and Data Sharing Options

For other, more complex options, contact Terra support (support@terra.bio) for help.

Minimizing Data Egress Charges

Regardless of whether you choose to make the data used in your analysis publicly available, we recommend enabling requester pays billing on Cloud Storage buckets used by your workspace to avoid potential data egress charges from other users. To learn more, see Configure Google Cloud Storage to avoid egress charges.
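
For a Cloud Storage bucket that you control outside of Terra, enabling requester pays might look like the sketch below. It assumes the google-cloud-storage Python client library and sufficient permissions on the bucket; the function names and the billing_project parameter are hypothetical.

```python
def open_bucket(bucket_name, billing_project=None):
    """Return a bucket handle, optionally billing requests to a project."""
    from google.cloud import storage  # pip install google-cloud-storage

    return storage.Client().bucket(bucket_name, user_project=billing_project)


def enable_requester_pays(bucket):
    """Turn on requester pays for the bucket and save the change."""
    bucket.requester_pays = True
    bucket.patch()


# Hypothetical usage by the bucket owner:
# enable_requester_pays(open_bucket("my-external-data-bucket"))
#
# Hypothetical usage by a reader, who names a project to be billed:
# bucket = open_bucket("my-external-data-bucket", billing_project="reader-project")
# bucket.blob("results/summary.tsv").download_to_filename("summary.tsv")
```

Once requester pays is on, requests that don't name a billing project are generally rejected, so mention the requirement in your workspace description.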

We also recommend including data location information in the workspace descriptions so that consumers of the workspace can make informed choices about where to locate their associated storage and compute.
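
One way to find the location to report is to read it from the bucket's metadata. The following is a small sketch that assumes the google-cloud-storage Python client library and read access to the bucket's metadata; the function names and the wording of the note are illustrative only.

```python
def bucket_location(bucket_name):
    """Look up the location (for example, a region name) of a bucket."""
    from google.cloud import storage  # pip install google-cloud-storage

    return storage.Client().get_bucket(bucket_name).location


def location_note(bucket_name, location):
    """Format a sentence to paste into the workspace description."""
    return f"Data for this workspace is stored in gs://{bucket_name} (location: {location})."


# Hypothetical usage (bucket name is a placeholder):
# print(location_note("my-external-data-bucket", bucket_location("my-external-data-bucket")))
```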

It is also possible to enable requester pays access on your workspace bucket.  Please contact Terra support (support@terra.bio) for help.
