Recommendations for sharing a workspace with a broad audience

John Bates

Terra workspaces do more than aid collaboration during an analysis or study: they also showcase your work and provide a starting place for other research. You've completed your analyses, polished your notebooks, made informative plots and figures, and written a compelling workspace description. Now all that remains is to make the workspace available to a wider audience (or to share it in support of a publication).

Content for this article was contributed by John Bates and Matt Bookman from Verily Life Sciences, and Allie Cliffe from Terra User Education at the Broad Institute.

Sharing your Terra workspace

Reasons to share your workspace broadly

  • To get feedback from collaborators or the larger Terra community
  • To share "getting started" material for working with a new dataset
  • To share a newly developed workflow or analysis method with the community
  • To support a research article or other publication

A Terra workspace can include data (stored in a dedicated workspace bucket) and metadata, analysis tools (e.g., workflows from Dockstore or the Broad Methods Repository and Jupyter notebooks stored on Terra), and documentation. Before sharing a workspace, consider which of these elements become visible to the people you share it with.

Recommendation: Align workspace access with the dependent data access

  • A public workspace with public data
  • A controlled-access workspace with controlled-access data

There is a more complex case of creating a public workspace on controlled-access data. This scenario is out of scope for this document, but we look forward to exploring it in a future article.

Public workspace with public data

If the data files underlying your workspace are publicly available without requiring special authorization, then a public workspace will reach the widest audience. Here, the workspace and associated data are publicly accessible to all Terra users. This can be the case if an analysis is based on datasets that don't require authorization for access, like IGSR (1000 Genomes) or GenBank. Alternatively, you might include a sample set of your own data that is not access controlled.

In a public workspace, all Terra users can

  • Read data in the workspace bucket and data tables
  • Read notebooks, including notebook outputs, in preview mode
  • Read all workflows, regardless of whether or not they're publicly available elsewhere
  • Clone the workspace and run analyses in their own copy on the primary data

See the "How to make a workspace public?" support article for details.

Sharing more broadly with Featured Workspaces

Some great examples of publicly shared workspaces are the featured workspaces that showcase Terra's capabilities as a research platform. Many of these use publicly available data or open-access sample data. 

If your workspace meets or can be adapted to meet the criteria outlined in the Feature Your Workspace support article, please consider featuring your workspace!

Recommendations and caveats

  • If the data you control are in a Cloud Storage bucket outside of Terra, make sure the files in the bucket are world-readable.
  • Make sure you're comfortable sharing all workspace elements described above with all Terra users! This includes data you control, data generated by your notebooks and workflows, as well as all workflows and notebooks.
  • Make sure your workflows are publicly available on Dockstore or the Broad Methods Repository. Otherwise, people won't be able to run them directly in their clones of your workspace. However, if you have intentionally made your workflows private, keep in mind that anyone who has access to the workspace will be able to see (and copy) the WDL in the workflow configuration form.
  • Consider supplying precomputed results (or downsampled data) for long-running workflows and notebook-based analyses so other users don't have to run the full analyses to visualize and interact with their results.
  • Lock your workspace to prevent collaborators and viewers from changing the data in the workspace once it is published. See Featured Workspace Requirements: lock your workspace for more information.
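For the first recommendation above, one way to make the files in an external Cloud Storage bucket world-readable is the `gsutil iam ch` command. The sketch below only builds the command in Python so you can review it before running it yourself. The bucket name `my-public-data` is a placeholder and the helper name is hypothetical; neither comes from Terra.

```python
import shlex
from typing import List


def public_read_command(bucket: str) -> List[str]:
    """Build a gsutil command that grants everyone read access to a bucket's objects."""
    # Accept either "my-bucket" or "gs://my-bucket".
    if bucket.startswith("gs://"):
        bucket = bucket[len("gs://"):]
    # allUsers:objectViewer gives read-only object access to anyone.
    return ["gsutil", "iam", "ch", "allUsers:objectViewer", f"gs://{bucket}"]


if __name__ == "__main__":
    # "my-public-data" is a placeholder bucket name (assumption).
    print(shlex.join(public_read_command("my-public-data")))
```

Printing the command rather than executing it is a deliberate choice here: making a bucket public is hard to undo once others have copied the data, so it is worth reading the command before you run it.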

Controlled-access workspace with controlled-access data

You might want to share your workspace with other researchers or the community but aren't able to share all aspects of the workspace publicly.  The data might be controlled-access, or the analyses in the workflows or notebooks might be proprietary.  In cases like these, you'll want to create a controlled-access workspace and share it as read-only with a select group of authorized users.

In a controlled-access workspace, only authorized users can

  • Read data in the workspace bucket and workspace tables.
  • Read notebooks, including notebook outputs, in preview mode.
  • Read all workflows, regardless of whether or not they're publicly available elsewhere.
  • Clone the workspace and run analyses in their own copy on the primary data.

To share a controlled-access workspace, we recommend you use an authorization domain. Authorization domain protection follows your workspace even when it's shared or cloned, helping to prevent unauthorized access to your workspace and data.

Recommendations and caveats

  • If the data you control are in a Terra workspace without an authorization domain, we recommend creating a new workspace with an authorization domain and moving your notebooks, workflows, and data to the new workspace. Remember: when you clone a workspace, data in the workspace bucket are not copied to the new workspace, so you'll need to manually copy them to the new workspace bucket. Workspace data table metadata is copied to the new workspace.
  • If the data you control are in external Google Cloud Platform resources like Cloud Storage buckets or BigQuery datasets, we recommend restricting access to these resources to members of the Terra group associated with your authorization domain.
  • Make sure you're comfortable sharing all workspace elements as described above with all users you've shared the workspace with! This includes data you control, data generated by your notebooks and workflows, and all workflows and notebooks. Note that even if your workflow is not publicly available on Dockstore or the Broad Methods Repository, anyone who has access to the workspace will be able to see (and copy) the WDL in the workflow configuration form.
  • Consider supplying precomputed results (or downsampled data) for long-running workflows and notebook-based analyses so other users don't have to run the full analyses to visualize and interact with their results.
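The first two recommendations above can be carried out with `gsutil`. The sketch below is a minimal illustration that only builds the commands for review. The bucket URLs and the group email are placeholders, the helper names are hypothetical, and it assumes you already know the email address of the Terra group associated with your authorization domain.

```python
import shlex
from typing import List


def copy_data_command(src_url: str, dst_url: str) -> List[str]:
    """Build a gsutil command that copies objects from one bucket path to another."""
    # -m parallelizes the transfer; rsync -r recurses through nested paths.
    return ["gsutil", "-m", "rsync", "-r", src_url, dst_url]


def group_read_command(group_email: str, bucket: str) -> List[str]:
    """Build a gsutil command that grants a group read access to a bucket's objects."""
    return ["gsutil", "iam", "ch", f"group:{group_email}:objectViewer", f"gs://{bucket}"]


if __name__ == "__main__":
    # All names below are placeholders (assumptions), not real resources.
    print(shlex.join(copy_data_command("gs://old-workspace-bucket/data",
                                       "gs://new-workspace-bucket/data")))
    print(shlex.join(group_read_command("my-auth-domain@example.org",
                                        "my-external-bucket")))
```

Copying a specific path such as `data/` rather than the whole bucket keeps notebook and workflow outputs you did not intend to share out of the new workspace.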

Examples

  • Genetic Association of Albuminuria with Cardiometabolic Disease and Blood Pressure (Workspace, Paper)
  • Cumulus: a cloud-based data analysis framework for large-scale single-cell and single-nucleus RNA-seq (Workspace, Paper)

Workspace and data-sharing options

For other, more complex options, contact Terra support (support@terra.bio) for help.

Minimizing data egress charges

Enable Requester Pays on data buckets

Whether or not you choose to make the data used in your analysis publicly available, we recommend enabling Requester Pays billing on Google Cloud Storage buckets used by your workspace to avoid potential data egress charges from other users.

To enable Requester Pays on your workspace bucket, contact Terra support (support@terra.bio) for help.

To learn more, see Configure Google Cloud Storage to avoid egress charges.
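For a Google Cloud Storage bucket that you own outside of Terra, Requester Pays can be switched on with `gsutil requesterpays`, and readers then name a billing project with the `-u` flag. The sketch below builds both commands without running them. The bucket, file, and project names are placeholders and the helper names are hypothetical.

```python
import shlex
from typing import List


def enable_requester_pays_command(bucket: str) -> List[str]:
    """Build the gsutil command that turns on Requester Pays for a bucket."""
    return ["gsutil", "requesterpays", "set", "on", f"gs://{bucket}"]


def requester_pays_copy_command(billing_project: str, src_url: str, dest: str) -> List[str]:
    """Build a gsutil copy command that bills the named project for the request."""
    # -u sets the project charged for access to a Requester Pays bucket.
    return ["gsutil", "-u", billing_project, "cp", src_url, dest]


if __name__ == "__main__":
    # Placeholder names (assumptions), not real resources.
    print(shlex.join(enable_requester_pays_command("my-external-bucket")))
    print(shlex.join(requester_pays_copy_command(
        "reader-billing-project", "gs://my-external-bucket/sample.vcf.gz", ".")))
```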

Include data location information

We recommend including data location information in the workspace description so that people who use the workspace can make informed choices about where to locate their associated storage and compute.
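As one possible way to collect this information, `gsutil ls -L -b` prints a bucket's metadata, including its location. The sketch below builds that command and formats a short note you could paste into a workspace description. The bucket names, the note layout, and the helper names are illustrative assumptions, not a Terra convention.

```python
from typing import Dict, List


def bucket_metadata_command(bucket: str) -> List[str]:
    """Build the gsutil command that lists a bucket's metadata, including its location."""
    return ["gsutil", "ls", "-L", "-b", f"gs://{bucket}"]


def location_note(locations: Dict[str, str]) -> str:
    """Format bucket locations as plain-text lines for a workspace description."""
    lines = ["Data locations:"]
    # Sort by bucket name so the note is stable between edits.
    for bucket, location in sorted(locations.items()):
        lines.append(f"- gs://{bucket}: {location}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder buckets and locations (assumptions).
    print(location_note({"my-external-bucket": "US",
                         "my-workspace-bucket": "us-central1"}))
```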

 
