How to feature your workspace

Beri
  • Updated

Learn best practices and step-by-step instructions for creating a featured workspace that Terra users can clone and use.

If you are looking for the conceptual background behind a featured workspace, see Overview: Feature your workspace instead.

Featured workspace requirements

These requirements are intended to ensure users have the best experience cloning and using the workspace analysis functionality (workflows or interactive analysis). 

1. Include all Featured Workspace components

All Featured Workspaces should include the following (where applicable).

Dashboard documentation

Documentation should be clear enough so that users can run the analysis on their own.

Documentation should follow the Featured-Workspace-Template.

WDL/JSON - workflow analysis component (if applicable)

All relevant workflows should be imported to the workspace and ready to execute, with all attributes preconfigured.

Workflows that must be run sequentially should be numbered in run order using the #-name format. Example: "1-workflow, 2-workflow".
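As a minimal sketch of the numbering convention above, the following Python helper checks that a set of workflow names follows the #-name format and returns them in run order. The function name run_order, and the assumption that numbering starts at 1 with no gaps, are illustrative choices and not Terra requirements.

```python
import re

# Assumed pattern: one or more digits, a hyphen, then the workflow name.
NUMBERED_NAME = re.compile(r"^(\d+)-(.+)$")


def run_order(workflow_names):
    """Return workflow names sorted by their numeric prefix.

    Raises ValueError if a name lacks a numeric prefix or if the step
    numbers are not exactly 1..N (an assumption made for this sketch).
    """
    numbered = []
    for name in workflow_names:
        match = NUMBERED_NAME.match(name)
        if match is None:
            raise ValueError(f"{name!r} does not follow the #-name convention")
        numbered.append((int(match.group(1)), name))
    numbered.sort()
    steps = [step for step, _ in numbered]
    if steps != list(range(1, len(steps) + 1)):
        raise ValueError(f"step numbers {steps} are not the sequence 1..{len(steps)}")
    return [name for _, name in numbered]


print(run_order(["2-workflow", "1-workflow"]))
```

Sorting on the parsed integer rather than the raw string keeps "10-workflow" after "9-workflow".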

Workflows can be stored in Git, Dockstore, or Terra Method Repository. 

Dashboard documentation should include adequate description of each workflow, explaining what it does, what input it accepts, and the expected output.

While not required, we strongly recommend listing the cost and time to run each workflow on the included example datasets.

References/Resources - to run the analysis

References and resources should be listed in the workspace data table under the Data tab.

All files need to be publicly accessible with consent for public access.

Ensure compatibility with input data. For example, if input BAMs are aligned to hg38, the reference should be hg38.

Jupyter Notebook - interactive analysis component (if applicable)

Each code cell should be ready to execute without user intervention.

The dashboard should include an adequate description of what each notebook does, the input it accepts, and the expected output.

The dashboard should include the recommended Cloud Environment, any required packages and the minimum compute resources to run the notebook on sample data.

Sample input data - for workflow or interactive (i.e., notebook) analysis

Confirm all data have consent for public access.

Upload to a publicly accessible, external Google bucket, separate from the workspace bucket. This ensures the original path to the data still works in cloned copies of the workspace. (Note: Workspace buckets are not public, even if the workspace is, and data in the original workspace bucket are not copied to cloned copies of the workspace.) Using a separate Google bucket also lets you enable requester pays, where the requester (and not the data owner) pays egress fees on downloaded data.
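One way to catch data paths that would break in a clone is to scan the gs:// paths used in your data tables and workflow inputs for any that still point at the workspace bucket. The sketch below assumes you have already collected those paths into a list. The function names and bucket names are hypothetical.

```python
from urllib.parse import urlparse


def bucket_of(gs_uri):
    """Return the bucket name from a gs://bucket/object path."""
    parsed = urlparse(gs_uri)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"{gs_uri!r} is not a gs:// path")
    return parsed.netloc


def paths_in_workspace_bucket(gs_uris, workspace_bucket):
    """List the paths that live in the (non-public) workspace bucket."""
    return [uri for uri in gs_uris if bucket_of(uri) == workspace_bucket]


# Hypothetical bucket names, for illustration only.
uris = [
    "gs://fc-1234/sample.bam",
    "gs://public-example-data/sample.bam",
]
print(paths_in_workspace_bucket(uris, "fc-1234"))
```

Any path this returns should be moved to the external public bucket and updated in the data table before the workspace is featured.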

Docker images - used in WDL workflows or custom notebook environments

  • Make Docker images publicly accessible.

Workspace tags

  • Add tags to the workspace so it can be placed in the correct showcase categories.

Filters/categories and tag examples

  • Analysis tools: WDLs, Jupyter Notebooks, RStudio, Galaxy, Hail, Bioconductor, GATK, Cumulus, Spark
  • Experimental strategy: GWAS, Exome Analysis, Whole Genome Analysis, Fusion Transcript Detection, RNA Analysis, Machine Learning, Variant Discovery, Epigenomics, DNA Methylation, Copy Number Variation, Structural Variation, Functional Annotation
  • Data generation technology: 10x Analysis, Bisulfite Sequencing
  • Scientific domain: Cancer, Infectious Diseases, MPG, Single-cell, Immunology
  • Datasets: AnVIL, CMG, CCDG, TopMed, HCA, TARGET, ENCODE, BioData Catalyst, TCGA, 1000 Genomes, BRAIN Initiative, gnomAD, NCI, COVID-19
  • Utilities: Format Conversion, Developer Tools
  • Projects: HCA, AnVIL, BRAIN Initiative, BioData Catalyst, NCI
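To see which showcase categories a tag would fall under, the categories and tag examples above can be written as a simple lookup. This is only a sketch: the dictionary copies the example tags listed here (which may not be exhaustive), and the case-insensitive comparison is an assumption about convenience, not a statement of how Terra matches tags.

```python
# Example tags per showcase category, copied from the list above.
SHOWCASE_TAGS = {
    "Analysis tools": ["WDLs", "Jupyter Notebooks", "RStudio", "Galaxy", "Hail",
                       "Bioconductor", "GATK", "Cumulus", "Spark"],
    "Experimental strategy": ["GWAS", "Exome Analysis", "Whole Genome Analysis",
                              "Fusion Transcript Detection", "RNA Analysis",
                              "Machine Learning", "Variant Discovery", "Epigenomics",
                              "DNA Methylation", "Copy Number Variation",
                              "Structural Variation", "Functional Annotation"],
    "Data generation technology": ["10x Analysis", "Bisulfite Sequencing"],
    "Scientific domain": ["Cancer", "Infectious Diseases", "MPG", "Single-cell",
                          "Immunology"],
    "Datasets": ["AnVIL", "CMG", "CCDG", "TopMed", "HCA", "TARGET", "ENCODE",
                 "BioData Catalyst", "TCGA", "1000 Genomes", "BRAIN Initiative",
                 "gnomAD", "NCI", "COVID-19"],
    "Utilities": ["Format Conversion", "Developer Tools"],
    "Projects": ["HCA", "AnVIL", "BRAIN Initiative", "BioData Catalyst", "NCI"],
}


def categories_for(tag):
    """Return every category whose example tags include `tag` (case-insensitive)."""
    wanted = tag.strip().lower()
    return [
        category
        for category, tags in SHOWCASE_TAGS.items()
        if wanted in (t.lower() for t in tags)
    ]


print(categories_for("HCA"))
print(categories_for("my-custom-tag"))
```

A tag that returns an empty list is not among the examples above, so it may not place the workspace in any showcase category.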

2. Test your analysis tools

Workflows and notebooks must run successfully and generate valid results without human intervention (i.e., no variables to rename, and workflows already in run order), and must do what the dashboard documentation specifies.

Testing suggestions

  • Have someone completely new to the workspace test it and provide usability feedback.
  • Terra is routinely updated, so we ask owners of the workspace to regularly test their workflows and notebooks to confirm all scripts run as expected.

3. Lock your workspace

Locking a workspace prevents collaborators (or any writers, in a public workspace!) from modifying anything in that workspace. This is useful if you are showcasing a workspace and don't want any content deleted or modified. 

You can lock your workspace by clicking the three vertical dot share icon and selecting the Lock workspace option in the dropdown menu.

4. Ready for featuring?

Once the workspace has been tested and is operating satisfactorily, you can ask the Frontline support team to feature your workspace. Frontline will confirm the details with the workspace owners via email before posting.

Preventing network egress charges

Network egress charges can be incurred whenever data leave a Google Cloud region, such as when copying data from a bucket or pulling a Docker image from Container Registry or Artifact Registry. This can occur for copies to a VM in a different compute region or copies out of the cloud (downloads). For more details, see Google's network pricing documentation.

By default, network egress charges go to the owner of the Cloud Storage bucket (workspace) or Docker image. For data in Cloud Storage, using the Requester Pays option passes charges to data users.

To avoid unexpected network egress charges in your Featured Workspace, we recommend the steps below.

Step 1. Dashboard recommendations

Publish relevant Google Cloud location information in the workspace dashboard page (see suggested language below).

  • "Example data for this workspace is in <bucket region> and the bucket is <requester pays or not>."
  • "Reference data for this workspace is in <bucket region> and the bucket is <requester pays or not>."
  • "The Docker image for this workspace is published in <image location>."
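If you maintain several workspaces, the suggested statements above can be generated from a few values so the wording stays consistent. This is a minimal sketch. The function names are hypothetical, and the regions and image location in the usage lines are placeholders, not recommendations.

```python
def bucket_statement(kind, region, requester_pays):
    """Build the dashboard sentence for example or reference data.

    `kind` is "Example" or "Reference".
    """
    billing = "requester pays" if requester_pays else "not requester pays"
    return f"{kind} data for this workspace is in {region} and the bucket is {billing}."


def image_statement(image_location):
    """Build the dashboard sentence for the Docker image location."""
    return f"The Docker image for this workspace is published in {image_location}."


# Placeholder values, for illustration only.
print(bucket_statement("Example", "us-central1", True))
print(bucket_statement("Reference", "us-central1", False))
print(image_statement("<image location>"))
```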

Step 2. Reference or sample data storage

To prevent egress charges to the owner of the bucket holding reference or example data in Cloud Storage, you can set up controls around the featured workspace's project data, such as a VPC-SC security perimeter or a requester pays bucket.

See Configure GCS to prevent egress charges for more details. 

For instructions on how to turn on the "requester pays" option on an external GCS bucket, see the Google documentation.  
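As a rough illustration of what turning on requester pays looks like in code, the sketch below uses the requester_pays property and patch() method of a bucket object from the google-cloud-storage Python client. It assumes that package is installed, that you have credentials with permission to update the bucket, and that the bucket name in the usage comment is replaced with your own. Treat the Google documentation as the authoritative procedure.

```python
def enable_requester_pays(bucket):
    """Turn on Requester Pays for a google.cloud.storage Bucket-like object.

    Returns the bucket's requester_pays value after the update.
    """
    bucket.requester_pays = True
    bucket.patch()  # sends the changed bucket metadata to Cloud Storage
    return bucket.requester_pays


# Usage against a real bucket (needs google-cloud-storage and credentials):
#
#   from google.cloud import storage
#   bucket = storage.Client().get_bucket("my-example-data-bucket")  # hypothetical name
#   enable_requester_pays(bucket)
```

The helper takes the bucket object as an argument so it can be exercised with a stand-in object before pointing it at a real bucket.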

Step 3. Docker images recommendations

  • Don't grant broad access to images in Google Container Registry or Artifact Registry (you can still use these registries internally and grant access to trusted users). Out-of-region workspace users who download Docker images from these registries will generate data egress charges for the owner of the Docker image (see this Cromwell GitHub issue for context).
  • Alternatively, use a different Docker registry for public access to your images and update your public WDL workflows to use this registry.

To learn more, please see Docker Image Publishers Tips or Configure GCR/Artifact Registry to prevent egress charges for details.
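Before switching public workflows to a different registry, it can help to list which images they currently pull from Container Registry or Artifact Registry. The sketch below scans WDL text for literal docker: "..." runtime entries and flags those whose registry host ends in gcr.io or pkg.dev. It is an approximation: it will miss images passed in as workflow inputs or variables, and the project, organization, and image names in the sample are hypothetical.

```python
import re

# Matches literal runtime entries such as: docker: "us.gcr.io/project/tool:1.0"
DOCKER_LINE = re.compile(r'docker\s*:\s*"([^"]+)"')

# Host suffixes used by Container Registry and Artifact Registry.
GOOGLE_REGISTRY_SUFFIXES = ("gcr.io", "pkg.dev")


def google_registry_images(wdl_text):
    """Return docker images in the WDL text hosted on gcr.io or pkg.dev."""
    images = DOCKER_LINE.findall(wdl_text)
    return [
        image
        for image in images
        if image.split("/")[0].endswith(GOOGLE_REGISTRY_SUFFIXES)
    ]


# Hypothetical WDL fragment, for illustration only.
sample_wdl = '''
task a { runtime { docker: "us.gcr.io/my-project/tool:1.0" } }
task b { runtime { docker: "quay.io/my-org/tool:1.0" } }
task c { runtime { docker: "us-central1-docker.pkg.dev/my-project/repo/tool:1.0" } }
'''
print(google_registry_images(sample_wdl))
```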

Additional Featured Workspace Resources

Want to create a featured workspace but having a hard time getting started? Use this Smartsheet project plan, which contains several tasks normally involved in creating and featuring a workspace, as a guide: Workspace Featuring Project Plan.

Already have a featured workspace that you need us to archive and/or replace? Fill out the maintenance form: Workspace Maintenance Intake Form.
