Have a well-crafted workspace you'd like to share with the Terra community? Sign up to feature your workspace!
Featuring a workspace is a great opportunity to broadcast your work, and browsing featured workspaces is a good way to discover ready-to-run WDL workflows and interactive Jupyter notebooks that you can repurpose for your own research.
The Terra Library includes a curated selection of Showcases and Tutorials: public workspaces that highlight some of the community's finest (mostly) reproducible work. Designed with collaboration in mind, these workspaces include well-documented dashboards that describe their contents and the analysis being performed. The idea is to let you easily reproduce others' work using public data and scripts preconfigured with the correct attributes, so that each workflow and/or notebook is ready to run.
Workspaces generally come in three flavors:
Analysis-focused workspaces (e.g. ToF)
Generally associated with a pre- or post-publication study, these highlight biological meaning and implications. The workspace will have a thorough description of the study, the general motivation for the scientist's or analyst's experiment, caveats and concerns, and an ordered list of the case study steps. The workspace reproduces the publication's analysis, with all the analysis tools, on the Terra platform.
Data-focused workspaces (e.g. Target, TCGA)
These workspaces focus on introducing users to specific public- or restricted-access datasets available in Terra’s Data Library. They bundle instructions for accessing and working with various cohorts and data types (per project) in the dashboard and include example workflows or notebooks that reproduce a typical analysis of the dataset. If data access is restricted, the workspace must include easy-to-follow instructions on how to gain access to the data and import the data to the workspace to run the workflows.
Workflow-focused workspaces (e.g. GATK workspaces)
These workspaces contain WDL workflows and/or notebooks with sample test data small enough to run in a reasonable time at a small cost. At minimum, the dashboard has light documentation describing the purpose, requirements, inputs, and outputs of each pipeline. Both workflows and notebooks should be preconfigured and ready to execute, with sufficient instructions that a user new to the platform but familiar with the science can run them. If multiple workflows will be run back to back, they need to be named with numerical prefixes and configured to run seamlessly (for automated testing purposes). Workflows should be updated regularly to keep up with new tool versions.
If you have a workspace that fits these categories, or something different but similarly well-crafted, please sign up to have it featured! Fill out the form below and our team will contact you to begin the process:
What to expect
After you submit the form, we'll review your workspace to see whether it meets our Featured Workspace requirements (listed below). If everything checks out, we'll feature the workspace; if not, we'll provide suggestions for meeting the requirements. Note that to maintain a consistent tone on the Terra platform, we may make small edits to the documentation (both in the dashboard and the notebook). We will ask you for final approval before featuring the workspace.
These requirements are intended to ensure that users have the best experience cloning the workspace and using its batch analysis functionality (workflows) or its interactive analysis capability (Jupyter notebooks).
Featured Workspace components
- Billing project to create the workspace, and a workspace name. Note: The billing project and name both appear in the workspace title shown in Terra, so name wisely. Ex: help-gatk/Germline-SNPs-Indels-GATK4-hg38 (“help-gatk” is the billing project)
- WDL/JSON - workflow (batch) analysis component (if applicable)
    - Must be uploaded to a designated git repository owned by the collaborator. It can then be pulled into Dockstore and exported to a Terra workspace. If a repo isn’t available, the WDL and JSON can be uploaded to the Terra Method Repository.
    - A ReadMe text file describing the contents of the git repo (if applicable).
    - All workflows should be imported to the designated workspace with all the attributes preconfigured and ready to execute.
    - In workspaces containing multiple workflows that need to be run sequentially, each workflow name should include its position in the sequence, in the form #-name. Example: “1-workflow, 2-workflow”
- Jupyter Notebook - interactive analysis component (if applicable)
    - Should be uploaded to a designated git repository owned by the collaborator. If this isn’t available, it can be uploaded to the workspace Google bucket.
    - Each cell should be ready to execute without user intervention.
- Sample input data - for workflow (i.e. batch) or interactive (i.e. Notebook) analysis
    - Should be uploaded to a publicly accessible Google bucket, separate from the workspace. This ensures that anyone can access the data (it is not possible to make a workspace bucket public) and that the data are not deleted if any accidents happen. The data can be placed in a requester pays bucket to reduce costs.
    - Confirm that the data is consented for public access.
- References/Resources - to run the analysis
    - Metadata in the workspace Data Table should include paths to the publicly accessible resources that should be used.
    - Ensure compatibility with input data. For example, if input BAMs are aligned to hg38, the reference should be hg38.
- Docker images - to run the batch analysis
    - Should be publicly accessible
- Dashboard documentation per Featured-Workspace-Template
The components above must run successfully and produce valid results without human intervention (i.e. no variables to rename, and workflows already in run order), and must do what the dashboard documentation says. Suggestion: have someone completely new to the workspace test it and provide usability feedback. Furthermore, Terra is updated routinely, so we ask workspace owners to regularly test their workflows and notebooks to confirm that all scripts run as expected.
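As an illustration of the #-name convention for sequential workflows, here is a minimal Python sketch of a self-check you could run over your workflow names before submitting. It is not part of Terra or of the review process. The function name `check_workflow_order` is hypothetical, and the rule that prefixes start at 1 and increase without gaps is an assumption drawn from the “1-workflow, 2-workflow” example rather than a documented requirement.

```python
import re

# Hypothetical pattern for the "#-name" convention, e.g. "1-workflow".
PREFIX_PATTERN = re.compile(r"^(\d+)-(.+)$")


def check_workflow_order(names):
    """Return a list of problems found in the workflow names (empty if none).

    Assumption: prefixes should form the sequence 1, 2, 3, ... with no gaps
    or duplicates. Adjust this rule if your workspace uses another scheme.
    """
    problems = []
    positions = []
    for name in names:
        match = PREFIX_PATTERN.match(name)
        if match is None:
            problems.append(f"{name!r} has no numeric prefix")
        else:
            positions.append(int(match.group(1)))

    # Compare the prefixes that were found against 1..N.
    expected = list(range(1, len(positions) + 1))
    if sorted(positions) != expected:
        problems.append(
            f"prefixes {sorted(positions)} are not the sequence {expected}"
        )
    return problems


if __name__ == "__main__":
    for issue in check_workflow_order(["1-workflow", "2-workflow"]):
        print(issue)
```

An empty result means every name carries a prefix and the prefixes form an unbroken sequence; it says nothing about whether the workflows themselves run, so it complements, rather than replaces, an end-to-end test by someone new to the workspace.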
Ready for Featuring
The Frontline Support team has the ability to “feature” the workspace and will do so once the workspace has been tested and is operating to the collaborator’s and support lead’s satisfaction. This will be confirmed before posting.
Want to create your own workspace but having a hard time getting started? As a guide, use this Smartsheet project plan, which contains several tasks normally involved in creating and featuring a workspace.
Already have a workspace featured and need us to archive and/or replace the workspace? Fill out the maintenance form.