A workspace is a computational sandbox where you can organize data and tools and run analyses, with all the pieces you need for a research project in one place. "Cloning" and "sharing" workspaces streamline collaboration, and built-in security and authentication ensure your data and analysis tools maintain the right level of privacy. This article gives an overview of a Terra workspace and explains how to share and collaborate.
What's in a workspace
- Data: Can be pre-loaded (i.e. hosted in the Terra library) or user-uploaded, open- or controlled-access. Data generated by analysis is stored in the workspace Google bucket by default. Analysis workflows and notebooks can be configured to output metadata to the workspace data table.
- Analysis Workflows: For pre-processing or processing genomics data. Workflows can be pre-loaded, imported from Dockstore or the Broad Methods Repository, or user-created.
- Interactive analysis (visualization and statistics) in Jupyter notebooks: These can be pre-loaded or user-created.
- Results: Provenance from all pipeline runs done in the workspace is automatically stored in the workspace.
"Sharing" a workspace allows collaborators to actively work inside the same workspace at the same time. You can see what sorts of changes your colleagues are implementing, troubleshoot a workspace efficiently as a team, and delegate work. The main drawback of "sharing" a workspace is that collaborators may end up making changes to someone else's work, or upsetting your team's version organization.
To play around in a workspace without running such risks, or to run computations in a workspace you only have "read access" to, you can clone that workspace. This creates a completely independent copy of the workspace with you as the owner and sole user (until you choose to "share" your clone with someone else).
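The difference between sharing and cloning can be illustrated with a small toy model. This is only a sketch: the `Workspace` class, the access-level names ("reader", "writer", "owner"), and the example users are assumptions made for illustration and are not Terra's actual data model or API.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field


@dataclass
class Workspace:
    """Toy model of a workspace (hypothetical, not Terra's real data model)."""

    name: str
    owner: str
    # Maps a user to a hypothetical access level: "reader", "writer", or "owner".
    access: dict[str, str] = field(default_factory=dict)
    contents: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # The creator of a workspace is always recorded as its owner.
        self.access.setdefault(self.owner, "owner")

    def share(self, user: str, level: str) -> None:
        """Sharing gives another user access to this same workspace."""
        self.access[user] = level

    def can_write(self, user: str) -> bool:
        return self.access.get(user) in {"writer", "owner"}

    def clone(self, new_owner: str, new_name: str) -> Workspace:
        """Cloning makes an independent copy with new_owner as the sole user."""
        if new_owner not in self.access:
            raise PermissionError(f"{new_owner} cannot see {self.name}")
        return Workspace(
            name=new_name,
            owner=new_owner,
            contents=copy.deepcopy(self.contents),
        )


original = Workspace("study", "alice", contents={"notebook": "v1"})
original.share("bob", "reader")          # bob can look but not change anything
mine = original.clone("bob", "study-copy")
mine.contents["notebook"] = "v2"         # edits stay in bob's clone
```

In this model, a user with only read access cannot change the shared workspace, but after cloning they own an independent copy whose changes never reach the original.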
How to clone or share a workspace
Find the workspace's three-dot menu button, either at the top right of the screen within the workspace itself (option 1) or on the label for the workspace in the "Your Workspaces" section (option 2).
The Workspaces section is divided into five tabs:
- Dashboard
- Data
- Notebooks
- Workflows
- Job History
Below is a brief overview of the contents of each tab.
Documentation in the dashboard tab
The dashboard tab summarizes what the workspace is for and gives details about the data, analysis workflows, and interactive notebooks in the workspace, who has access, how much it costs, and the status of the activity going on inside it.
How you might use the dashboard:
- Share and manage access to the workspace
- Access your Google bucket (on the right side column)
- View the authorization domain
- View the project costs
- View your workspace status
- Manage workspace attributes
- Manage information about your project in the Description
- Set tags so that you can easily find the workspace
Link data in the cloud to your workspace data tab
Information about the data associated with a given workspace is in the data tab. Data tables are a way to organize metadata that analysis tools (both workflows and notebooks) will use. Data tables can include metadata on samples, participants, and sets. In addition, the data tables contain useful metadata such as the descriptive data associated with a given participant or cohort, as well as workspace attributes necessary for reproducibility.
See more detail on linking data to your workspace data table in this article.
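As a rough illustration of what a data table holds, the sketch below writes sample metadata as tab-separated text, with one row per sample and one column per attribute. The sample IDs, column names, and `gs://example-bucket` paths are hypothetical placeholders, and the `entity:sample_id` header is an assumption about the load-file convention; check the linked article for the exact format before uploading anything.

```python
import csv
import io

# Hypothetical sample metadata; IDs and bucket paths are placeholders.
samples = [
    {"sample_id": "sample_001", "participant": "participant_A",
     "cram": "gs://example-bucket/sample_001.cram"},
    {"sample_id": "sample_002", "participant": "participant_B",
     "cram": "gs://example-bucket/sample_002.cram"},
]


def to_table_tsv(rows, id_column, entity_type):
    """Render rows as tab-separated text with the ID column first."""
    other_columns = [column for column in rows[0] if column != id_column]
    # Assumed header convention: "entity:<type>_id" names the table and its key.
    header = [f"entity:{entity_type}_id"] + other_columns
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        writer.writerow([row[id_column]] + [row[column] for column in other_columns])
    return buffer.getvalue()


tsv_text = to_table_tsv(samples, id_column="sample_id", entity_type="sample")
print(tsv_text)
```

Note that the table stores links to the files in cloud storage rather than the files themselves, which is what lets workflows and notebooks look up their inputs by sample.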
Interactive analysis (visualization and statistics) in notebooks
From the notebooks tab of your workspace you can launch an interactive analysis in a Jupyter (formerly IPython) notebook, using R, Python, Spark, and Hail. Jupyter notebooks are a popular way of creating reproducible bioinformatics analysis tasks. They combine familiar and powerful programming languages with the ability to create and share documents containing code, results, and narrative text. See more documentation about Jupyter notebooks in the Jupyter notebooks 101 workspace.
Notebooks function by requisitioning CPUs from the cloud to create a virtual machine called a "runtime environment". Default computational runtimes typically cost about 20 cents per hour. If you requisition more computational power, the cost increases.
You can see the exact runtime parameters, cost-per-hour, and current status (starting, running, stopping, stopped) in the top right corner of the screen when inside of a specific Workspace. If a window with a notebook remains inactive for a long enough time, the cluster will automatically stop, unless the kernel is active. Starting a stopped cluster may take up to a few minutes.
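For a back-of-the-envelope budget, you can multiply the cost-per-hour shown in the workspace by the number of hours the runtime is running. The sketch below assumes the approximate default rate of 20 cents per hour mentioned above; the function name and the example figures are illustrative, and the actual rate for your runtime is the one displayed in the top right corner.

```python
# Approximate default rate quoted in this article; your runtime may differ.
DEFAULT_RATE_USD_PER_HOUR = 0.20


def estimate_runtime_cost(hours_running, rate_usd_per_hour=DEFAULT_RATE_USD_PER_HOUR):
    """Estimate the runtime cost in US dollars, rounded to the nearest cent."""
    if hours_running < 0 or rate_usd_per_hour < 0:
        raise ValueError("hours and rate must be non-negative")
    return round(hours_running * rate_usd_per_hour, 2)


# Eight hours on the default runtime.
print(estimate_runtime_cost(8))
# Forty hours on a hypothetical larger runtime at 50 cents per hour.
print(estimate_runtime_cost(40, rate_usd_per_hour=0.50))
```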
Workflows for processing genomics data
You can organize, configure, and launch workflows to process genomics data within the workflows tab. Workflows include one or more tasks as well as input and output parameters. To view and configure workflows, go to the Workflows tab and click on one of the available workflow cards. Within the card you can look at or adjust:
- Script - Displays the raw code that uses Workflow Description Language (WDL) to wrap and interconnect a set of tasks. Note that you cannot currently edit a WDL in Terra.
- Input parameters
- Outputs expected from running a workflow
- Run Analysis - This is where you launch a batch analysis, once configured
Monitoring workflow submissions in the Job History tab
Here you can monitor the progress and status of batch analysis workflows you've submitted for computation. You can also abort and relaunch workflows. The workflow states and their meanings are listed below:
- Queued : Each workflow is initially queued pending processing by the Cromwell execution engine. Note that each user is permitted 1000 active (either running or aborting) workflows at a time; if you submit more workflows than that number they will be queued until some of your previous workflows have become inactive (either through failure or completion).
- Launching : The recently launched workflow is being sent to Cromwell
- Submitted : The workflow has been received by the Cromwell execution engine and an associated workflow id has been generated
- Running : The workflow is being actively processed by Cromwell
- Aborting : Cromwell has processed the request to abort this workflow but the abort is not complete
- Aborted : The workflow has halted completely due to an abort request
- Succeeded : The workflow has successfully completed
- Failed : The workflow has terminated abnormally due to some error
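The states above, together with the limit of 1000 active workflows per user, can be modeled in a few lines. This is a minimal sketch based only on the descriptions in this list: the names `WorkflowState`, `ACTIVE_STATES`, and `can_leave_queue` are hypothetical and do not come from Terra or Cromwell.

```python
from collections import Counter
from enum import Enum


class WorkflowState(Enum):
    QUEUED = "Queued"
    LAUNCHING = "Launching"
    SUBMITTED = "Submitted"
    RUNNING = "Running"
    ABORTING = "Aborting"
    ABORTED = "Aborted"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"


# "Active" is described above as either running or aborting.
ACTIVE_STATES = {WorkflowState.RUNNING, WorkflowState.ABORTING}
# States in which a workflow has finished, one way or another.
TERMINAL_STATES = {
    WorkflowState.ABORTED,
    WorkflowState.SUCCEEDED,
    WorkflowState.FAILED,
}
MAX_ACTIVE_PER_USER = 1000


def can_leave_queue(user_workflow_states):
    """Return True if the user is below the active-workflow limit."""
    active = sum(1 for state in user_workflow_states if state in ACTIVE_STATES)
    return active < MAX_ACTIVE_PER_USER


def summarize(user_workflow_states):
    """Count workflows per state name, e.g. for a quick status overview."""
    return Counter(state.value for state in user_workflow_states)


states = [WorkflowState.RUNNING] * 999 + [WorkflowState.SUCCEEDED] * 5
print(can_leave_queue(states))   # 999 active workflows, so one more may start
print(summarize(states))
```

Workflows that have already succeeded, failed, or been aborted do not count toward the limit, so queued workflows start moving again as earlier ones finish.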
Clicking on the submission on the left gives you access to more details, which are especially useful for troubleshooting.
See this document for more troubleshooting tips and tricks.