A workspace is a computational sandbox where you can organize data and tools, and run analyses. Users can create, share, and clone workspaces.
- Data: can be pre-loaded or user-uploaded, open access or controlled access
- Workflows: pre-loaded or user-created
- Tools: pre-loaded or user-created
- Results: from all runs, captured with provenance
Terra strives to make collaboration as simple as possible, with easy processes for "cloning" and "sharing" workspaces.
"Sharing" a workspace allows multiple people to work actively inside the same workspace together. You can see what changes your colleagues are making, troubleshoot a workspace efficiently as a team, and delegate work. The main drawback of "sharing" a workspace is that you may end up changing someone else's work or disrupting your team's version organization. If you want to experiment in a workspace without running such risks, or if you just want to run computations in a workspace you only have "read access" to, you can "clone" that workspace. Cloning creates a completely independent copy of the workspace in which you are the owner and sole user (until you choose to "share" your "clone" with someone else).
To clone or share a workspace, open that workspace's three-dot menu, found either at the top right of the screen when you are inside the workspace (option 1) or on the workspace's label in the "Your Workspaces" section (option 2).
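The difference between sharing and cloning can be illustrated with a small conceptual model. This is only a sketch: the `Workspace` class, its fields, and the user names are hypothetical and are not part of any Terra API. It shows that a shared workspace is one object seen by everyone, while a clone is an independent copy with a new owner.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Workspace:
    """Toy stand-in for a workspace (hypothetical, not a Terra data structure)."""
    name: str
    owner: str
    collaborators: set = field(default_factory=set)
    notes: list = field(default_factory=list)

    def share(self, user: str) -> "Workspace":
        # Sharing grants another user access to the SAME workspace.
        self.collaborators.add(user)
        return self

    def clone(self, new_owner: str, new_name: str) -> "Workspace":
        # Cloning copies the contents; the new owner starts as the sole user.
        return Workspace(
            name=new_name,
            owner=new_owner,
            collaborators=set(),
            notes=copy.deepcopy(self.notes),
        )


original = Workspace(name="team-analysis", owner="alice", notes=["v1 pipeline"])

# A shared workspace: an edit by a collaborator changes the original.
shared = original.share("bob")
shared.notes.append("bob's edit")

# A cloned workspace: edits stay inside the clone.
sandbox = original.clone(new_owner="carol", new_name="carol-sandbox")
sandbox.notes.append("carol's experiment")
```

In this model, the collaborator's edit appears in `original.notes`, whereas the change made in `sandbox` does not, which mirrors the trade-off described above.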
The Workspaces section is divided into five tabs:
- Dashboard
- Data
- Notebooks
- Tools
- Job History
Below is a brief overview of the contents of each tab.
The Dashboard tab gives a condensed view of what the workspace is for, who has access, how much it costs, and the status of the activity going on inside it.
Some examples of what you can do from the dashboard:
- Share the workspace and manage access to it
- Access your Google bucket
- View the authorization domain
- View the project costs
- View your workspace status
- Manage workspace attributes
- Manage information about your project in the Description
- Set tags so that you can easily find the workspace
Under the data tab, users can interact with the data entities associated with a given workspace. This includes the following basic data entity types:
In addition, this section contains useful metadata such as the descriptive data associated with a given participant/cohort, as well as workspace attributes necessary for reproducibility.
From the notebooks tab of your workspace you can launch an interactive analysis environment based on Jupyter (formerly IPython) notebooks, Spark, and Hail. Jupyter notebooks are an increasingly popular way to create reproducible bioinformatics analyses. They combine familiar and powerful programming languages, like R and Python, with the ability to create and share documents containing code, results, and narrative text.
Notebooks function by requisitioning CPUs from the cloud to create a virtual machine called a "cluster". Cluster runtime typically costs about 20 cents per hour. Requisitioning more computational power increases the cost.
The exact cluster parameters, cost-per-hour, and current status (starting, running, stopping, stopped) can be seen in the top right corner of the screen when inside of a specific Workspace. If a window with a notebook remains inactive for a long enough time, the cluster will automatically stop. Starting a stopped cluster may take up to a few minutes.
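As a rough illustration of how cluster costs accumulate, the sketch below multiplies running time by an hourly rate. The function name is hypothetical, the default rate is the approximate 20 cents per hour mentioned above, and the higher rate in the second call is an assumed value for a larger cluster; actual rates are shown in the workspace itself.

```python
def estimate_cluster_cost(hours_running: float, rate_per_hour: float = 0.20) -> float:
    """Return an approximate cost in dollars for a cluster that stays running."""
    if hours_running < 0 or rate_per_hour < 0:
        raise ValueError("hours and rate must be non-negative")
    return round(hours_running * rate_per_hour, 2)


# A default cluster left running for a working week (5 days x 8 hours).
weekly_cost = estimate_cluster_cost(5 * 8)

# A hypothetical larger cluster, assumed here to cost three times as much per hour.
larger_cost = estimate_cluster_cost(5 * 8, rate_per_hour=0.60)

print(f"default cluster: ${weekly_cost:.2f}, larger cluster: ${larger_cost:.2f}")
```

Because the cost scales with running time, letting idle clusters stop automatically keeps the total down.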
The Tools tab is where you organize, configure, and launch workflows. A workflow consists of one or more tasks, together with input and output parameters. To view and configure a workflow, go to the Tools tab of the Workspaces section and click one of the available workflows. This view has four subsections:
- Script - The raw code that uses Workflow Description Language (WDL) to wrap and interconnect a set of tasks
- Inputs - The table of input parameters that describe the method configuration
- Outputs - This is where the outputs expected from running a workflow are configured
- Run Analysis - This is where you launch a batch analysis, once it is configured
The Job History tab is where you monitor the progress and status of the batch analyses you have submitted for computation. You can also use it to abort and relaunch workflows. A workflow can be in one of the following states:
- Queued : Each workflow is initially queued pending processing by the Cromwell execution engine. Note that each user is permitted 1000 active (either running or aborting) workflows at a time; if you submit more than that, the additional workflows stay queued until some of your previous workflows become inactive (through either failure or completion).
- Launching : The recently launched workflow is being sent to Cromwell.
- Submitted : The workflow has been received by the Cromwell execution engine and an associated workflow id has been generated.
- Running : The workflow is being actively processed by Cromwell.
- Aborting : Cromwell has processed the request to abort this workflow but the abort is not complete.
- Aborted : The workflow has halted completely due to an abort request.
- Succeeded : The workflow has successfully completed.
- Failed : The workflow has terminated abnormally due to some error.
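The states listed above, and the limit on active workflows, can be summarized in a short sketch. The enum, the helper names, and the grouping into "active" and "terminal" sets are illustrative assumptions based only on the descriptions in this list; they are not Terra or Cromwell code.

```python
from enum import Enum


class WorkflowStatus(Enum):
    QUEUED = "Queued"
    LAUNCHING = "Launching"
    SUBMITTED = "Submitted"
    RUNNING = "Running"
    ABORTING = "Aborting"
    ABORTED = "Aborted"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"


# Per the Queued description, "active" means running or aborting.
ACTIVE = {WorkflowStatus.RUNNING, WorkflowStatus.ABORTING}
# Assumed end states: a workflow here makes no further progress.
TERMINAL = {WorkflowStatus.ABORTED, WorkflowStatus.SUCCEEDED, WorkflowStatus.FAILED}

MAX_ACTIVE_PER_USER = 1000  # limit quoted in the Queued description


def is_finished(status: WorkflowStatus) -> bool:
    """True if the workflow has reached an end state."""
    return status in TERMINAL


def free_slots(statuses, limit: int = MAX_ACTIVE_PER_USER) -> int:
    """How many queued workflows could become active for one user."""
    active = sum(1 for s in statuses if s in ACTIVE)
    return max(0, limit - active)


# Example: 998 running, 1 aborting, 5 already succeeded.
statuses = (
    [WorkflowStatus.RUNNING] * 998
    + [WorkflowStatus.ABORTING]
    + [WorkflowStatus.SUCCEEDED] * 5
)
remaining = free_slots(statuses)
```

In this example only the running and aborting workflows count against the limit, so the five completed workflows do not reduce the number of free slots.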
Currently, clicking a job name to view its details redirects you to FireCloud's "Monitor" section, where the results can be downloaded; full integration of these functions into the Terra interface is coming soon.