Working with workspaces
Workspaces are the building blocks of Terra - a dedicated space where you and your collaborators can access and organize the same data and tools and run analyses together.
How do you fill your workspace with the right data and analysis tools for your analysis? Run and troubleshoot analyses? Share and collaborate securely and seamlessly? This article explains the features and functions of your workspace and how to build the workspace you need with resources in Terra's Libraries.
Contents
How to pull project pieces together in a workspace
- Document project details
- Store data in the workspace bucket
- Manage and organize data
- Analyze in real time
- Set up and run pipelining workflows
- Monitor and troubleshoot workflows
- Share the workspace with collaborators
Building workspaces using the Terra Libraries
- Data Library
- Showcase and Tutorial workspaces
- Code & Workflows Library
Intro to Workspaces video
How to pull project pieces together in a workspace
You can use a Terra workspace to keep all the components of your project in one place: data, metadata, and analysis tools, as well as documentation and a record of all workflow submissions. Each distinct component has its own section in the workspace (see screenshot below). The sections that follow explain how to access the resources you need.
Documentation in the Dashboard
The landing page is your project overview - the questions you’re trying to answer, the data and analysis tools you'll use, etc. Good documentation makes your analysis easy to share (including with your future self) and reproduce.
Editing the dashboard (in the markdown language)
Click the pencil icon to the right of the "About the Workspace" header at the top to edit. The dashboard uses the markdown language, which lets you organize with headers and include links and additional references.
To learn more about Best Practices for documenting in a dashboard, see this article.
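As a rough illustration of the kind of Markdown a dashboard can hold, the sketch below assembles a project overview with headers, bullet lists, and links. The function name, section titles, and example URL are assumptions made for this example; they are not part of Terra.

```python
from typing import List, Tuple


def dashboard_markdown(
    title: str,
    questions: List[str],
    data_sources: List[Tuple[str, str]],
    tools: List[str],
) -> str:
    """Return Markdown text for a workspace dashboard (hypothetical layout)."""
    lines = [f"# {title}", "", "## Research questions"]
    lines += [f"- {question}" for question in questions]
    lines += ["", "## Data"]
    # Each data source becomes a Markdown link: [name](url)
    lines += [f"- [{name}]({url})" for name, url in data_sources]
    lines += ["", "## Analysis tools"]
    lines += [f"- {tool}" for tool in tools]
    return "\n".join(lines) + "\n"


overview = dashboard_markdown(
    title="Example project",
    questions=["Which variants are associated with the trait?"],
    data_sources=[("Example dataset", "https://example.org/dataset")],  # placeholder URL
    tools=["Variant-calling workflow", "QC notebook"],
)
print(overview)
```

The printed text can be pasted into the dashboard editor and adjusted by hand.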
Workspace information
These details are populated automatically in the right column of the Dashboard:
Store data in the workspace bucket
Each workspace comes with a bucket where data generated by a workflow analysis and notebook (.ipynb) files are stored by default. You can access the dedicated workspace bucket from the dashboard by clicking on the "Open in browser" link at the bottom of the right column.
Uploading your own data
The "Open in browser" link takes you to the Google Cloud Platform (GCP) console, where you can upload smaller files from your local machine by clicking or dragging:
For large files, or large numbers of files, you can also use gsutil in a terminal (access it from the widget at the top right) to copy data from a local machine or other cloud storage. To learn more, see this article.
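A minimal sketch of what such a gsutil copy might look like, written as a small Python helper that only builds the command line so you can inspect it before running it in a terminal. The bucket name, local path, and destination prefix are placeholders, and the helper itself is hypothetical; `-m` (parallel transfers) and `cp -r` (recursive copy) are standard gsutil options.

```python
import shlex
from typing import List


def build_gsutil_copy(
    source: str, bucket_name: str, dest_prefix: str = "", recursive: bool = False
) -> List[str]:
    """Build (but do not run) a gsutil command that copies source into a bucket."""
    destination = f"gs://{bucket_name}/{dest_prefix.strip('/')}"
    if not destination.endswith("/"):
        destination += "/"
    command = ["gsutil", "-m", "cp"]  # -m runs transfers in parallel
    if recursive:
        command.append("-r")  # copy a whole directory tree
    command += [source, destination]
    return command


# Placeholder names: substitute your own directory and workspace bucket.
command = build_gsutil_copy(
    "./fastqs", "my-workspace-bucket", "uploads/run1", recursive=True
)
print(shlex.join(command))
```

To run the command from Python rather than copying it into the terminal, pass the returned list to `subprocess.run(command, check=True)`.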
Manage and organize data in the Data page
Like spreadsheets built right into the workspace, data tables help keep track of all project data - so you don't have to remember where in the cloud all those files are. This becomes especially useful as the number of participants or samples in your study grows.
In Terra, you don't need to store input data in the workspace bucket. By including links to the data's actual location in the cloud, the data table can link data files to workspace tools. Workflows can take input from the table, and you can even associate generated data with the original by writing output metadata to the workspace table.
Combine data from different studies or across datasets into a single table.
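To make the idea of a table that links to data elsewhere in the cloud more concrete, the sketch below writes a small tab-separated file with one row per sample and a column holding each file's cloud location. The sample IDs, study names, bucket paths, and the `study` and `cram_path` column names are invented for illustration. The `entity:sample_id` header follows the convention Terra uses for table load files, but check the current upload documentation before relying on it.

```python
import csv
import io
from typing import Dict


def build_sample_table(samples: Dict[str, Dict[str, str]]) -> str:
    """Return TSV text with one row per sample and a link to its data file."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    # First column identifies the row; the other column names are hypothetical.
    writer.writerow(["entity:sample_id", "study", "cram_path"])
    for sample_id in sorted(samples):
        info = samples[sample_id]
        writer.writerow([sample_id, info["study"], info["cram_path"]])
    return buffer.getvalue()


# Samples from two made-up studies, stored in two different buckets.
samples = {
    "sample_002": {"study": "study_B", "cram_path": "gs://bucket-b/crams/s2.cram"},
    "sample_001": {"study": "study_A", "cram_path": "gs://bucket-a/crams/s1.cram"},
}
table_tsv = build_sample_table(samples)
print(table_tsv)
```

Because each row stores only a path, the files can stay in their original buckets while the table brings samples from several studies together.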
Analyze and visualize data in real time in a Notebook
Need to add a notebook?
You can create or upload a notebook (.ipynb) file from the Notebooks tab. Or copy a notebook from another workspace by clicking on the three-vertical-dots icon to the left of the notebook name:
Set up and run pipelining workflows
Collect, configure (set up) and run workflows for bulk analyses from the Workflows tab. These are the sorts of repetitive analyses that can be automated, such as aligning sequencer reads or calling variants. You can set up and run workflows by clicking on the workflow name. Many options for saving costs - such as using call caching or preemptibles - are available in-app.
Finding the workflow you need
Not a coding expert? Browse and import published workflows in Dockstore or the Broad Methods Repository by selecting the "Find a Workflow" option:
Monitor and Troubleshoot in the Job History page
Here is where to check on the status of workflow submissions. The Job History tab includes a record of every workflow submission.
Troubleshooting
You can troubleshoot failed workflows by selecting the workflow name in the "Submission" column:
Error logs
When troubleshooting, continue to select "View" for more details, including error logs (click on the icons at right):
Share and collaborate with a workspace
It's easy to collaborate and share by sharing the project workspace that contains all the data, tools and analysis work done. Workspace owners control how much access collaborators have to the resources, including funding, by assigning roles with different permission levels.
Building workspaces using the Terra Libraries
Terra has three libraries to help build a project workspace. To access the libraries, click the main menu icon (three horizontal lines) at the top left of any page and open the "Library" submenu.
Data Library
Terra hosts both open- and controlled-access datasets. Select datasets have built-in functionality ("Data Explorers") for browsing the data. Explorers also let you use selection criteria to create custom subsets (cohorts) of data. Note that the number of available datasets, and the amount and type of data available in each, are growing; this includes public-access datasets.
Available Datasets
AnVIL - CCDG, CMG, GTEx, eMERGE
The Genotype-Tissue Expression (GTEx) Program established a data resource and tissue bank to study the relationship between genetic variation and gene expression in multiple human tissues.
STAGE - TOPMed
Trans-Omics for Precision Medicine (TOPMed)
Sponsored by the National Institutes of Health's National Heart, Lung, and Blood Institute (NHLBI), TOPMed is a program to generate scientific resources to enhance our understanding of fundamental biological processes that underlie heart, lung, blood, and sleep disorders (HLBS).
Nurses' Health Study (NHS)
The Nurses' Health Study and Nurses' Health Study II are among the largest investigations into the risk factors for major chronic diseases in women.
Human Cell Atlas (HCA)
The Human Cell Atlas is made up of comprehensive reference maps of all human cells — the fundamental units of life — as a basis for understanding fundamental human biological processes and diagnosing, monitoring, and treating disease.
The Encyclopedia Of DNA Elements (ENCODE)
The ENCODE project aims to delineate all functional elements encoded in the human genome. To this end, ENCODE has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification.
TCGA, TARGET
The Cancer Genome Atlas (TCGA) is a dataset comprised of over two petabytes of genomic data, produced in a collaboration between the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI).
Using the Data Explorer
If you have permission to view a dataset, clicking "browse data" will take you to the data explorer, where you can create your own subset of data (cohort) by choosing the criteria relevant to your research. You can then export the subset to your workspace. It will appear as a "cohort" table in the Data tab.
In the clip above, we create a unique cohort by selecting patients with a few types of cancers, and then limiting the cohort by gender. Clicking "export" and "send" takes you to the import data screen where you can select the workspaces to send your custom cohort:
For step-by-step information on how to use custom cohorts with SQL and BigQuery, see this article.
Showcase and Tutorials (Template Workspaces) Library
For a variety of curated, publicly available example workspaces, try the Showcase and Tutorials section in the Library. These can help you understand how to reproduce instructive results and learn established methodologies.
Example (GATK) workspaces
- Reproducible examples of GATK workflows and tools for general use
- Many contain tools developed at and supported by Broad
Featured workspaces
- Tutorial workspaces
- Specific use-cases based on published work
- Give users a chance to understand their peers' experimental design
Code & Workflows Library
The Code & Workflows Library contains tools and tasks that make up components of workflows. Users familiar with running workflows can use this repository to find workflow components to run individually or to string together using Workflow Description Language (WDL).