Working with workspaces

Anton Kovalsky
  • Updated

Workspaces are the building blocks of Terra. A workspace is a dedicated space where you and your collaborators can access and organize the same data and tools and run analyses together.

This article explains how to set up and collaborate in a project workspace on Terra. 

Intro to Workspaces video

All the study components you need in a workspace

You can use a Terra workspace to keep everything for your study together in one place: data, metadata, and analysis tools, as well as documentation and a record of all workflow submissions. Each component has its own page, which you can open by clicking its tab at the top of any workspace page. The sections below describe how to access the resources you need.

Documentation in the Dashboard

The landing page is your project overview - the questions you’re trying to answer, the data and analysis tools you'll use, etc. Good documentation makes your analysis easy to share (including with your future self) and reproduce.


Editing the dashboard (in Markdown)
Click the pencil icon to the right of the "About the Workspace" header at the top to edit. The dashboard is written in Markdown, which lets you organize content with headers and include links and additional references.

To learn more about Best Practices for documenting in a dashboard, see this article.

Workspace information
These details are populated automatically in the right column of the Dashboard. These include the workspace creation date, date last updated, workflow submissions, what access level you have, an estimate of the monthly cost and the associated Google project ID. 

Workspace Owners

Workspace tags  
These are most useful for searching for particular workspaces in Your Workspaces.

Google bucket   
This includes the unique ID of the Workspace bucket (you can copy it by selecting the clipboard icon). You can also open the bucket in the GCP console by clicking the "Open in browser" link.

Store data in the workspace bucket

Each workspace comes with a bucket where data generated by workflow analyses and notebook (.ipynb) files are stored by default.



Storage classes

 

All Terra buckets are Standard storage class buckets. We may support Nearline, Coldline, and Archive storage classes in the future, but they are not available at this time.

To access the dedicated workspace bucket

1. In the Dashboard: Select the "Open in browser" link at the bottom of the right column.

2. From the Data page: Click on the "Files" icon at the bottom of the left column. 

To upload your own data (small numbers of small files)

Option 1 (clicking the Dashboard link) will take you to the Google Cloud Platform (GCP) console, where you can upload smaller files from your local machine by clicking or dragging.

Option 2 (clicking the Files icon from the Data page) will display the Workspace bucket file structure in the UI. You can upload by selecting the "+" icon at the bottom right.

For large numbers of files and/or large data files, you can also use gsutil in a terminal (access it from the widget at the top right) to copy data from a local machine or other cloud storage. To learn more, see this article.
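
As a rough illustration of the gsutil route, the Python sketch below assembles a parallel, recursive copy command. The bucket name, local path, and helper name are hypothetical placeholders; substitute the bucket ID shown in your Dashboard. The script only builds and prints the command, so nothing is uploaded until you run the printed line in a terminal.

```python
import shlex


def build_gsutil_copy(source, bucket, prefix="", recursive=True):
    """Return the argument list for a gsutil copy into a workspace bucket.

    `-m` asks gsutil to copy files in parallel, which helps with many files;
    `-r` copies a directory tree recursively.
    """
    if prefix:
        dest = f"gs://{bucket}/{prefix.strip('/')}"
    else:
        dest = f"gs://{bucket}/"
    cmd = ["gsutil", "-m", "cp"]
    if recursive:
        cmd.append("-r")
    cmd += [source, dest]
    return cmd


# Hypothetical local folder and bucket name, for illustration only.
command = build_gsutil_copy("./fastq", "fc-example-bucket", "uploads/fastq")
print(shlex.join(command))
```

Returning the command as a list rather than a single string keeps paths with spaces intact if you later pass it to `subprocess.run`.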

Manage and organize data in the Data page

Like spreadsheets built right into the workspace, data tables help keep track of all project data no matter where in the cloud the files are stored. This becomes especially useful as the number of participants or samples in your study grows.

In Terra, you can analyze data stored in the cloud without copying files to the workspace bucket by including links to the data's actual location in the cloud in the data table. Workflows can read inputs from the table, and you can even associate generated data with the original by writing output metadata to the workspace table.
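
To make the idea of a table of links concrete, here is a minimal Python sketch that writes a small tab-separated table whose cells point to files elsewhere in the cloud. It assumes a load-file convention in which the first column header has the form `entity:<type>_id`; this article does not describe that convention, so check the data table documentation before relying on it. The sample IDs, column names, and gs:// paths are hypothetical.

```python
import csv
import io


def make_load_file(entity_type, rows, columns):
    """Build tab-separated text with one row per entity.

    rows maps an entity ID to a dict of column name -> value. Values can be
    plain metadata or links (for example gs:// paths) to where the data lives.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    # Assumed header convention: entity:<type>_id, then attribute columns.
    writer.writerow([f"entity:{entity_type}_id"] + list(columns))
    for entity_id, values in rows.items():
        writer.writerow([entity_id] + [values.get(col, "") for col in columns])
    return buf.getvalue()


# Hypothetical samples whose files stay in their original bucket.
samples = {
    "sample_001": {"cram_path": "gs://example-data-bucket/sample_001.cram", "participant": "P1"},
    "sample_002": {"cram_path": "gs://example-data-bucket/sample_002.cram", "participant": "P2"},
}
tsv = make_load_file("sample", samples, ["cram_path", "participant"])
print(tsv)
```

Because the table stores only paths, the underlying files are never copied; a workflow given the `cram_path` column would read each file from its original location.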

Learn how to combine data from different studies or across datasets in a single table in this video.

Analyze and visualize data in real time in a Jupyter Notebook

Need to add a notebook?
You can create or upload a notebook (.ipynb) file from the Notebooks tab by selecting the cards at the left of the page. You can also copy a notebook to another workspace by clicking the three-vertical-dot icon to the left of the notebook name.

Set up and run workflows (pipelining)

You'll collect, configure (set up) and run workflows for bulk analyses from the Workflows page. Workflows are the sorts of repetitive analyses that can be automated, such as aligning sequencer reads or calling variants. You can set up and run a workflow by clicking on the workflow name in the card. Many options for saving costs - such as using call caching, checkpointing, or preemptibles - are available in-app. 


Finding the workflow you need
Not a coding expert? Browse and import published workflows in Dockstore or the Broad Methods Repository by selecting the "Find a Workflow" card from the Workflows page.


Monitor and Troubleshoot in the Job History page

Here is where to check on the status of all current and past workflow submissions. 

Troubleshooting
You can troubleshoot failed workflows by selecting the workflow name in the "Submission" column at the left. Read more about troubleshooting here.


Error logs
When troubleshooting, continue to select "View" for more details, including error logs (click on the icons at right):

S52l_Workspaces_Job_Manager_Screen_Shot.png

Collaborate in a shared workspace

To collaborate, you can "share" the project workspace with all the data, tools and generated data. Workspace owners control how much access collaborators have to resources, including funding, by assigning roles with different permission levels.

Learn more about using permissions and groups to manage shared resources in this article. 


Building workspaces using the Terra Library

Terra has three libraries that can help when you are building a project workspace. To access the libraries, click the main menu icon (three horizontal lines) at the top left of any page and open the "Library" submenu.  


Data Library

Terra hosts both open- and controlled-access datasets. Some datasets have built-in functionality ("Data Explorers") for browsing the data. Explorers also let you use selection criteria to create custom subsets (cohorts) of data. Note that the number of available datasets, including public-access datasets, and the amount and type of data available in each are growing.

Registering for Terra does not automatically mean you can access all the hosted data. To access restricted data, you must be added to an access policy for that resource. When you try to view data you don't have access to, you'll be prompted to request access.


Terra Library datasets

Note that this list may not include all datasets currently available on Terra. 

AnVIL - CCDG, CMG, GTEx, eMERGE
The Genotype-Tissue Expression (GTEx) Program established a data resource and tissue bank to study the relationship between genetic variation and gene expression in multiple human tissues.

STAGE - Trans-Omics for Precision Medicine (TOPMed)
Sponsored by the National Institutes of Health's National Heart, Lung, and Blood Institute (NHLBI), TOPMed is a program to generate scientific resources to enhance our understanding of fundamental biological processes that underlie heart, lung, blood, and sleep disorders (HLBS).

Nurses' Health Study (NHS)
The Nurses' Health Study and Nurses' Health Study II are among the largest investigations into the risk factors for major chronic diseases in women.

Human Cell Atlas (HCA)
The Human Cell Atlas is made up of comprehensive reference maps of all human cells — the fundamental units of life — as a basis for understanding fundamental human biological processes and diagnosing, monitoring, and treating disease.

The Encyclopedia Of DNA Elements (ENCODE)
 The ENCODE project aims to delineate all functional elements encoded in the human genome. To this end, ENCODE has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification.

TCGA, TARGET
The Cancer Genome Atlas (TCGA) is a dataset comprised of over two petabytes of genomic data, produced in a collaboration between the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI).

Using the Data Explorer

If you have permission to view a dataset, clicking "browse data" will take you to the Data Explorer, where you can create your own subset of data (cohort) by choosing the criteria relevant to your research. You can then export the subset to your workspace. It will appear as a "cohort" table in the Data tab.

G15b_May13_2019.gif


In the clip above, we create a unique cohort by selecting patients with a few types of cancers, and then limiting the cohort by gender. Clicking "export" and "send" takes you to the import data screen, where you can select the workspace to send your custom cohort to:

S9_Jan29_2019.png

For step-by-step information on how to use custom cohorts with SQL and BigQuery, see this article.
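
Conceptually, building a cohort amounts to applying selection criteria one after another. The Python sketch below mirrors the example described in this section (choose a few cancer types, then narrow by gender) on a handful of made-up records. The `Participant` fields, IDs, and values are hypothetical and do not reflect the schema of any hosted dataset or how the Data Explorer is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    participant_id: str
    cancer_type: str
    gender: str


def build_cohort(participants, cancer_types, gender):
    """Keep participants whose cancer type is selected, then narrow by gender."""
    selected = [p for p in participants if p.cancer_type in cancer_types]
    return [p for p in selected if p.gender == gender]


# Made-up records, for illustration only.
records = [
    Participant("P1", "breast", "female"),
    Participant("P2", "lung", "male"),
    Participant("P3", "lung", "female"),
    Participant("P4", "colon", "female"),
]

cohort = build_cohort(records, cancer_types={"breast", "lung"}, gender="female")
print([p.participant_id for p in cohort])
```

Each added criterion can only shrink the cohort, which is why the participant count drops as you select more filters.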

Showcase and Tutorials (Template Workspaces) Library

One of the best ways to get started in Terra is to explore the curated Showcase Workspaces in the Library (access from the dropdown in the main navigation menu at the top left). These workspaces span a variety of use cases and are standardized for completeness and ease of use.

  • Tutorial workspaces
  • Specific analysis tools (e.g. WDLs, Jupyter Notebooks, Hail, Bioconductor)
  • Experimental strategies (e.g. GWAS, exome analysis, RNA-seq)
  • Scientific domains (e.g. cancer, infectious diseases, single-cell, immunology)
  • Examples of peers' experimental designs

They're great as templates or to help reproduce instructive results and learn established methodologies - and you don't have to be logged into Terra to see them (though you will have to log in to make your own copy to work in!).

You should find enough detail in the workspace description to enable you to analyze the included sample data. Cost and time estimates give you the confidence to run on your own data, if you want.


Note that you can get to the Featured Workspaces page using the navigation menu at the top left of any screen in Terra. 

Code and Workflows Library

The Code & Workflows Library contains GATK Best Practices workflows, and links to Dockstore and the Broad Methods Repository (look in the right column). Users familiar with running workflows can use these repositories to find workflow components to run individually or to string together using Workflow Description Language (WDL).

