Overview: Terra on Azure

Allie Cliffe

This article summarizes the cloud components you'll use in Terra and how working in the cloud differs from working locally.

This is a living document. Check back here to see the current state of Terra on Azure.

Terra on Azure is a public preview release intended to allow users early access to tools and resources on Terra. Your candid feedback will help us improve the Terra experience as we develop and roll out additional functionality.


Terra is a cloud-native platform for storing and analyzing biomedical data whose mission is “to help accelerate research by integrating data, analysis tools, and built-in security components to deliver frictionless research flows from data to results.” This release of Terra uses Microsoft Azure’s cloud infrastructure for data analyses and storage. 

Preview disclaimers: Since this is a preview environment, features may change without notice. Also, note that we cannot guarantee that you will not lose data.

The vision for Terra on Azure

Project data and tools - together in a Terra workspace

Whether you want to run pipelines or statistical analyses or visualize your data, you can access and manage the tools and data you need in a Terra workspace dedicated to your project.

Workspaces function like a (very powerful) desktop computer, except the working parts are all in the cloud, and you operate it from your browser.

Browser-based and cloud-native

  • Streamline your work by consolidating resources in a central, shareable place
  • Access data stored in different cloud locations in a single analysis 
  • Collaborate seamlessly with built-in security and access controls

Terra on Azure: Functional upgrades

  • Improved performance and scalability
  • Easier to integrate new analysis capabilities (such as upcoming support for additional workflow languages)
  • User-owned infrastructure gives you maximum control over your Terra Environment, including where all of your data in Terra is stored

Toward a unified Terra experience

Our vision is to iterate and improve these upgrades based on user feedback, starting with Terra on Azure Preview.

Once we validate that these changes meet current user needs and open opportunities for new user communities, we hope to implement many of these changes (as feasible) in Terra on Google. 

Current costs of using Terra on Azure

Working in the cloud in Terra has infrastructure and resource costs, outlined below. Terra passes Azure cloud resource charges to the user's subscription with no additional markup.

Infrastructure cloud costs (per Terra Environment)

For maximum control over where your data is stored, customizable security, increased scalability (ability to store large amounts of data with no effect on performance), and the flexibility to integrate additional analysis apps, we’ve transitioned some infrastructure from Terra-owned to user-owned. 

How infrastructure costs scale

Infrastructure resources are deployed in tiers, and each successive tier costs more. Each tier can accommodate either a greater number of "smaller" workspaces (with fewer services running) or fewer "larger" workspaces (with many services running) before resources automatically scale up.

A limited set of resources is first spun up with a daily “base cost”, currently $10-15/day*. The starting base infrastructure can accommodate at least one workspace. When you create more workspaces, you may trigger a scale event, which currently adds $5/day.

* For Billing Projects deployed in the south-central region.

For more details, see Overview: Costs and billing (Azure).
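As a rough illustration, the figures quoted above can be turned into a back-of-the-envelope estimate. This sketch is hypothetical: it hard-codes the current $10-15/day base cost and the $5/day per scale event stated here (both may change, and the base cost is quoted for Billing Projects in the south-central region), and it takes the number of scale events as an input, because how many workspaces or services trigger a scale event is not specified. It is not a Terra or Azure billing tool.

```python
# Hypothetical estimator built only from the figures quoted in this article.
BASE_COST_LOW = 10.0     # USD/day, lower end of the quoted base cost
BASE_COST_HIGH = 15.0    # USD/day, upper end of the quoted base cost
SCALE_EVENT_COST = 5.0   # USD/day added per scale event


def daily_infrastructure_cost(scale_events: int) -> tuple[float, float]:
    """Return the (low, high) estimated daily infrastructure cost in USD.

    The number of scale events is an input, not derived from workspace count,
    because the scaling thresholds are not documented here.
    """
    if scale_events < 0:
        raise ValueError("scale_events must be non-negative")
    extra = scale_events * SCALE_EVENT_COST
    return BASE_COST_LOW + extra, BASE_COST_HIGH + extra


def cost_over_days(days: int, scale_events: int) -> tuple[float, float]:
    """Return the (low, high) estimated cost over a number of days."""
    low, high = daily_infrastructure_cost(scale_events)
    return low * days, high * days


if __name__ == "__main__":
    low, high = cost_over_days(days=30, scale_events=1)
    print(f"30 days, 1 scale event: ${low:.2f}-${high:.2f}")
```

Under these assumptions, one scale event held for 30 days comes to roughly $450-600, which shows why these costs accrue even when no analyses are running.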

Infrastructure cost increases with:

  • Number of workspaces
  • Number of services running in a workspace (data tables, workflow management, workflow engines)

One of our top priorities is to drive these costs down while increasing performance and usability.

These infrastructure cloud costs accrue as long as you have a Terra Billing project or workspace. We have not yet released support for deleting a Terra Billing project once it is created. If you want to pause resources on your billing project to reduce costs after you start working in Terra on Azure Preview, please reach out to support@terra.bio for assistance.

Note that this cost model differs from that of Terra on Google (see Overview: Terra costs and Billing - GCP). We expect the platform functionality and cost models to align as we develop multi-cloud Terra.

Variable (working) cloud costs (per Terra workspace)

Adding data to storage and running analyses will incur additional fees to cover the cloud resources used in the workspace. These costs are calculated according to Azure’s pricing (see pricing in Overview: Costs and billing in Terra on Azure). Terra passes these costs along to users without any markup.

All workspace writers and owners can charge to the workspace billing project! All operations performed in the workspace, regardless of who does the work, are funded by the Azure subscription via the Terra Billing project. If you give a colleague writer permission on your workspace, they can run an analysis or add data to workspace cloud storage.

For this reason, be careful when sharing a workspace: you are responsible for charges incurred by any colleague with permission to spend, regardless of whether they are members of the Terra Billing project.
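To make the spending rule concrete, the sketch below models a workspace access list and flags which collaborators can incur charges. The `Role` enum, the access-list dictionary, and the helper functions are hypothetical illustrations of the rule stated above (writers and owners can charge to the billing project), not part of any Terra API.

```python
from enum import Enum


class Role(Enum):
    """Hypothetical model of the three Terra on Azure workspace roles."""
    READER = "reader"
    WRITER = "writer"
    OWNER = "owner"


# Roles that can run analyses or add data to workspace storage, and therefore
# incur charges on the workspace's Terra Billing project. Readers are treated
# as non-spending in this sketch.
SPENDING_ROLES = {Role.WRITER, Role.OWNER}


def can_incur_charges(role: Role) -> bool:
    """Return True if a collaborator with this role can spend."""
    return role in SPENDING_ROLES


def collaborators_who_can_spend(acl: dict[str, Role]) -> list[str]:
    """Return the sorted emails of collaborators whose role can incur charges."""
    return sorted(email for email, role in acl.items() if can_incur_charges(role))


if __name__ == "__main__":
    acl = {
        "owner@example.org": Role.OWNER,
        "analyst@example.org": Role.WRITER,
        "viewer@example.org": Role.READER,
    }
    print(collaborators_who_can_spend(acl))
```

Reviewing a list like this before granting writer access is one way to keep track of who can spend against your billing project.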

Workspace data tables

Data tables help store and organize data in an integrated, spreadsheet-like format. Primary data, including clinical, demographic, or phenotypic data, can all be stored in data tables. Data tables can also store links to genomic data files in cloud storage (workspace or external).

Data tables are hosted in a private relational database set up when you create a workspace. The private database makes data tables more scalable and gives you complete control over where (what geographic location) your data lives in Azure.
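The idea of a spreadsheet-like table backed by a relational database can be illustrated with Python's built-in `sqlite3` module: each row holds phenotypic values plus a link to a genomic file in cloud storage. The table name, columns, and blob URLs below are hypothetical placeholders; this does not reproduce Terra's actual database schema or the Workspace Data Service API.

```python
import sqlite3

# Hypothetical rows: phenotypic values plus a link to a genomic file.
# The storage account and container names in these URLs are placeholders.
SAMPLES = [
    ("S1", 54, "case", "https://example.blob.core.windows.net/workspace/S1.cram"),
    ("S2", 61, "control", "https://example.blob.core.windows.net/workspace/S2.cram"),
    ("S3", 47, "case", "https://example.blob.core.windows.net/workspace/S3.cram"),
]


def build_sample_table() -> sqlite3.Connection:
    """Create an in-memory 'sample' table and load the hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE sample (
               sample_id TEXT PRIMARY KEY,  -- one row per sample
               age       INTEGER,
               phenotype TEXT,
               cram_url  TEXT               -- link to a file in cloud storage
           )"""
    )
    conn.executemany("INSERT INTO sample VALUES (?, ?, ?, ?)", SAMPLES)
    conn.commit()
    return conn


def files_for_phenotype(conn: sqlite3.Connection, phenotype: str) -> list[tuple[str, str]]:
    """Return (sample_id, cram_url) pairs for samples with the given phenotype."""
    cur = conn.execute(
        "SELECT sample_id, cram_url FROM sample WHERE phenotype = ? ORDER BY sample_id",
        (phenotype,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = build_sample_table()
    for sample_id, url in files_for_phenotype(conn, "case"):
        print(sample_id, url)
```

Keeping only links in the table, rather than the files themselves, is what lets a data table point at genomic data stored in the workspace or in external storage.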

Who can see data tables?

All workspace collaborators can interact with data tables in shared Terra on Azure workspaces (for workspaces created after May 17, 2023). This means you can share tables with your collaborators and actively modify tables together. 

Data tables are copied to workspace clones

As of July 31, 2023, workspace clones include the data tables from the original workspace. Once your Workspace Data Service is running, you will see the tables on the Data page. 

You will need a new Terra Billing project (created after Sept. 12, 2023). To take advantage of this feature, set up a new Terra Billing project following the step-by-step instructions in step 2.3 of Setting up team billing in Terra on Azure (admins).

JupyterLab (interactive analysis)

Terra on Azure includes access to JupyterLab supported by Azure Data Science Virtual Machines (DSVM). This offering includes flexible VM and disk size configuration options, the option to use GPUs, detachable Persistent Disk storage, and a convenient file syncing service that automatically syncs your notebook (.ipynb) files with your workspace blob storage.

(Screenshot: Azure Cloud Environment setup pane.)
Select from four pre-configured cloud compute profiles and specify the Persistent Disk size in the Azure Cloud Environment setup pane. Cost estimates for the configuration will be displayed in the blue bar at the top.

Avoiding runaway costs with autopause: Because you pay for a JupyterLab instance as long as it is running, Terra includes an autopause feature to prevent runaway costs. Autopause is set to 30 minutes by default.
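The arithmetic behind autopause is simple: in this model, an idle instance stops accruing compute charges once it has been idle for the autopause threshold. The sketch below assumes the 30-minute default and a made-up hourly rate, and it ignores any storage or Persistent Disk charges that may continue while the instance is paused. It illustrates the cost effect only and is not how Terra implements autopause.

```python
from datetime import datetime, timedelta
from typing import Optional

AUTOPAUSE_AFTER = timedelta(minutes=30)  # default autopause threshold stated above


def should_pause(now: datetime, last_activity: datetime,
                 threshold: timedelta = AUTOPAUSE_AFTER) -> bool:
    """True once the instance has been idle for at least `threshold`."""
    return now - last_activity >= threshold


def billed_idle_hours(idle: timedelta,
                      threshold: Optional[timedelta] = AUTOPAUSE_AFTER) -> float:
    """Idle compute hours billed before the instance stops running.

    In this model, billing for an idle instance stops at `threshold`;
    pass threshold=None to model an instance with no autopause.
    """
    billed = idle if threshold is None else min(idle, threshold)
    return billed.total_seconds() / 3600


if __name__ == "__main__":
    HOURLY_RATE = 0.50  # hypothetical USD/hour, not an actual Azure price
    overnight = timedelta(hours=12)
    print(f"No autopause:   ${billed_idle_hours(overnight, None) * HOURLY_RATE:.2f}")
    print(f"30-min default: ${billed_idle_hours(overnight) * HOURLY_RATE:.2f}")
```

With the hypothetical $0.50/hour rate, a 12-hour overnight idle period costs $6.00 without autopause but only $0.25 with the 30-minute default.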

WDL workflows (batch analysis)

Workspace writers and owners can run workflows in a shared Terra workspace by going to the Workflows page and clicking the Launch workflows app button. Clones of workspaces with workflows also include the workflows, configured to match the originals.

For a tutorial and step-by-step instructions, see the COVID-19-Surveillance tutorial workspace and the accompanying step-by-step guide.

Bring your own workflow

Once you start the workflows engine (see How to set up and run a workflow), you can select from a number of curated workflows directly in Terra or import workflows with a GitHub link or directly from Dockstore.

See How to find and import a workflow for step-by-step instructions.

Workspace collaboration & sharing

Terra on Azure workspaces support multiple users with owner, reader, and writer roles. Owners control how much permission each collaborator has when sharing the workspace.

Shared by collaborators

  • Documentation
  • Data tables
  • Data stored in workspace storage
  • Notebooks (.ipynb files)
  • Workflows, workflow configurations, and submission details can all be viewed by collaborators
  • Workspace writers and owners can run workflows. 

What are single-user features?

Data generated on the Persistent Disk (JupyterLab): Each user has their own Cloud Environment VM, and any data generated in a JupyterLab analysis is stored on that user's isolated Persistent Disk (PD).

What is copied with clones?

  • Dashboard content
  • Data tables
  • Notebooks
  • Workflows (for origin workspaces created after December 1, 2023)

What is not copied/cloned?

  • Data files in the original workspace cloud storage
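The cloning rules above can be summarized as a small model. The `Workspace` dataclass and `clone` function are hypothetical; they encode only the two lists in this section (including the December 1, 2023 cutoff for workflows) and do not model other conditions, such as the billing-project requirement described under data tables.

```python
from dataclasses import dataclass, field
from datetime import date

# Workflows are cloned only for origin workspaces created after this date.
WORKFLOW_CLONE_CUTOFF = date(2023, 12, 1)


@dataclass
class Workspace:
    """Hypothetical model of the workspace contents that matter for cloning."""
    created: date
    dashboard: str = ""
    data_tables: dict[str, list[dict]] = field(default_factory=dict)
    notebooks: list[str] = field(default_factory=list)
    workflows: list[str] = field(default_factory=list)
    storage_files: list[str] = field(default_factory=list)


def clone(origin: Workspace, created: date) -> Workspace:
    """Copy what a clone includes; leave out data files in cloud storage."""
    copy_workflows = origin.created > WORKFLOW_CLONE_CUTOFF
    return Workspace(
        created=created,
        dashboard=origin.dashboard,
        data_tables={name: [dict(row) for row in rows]
                     for name, rows in origin.data_tables.items()},
        notebooks=list(origin.notebooks),
        workflows=list(origin.workflows) if copy_workflows else [],
        storage_files=[],  # data files in workspace storage are not copied
    )


if __name__ == "__main__":
    origin = Workspace(
        created=date(2024, 1, 15),
        dashboard="Project overview",
        data_tables={"sample": [{"sample_id": "S1"}]},
        notebooks=["qc.ipynb"],
        workflows=["align"],
        storage_files=["S1.cram"],
    )
    cloned = clone(origin, created=date(2024, 2, 1))
    print(cloned.notebooks, cloned.workflows, cloned.storage_files)
```

Because data files are not copied, any data table rows in a clone that link to files in the original workspace's storage still point back to that original storage.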

Getting started in Terra on Azure

Ready to get started using Terra on Azure? Follow the three steps below. 

1. Set up an account on Terra

Register for a Terra account (see How to set up an account in Terra on Azure).

2. Set up billing (admins)

Finance admins or users with access to an existing Azure subscription must set up cloud billing and link it to a Terra Billing Project following the step-by-step instructions in Setting up billing in Terra on Azure (admins).

3. Explore a tutorial workspace

Featured workspaces let you try out the platform with pre-configured sample data, analysis tools, and documentation to guide you. You can find all Azure Featured Workspaces by going to the Terra Featured Workspace Library and filtering by "Azure".

Note that you will need to make your own clone, as Featured Workspaces are "read-only"!

 
