This article summarizes the cloud components you'll use in Terra and how working in the cloud differs from working locally.
This is a living document. Check back here to see the current state of Terra on Azure.
Overview: Terra on Azure
Terra is a cloud-native platform for storing, accessing, and analyzing biomedical data. Terra was designed to help accelerate research with frictionless research flows from data to results: integrating data, analysis tools, and built-in security components. Terra on Azure uses Microsoft Azure’s cloud infrastructure for data analyses and storage, extending Terra's capabilities from its original Google Cloud Platform (GCP) implementation.
Unified project data and tools; streamlined access; built-in security
Your isolated Terra Environment is a managed application deployed entirely within your private Azure subscription. Terra functions like a powerful desktop computer that uses cloud resources for data storage and analysis, accessible via your browser. Terra manages the details and nuance of setting up all Azure cloud resources. Users can store, access, and analyze data right in Terra workspaces with integrated functionality. Security features built into the platform ensure you can access controlled data efficiently and share securely with approved collaborators.
Setting up your Terra Environment
You configure the region for the cloud resources (VMs and storage containers) when you set up billing. Enterprise users can add security features, such as additional logging for controlled data or PHI (see Enterprise Terra plans).
Key features
- Browser-Based and Cloud-Native: Access and manage data stored in various cloud locations from a single, shareable workspace.
- Collaborative: Built-in security and access controls facilitate seamless teamwork.
- Enhanced performance: Since your Terra Environment is deployed in your private Azure subscription, your cloud infrastructure is yours and yours alone. Your Terra Environment can easily scale to run hundreds of thousands of workflows at once.
- Data isolation and control of data and tools: User-owned infrastructure ensures maximum control over all aspects of your Terra Environment, including data storage in your Azure subscription.
Current costs of using Terra on Azure
Infrastructure costs
Your private Terra Environment is an isolated set of resources within your Azure subscription shared with individuals within your organization. You pay a fixed cost for the cloud resources that power your Terra Environment infrastructure. See How to limit your base cost.
A limited set of resources is first spun up with a daily “base cost”, currently $10-15/day*. The starting base infrastructure can accommodate at least 1 workspace. When you create more workspaces, you may trigger a scale event, which currently adds an additional $5/day.
For more details, see Overview: Costs and billing (Azure).
* for Billing Projects deployed in the south-central region.
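The arithmetic behind these figures can be sketched in a few lines of Python. This is a rough estimate, not an official Terra or Azure calculator: it assumes each scale event adds a flat $5/day for the whole period and that the base cost stays within the $10-15/day range quoted above. The function name and its parameters are hypothetical.

```python
def estimate_infrastructure_cost(days, scale_events=0,
                                 base_per_day=(10.0, 15.0),
                                 scale_per_day=5.0):
    """Return a (low, high) estimate of infrastructure cost in USD.

    Assumption: every scale event adds a flat daily charge for the
    entire period. Actual Azure charges may differ.
    """
    if days < 0 or scale_events < 0:
        raise ValueError("days and scale_events must be non-negative")
    low, high = base_per_day
    extra = scale_events * scale_per_day  # added daily cost from scaling
    return (days * (low + extra), days * (high + extra))


# Example: 30 days with two scale events.
low, high = estimate_infrastructure_cost(days=30, scale_events=2)
print(f"Estimated 30-day infrastructure cost: ${low:.0f}-${high:.0f}")
# prints "Estimated 30-day infrastructure cost: $600-$750"
```

Usage costs for storage, compute, and egress are billed separately and are not included in this estimate.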
- Terra Environment setup: Creating a Terra Billing project launches a private Terra Environment in your Azure subscription, shared within your organization. Infrastructure costs start to accrue at this point.
- Cost scaling: Infrastructure resources are deployed in tiers, with each tier having a greater cost. Each tier can accommodate a greater number of "smaller" workspaces (with fewer services running) or fewer "larger" workspaces (with many services running) before resources automatically scale up. Infrastructure costs start at a base of $10-15/day, with additional charges as resources scale to accommodate more workspaces.
- Controlling infrastructure costs: Infrastructure costs increase with the number of workspaces and the number of services running in a workspace (data tables, workflow management, workflow engines).
These infrastructure cloud costs accrue as long as you have a Terra Billing project/workspace. Deleting the Terra Billing project is the only way to eliminate the base infrastructure cost, but this will also delete everything you have done in Terra.
To delete a Terra Billing Project
You must delete all workspaces listed under a Billing Project before you can delete the Billing Project and its associated Terra Environment.
From the Billing page, click the trash can icon to the right of the Billing project. Note that you must be an owner to delete the Billing Project, so it will be listed under the Owned by You section at the left.
Note that this cost model differs from that of Terra on Google (see Overview: Terra costs and Billing - GCP). We expect the platform functionality and cost models to align as we develop multi-cloud Terra.
Variable (usage) cloud costs (per Terra workspace)
You pay for cloud resources consumed in a workspace.
- Data storage: Includes unstructured data files in associated workspace blob storage and tabular data in workspace data tables (backed by a dedicated relational database).
- Data analysis: Terra uses virtual machine (VM) compute and disk for batch workflows and interactive analysis (JupyterLab).
- Networking/egress: Costs for copying data between cloud providers or between regions within a single cloud provider.
These costs are calculated following Azure’s pricing (see pricing in Overview: Costs and billing in Terra on Azure). Terra passes these costs along to users without any markup.
All workspace WRITERS and OWNERS can charge to the workspace billing project! If you give a colleague writer permission on your workspace, they can run an analysis or add data to workspace cloud storage. All operations performed in the workspace, regardless of who does the work, will be funded by the Azure subscription via the Terra Billing project.
For this reason, you will want to be careful when sharing a workspace, as you will be responsible for charges incurred by any colleague with permission to spend, regardless of whether they are included in the billing.
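Before sharing, it can help to list who would be able to spend against the Billing Project. The sketch below models only the rule stated above (writers and owners can incur charges, readers cannot). It is an illustration, not Terra's API: the `Role` enum, the function names, and the example email addresses are all hypothetical.

```python
from enum import Enum
from typing import List, Mapping


class Role(Enum):
    READER = "reader"
    WRITER = "writer"
    OWNER = "owner"


# Roles that can run analyses or add data, and therefore incur charges.
SPENDING_ROLES = {Role.WRITER, Role.OWNER}


def can_incur_charges(role: Role) -> bool:
    """True if a collaborator with this role can charge to the workspace."""
    return role in SPENDING_ROLES


def spenders(collaborators: Mapping[str, Role]) -> List[str]:
    """Return the collaborators who can incur charges, sorted by name."""
    return sorted(who for who, role in collaborators.items()
                  if can_incur_charges(role))


# Hypothetical sharing list for one workspace.
shared_with = {
    "pi@example.org": Role.OWNER,
    "postdoc@example.org": Role.WRITER,
    "reviewer@example.org": Role.READER,
}
print(spenders(shared_with))
# prints ['pi@example.org', 'postdoc@example.org']
```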
Data and analysis functionality
Terra is built to store, access, and analyze biomedical data, including genomic and phenotypic data. You work in a Terra workspace, which integrates several distinct Azure cloud resources for storing, accessing and analyzing data in the Azure cloud in a single computational sandbox.
Storing and organizing data in a Terra workspace
Where is the data you analyze in Terra actually stored “in the cloud”? All data in Terra exists in the cloud, either in external cloud storage or in a Terra workspace. When working in Terra, where the data is stored depends on what kind of data it is and how you intend to analyze it.
Unstructured files (workspace cloud storage)
Unstructured data can be stored in the dedicated workspace cloud storage (blob container) or the Cloud Environment Persistent Disk (JupyterLab).
Tabular data (your private relational database)
Data tables help store and organize data in an integrated, relational database.
- Primary data: Including clinical data, demographics, or phenotypic data.
- Metadata (organize files in blob storage): Data tables can also keep links to genomic data files in cloud storage (workspace or external).
Data tables are hosted in a private relational database set up when you create a workspace. The private database makes data tables more scalable and gives you complete control over where (what geographic location) your data lives in Azure.
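To make the two kinds of table content concrete, the sketch below builds a small table that holds primary data (phenotype, age) alongside a link to a genomic file in blob storage, then serializes it as tab-separated text. It is only an illustration: the column names, storage account, container, and file names are hypothetical, and this is not necessarily the exact import format Terra expects.

```python
import csv
import io

# Hypothetical rows: primary data plus a link to a file in blob storage.
rows = [
    {"sample_id": "sample-001", "phenotype": "case", "age": 54,
     "genome_file": "https://exampleaccount.blob.core.windows.net/"
                    "example-container/sample-001.cram"},
    {"sample_id": "sample-002", "phenotype": "control", "age": 61,
     "genome_file": "https://exampleaccount.blob.core.windows.net/"
                    "example-container/sample-002.cram"},
]


def to_tsv(table):
    """Serialize a list of row dicts as tab-separated text with a header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(table[0]),
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(table)
    return buffer.getvalue()


def linked_files(table, column):
    """Return the non-empty file links stored in one column."""
    return [row[column] for row in table if row.get(column)]


print(to_tsv(rows))
print(linked_files(rows, "genome_file"))
```

Keeping links rather than file contents in the table means the files themselves stay in cloud storage, and the table only organizes them.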
Data tables are copied to workspace clones
As of July 31, 2023, workspace clones include the data tables from the original workspace. Once your Workspace Data Service is running, you will see the tables on the Data page.
Who can see data tables?
All workspace collaborators can interact with data tables in shared Terra on Azure workspaces. This means you can share tables with your collaborators and actively modify tables together in the workspace.
JupyterLab (interactive analysis)
Terra on Azure includes integrated JupyterLab supported by Azure Data Science Virtual Machines (DSVM). File syncing automatically saves your notebook (.ipynb) files to your workspace blob storage.
Available customizations
- VM type and disk size
- Optional GPUs (see How to use GPUs in a notebook)
- Detachable Persistent Disk storage
See Interactive analysis for more details.
Select from four pre-configured cloud compute profiles and specify the Persistent Disk size in the Azure Cloud Environment setup pane. Cost estimates for the configuration will be displayed in the blue bar at the top.
Avoiding runaway costs with autopause
Because you pay for a JupyterLab instance as long as it is running, Terra includes an autopause feature to prevent runaway costs. Autopause is set to 30 minutes by default.
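A quick calculation shows why autopause matters when an instance is left running. The sketch below compares billed compute hours with and without a 30-minute autopause. It rests on assumptions: the $0.50/hour rate is hypothetical, a paused instance is treated as incurring no compute charge, and disk storage (which may continue to be billed) is ignored.

```python
def billed_hours(active_hours, idle_hours, autopause_minutes=30):
    """Compute hours billed when an instance sits idle after active use.

    Pass autopause_minutes=None to model an instance that never pauses.
    Assumption: a paused instance accrues no compute charge.
    """
    if autopause_minutes is None:
        return active_hours + idle_hours
    # Idle time is billed only until autopause kicks in.
    return active_hours + min(idle_hours, autopause_minutes / 60)


HOURLY_RATE = 0.50  # hypothetical VM price in USD per hour

# Two hours of work, then the notebook is left open for 14 hours.
without_pause = billed_hours(2, 14, autopause_minutes=None) * HOURLY_RATE
with_pause = billed_hours(2, 14) * HOURLY_RATE
print(f"Without autopause: ${without_pause:.2f}")  # prints $8.00
print(f"With autopause:    ${with_pause:.2f}")     # prints $1.25
```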
WDL workflows (batch analysis)
All colleagues can run workflows in a shared Terra workspace by going to the Workflows page and clicking on the Launch workflows app button. Clones of workspaces with workflows will also include the workflows, pre-configured to match the original workflow.
For a tutorial and step-by-step instructions, see the COVID-19-Surveillance tutorial workspace and the accompanying step-by-step guide.
Bring your own workflow
Once you start the workflows engine (see How to set up and run a workflow), you can select from a number of curated workflows directly in Terra or import workflows with a GitHub link or directly from Dockstore.
See How to find and import a workflow for step-by-step instructions.
Collaborating & sharing
Terra on Azure workspaces support multiple users with owner, reader, and writer roles. Owners control how much permission each collaborator has when sharing the workspace.
Shared by collaborators
- Documentation
- Data tables
- Data stored in workspace cloud storage
- Notebooks (.ipynb files)
- Workflows, workflow configurations, and submission details can all be viewed by collaborators
- Workspace writers and owners can run workflows.
Generated data (JupyterLab analysis) in the Persistent Disk is not shared. Each user has their own Cloud Environment, and any data from a JupyterLab analysis will be stored in their isolated Cloud Environment PD. See Cloud Environment: Persistent disk storage for more details.
What is copied with clones?
- Dashboard content
- Data tables
- Notebooks
- Workflows (for origin workspaces created after December 1, 2023)
- Workflow configurations (for workflows run in the origin workspace)
- Submission history
What is not copied/cloned?
- Data files in the origin workspace cloud storage
Three steps to get started in Terra on Azure
Ready to get started using Terra on Azure? Follow the three steps below.
1. Set up an account on Terra
Register for a Terra account with either a Google or Microsoft SSO. See How to set up an account in Terra on Azure.
2. Set up billing (admins)
Finance admins or users with access to an existing Azure subscription must set up cloud billing and link it to a Terra Billing Project, following the step-by-step instructions in Setting up billing in Terra on Azure (admins). Once you set up a Terra Billing Project, Terra will launch your isolated Terra Environment in your Azure tenant.
3. Explore a tutorial workspace
Featured workspaces let you try out the platform with pre-configured sample data, analysis tools, and documentation to guide you. You can find all Azure Featured Workspaces by going to the Terra Featured Workspace Library and filtering by "Azure".
- Bulk and single cell RNA Seq Analysis with Bioconductor workspace (JupyterLab-based)
- COVID-19-Surveillance tutorial workspace (workflows-based) and step-by-step guide.
Note that you will need to make your own clone, as Featured Workspaces are "read-only"!
Changelog
Track changes in Terra on Azure functionality here.
Date | Change |
2023-12-20 | Added multi-user workflow |
2023-10-01 | Removed reference to the "allow list" as Terra on Azure entered general availability |
2023-08-02 | Tables come along with workspace clones |
2023-05-16 | Collaborators can see tables in a shared workspace |
2023-04-14 | Added Persistent Disk functionality (JupyterLab) |
2023-04-11 | Autopause functionality added |
2023-03-22 | Added ability to run a workflow, delete workspace |