Does your analysis seem "stuck," running or progressing unusually slowly? Have you gotten an error message that includes the word "quota" when trying to run a large analysis (workflow or interactive)? It could be because you've exceeded the resource quota for a particular kind of resource. Read on to understand GCP resource quotas and how they can affect your work on Terra.
Overview: Resource quotas and why they matter in Terra
In the Google Cloud Platform (GCP), resource quotas limit how many resources, such as central processing units (CPUs), graphics processing units (GPUs), and persistent disks (PDs), a single Google project can use at any given time.
Why does Google have resource quotas?
Quotas guard against unforeseen spikes in usage, helping ensure that resources remain available to the whole community. To learn more, see Google's resource quotas documentation.
How resource quotas impact your analyses
Resource quotas affect your ability to spin up a large VM to run a workflow or interactive analysis. They can also affect the speed of your analysis: when you run up against a compute or disk quota, tasks pause until quota frees up, so a workflow may run slowly or not at all.
What is affected by resource quotas
If you are close to or over your resource quota, Terra cannot secure the requested CPUs, GPUs, or PD. All workflows (or "methods") and Cloud Environments that run in Terra are subject to the following GCP compute and disk quotas:
- CPUs: how many CPUs you can use at once across all tasks
- GPUs: how many GPUs you can use at once across all tasks
- Preemptible CPUs: the separate pool of CPUs used only by preemptible instances. See Google's documentation to learn more about this quota and about preemptible instances.
- Persistent disk standard (GB): how much total standard (non-SSD) disk you can have attached at once to your task VMs
- Persistent disk SSD (GB): how much total SSD disk you can have attached at once to your task VMs
- Local SSD (GB): how much SSD storage, attached directly to the server running your task VMs, you can use at once. You can learn more in Google's documentation. This quota only applies if you use local SSD in your task.
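Because every quota in the list above caps usage "at once" across all tasks, a new VM request only fits if what is already in use plus what is requested stays within each limit. The sketch below illustrates that check. The quota names and numbers in `QUOTAS` and the `exceeded_quotas` helper are hypothetical and for illustration only; they are not Google's actual default values or a Terra or GCP API, and the preemptible CPU pool is left out for simplicity.

```python
# Hypothetical per-project quota limits. Illustrative values only;
# these are NOT Google's actual defaults.
QUOTAS = {
    "cpus": 24,
    "gpus": 0,
    "pd_standard_gb": 4096,
    "pd_ssd_gb": 500,
    "local_ssd_gb": 375,
}


def exceeded_quotas(request, in_use, quotas=QUOTAS):
    """Return (sorted) the names of quotas that `request` would push past
    their limit, given what running task VMs already use.

    `request` and `in_use` are dicts keyed by the hypothetical quota names
    above; missing keys are treated as zero usage / zero limit.
    """
    return sorted(
        name
        for name, amount in request.items()
        if amount > 0 and in_use.get(name, 0) + amount > quotas.get(name, 0)
    )


# An 8-CPU task fits while 16 CPUs are in use (16 + 8 = 24), but not
# once all 24 CPUs are taken; a GPU request fails with a GPU quota of 0.
print(exceeded_quotas({"cpus": 8, "pd_ssd_gb": 100}, {"cpus": 16}))
print(exceeded_quotas({"cpus": 8}, {"cpus": 24}))
print(exceeded_quotas({"cpus": 4, "gpus": 1}, {}))
```

An empty list means the request fits under every quota; any name returned is a quota the request would exceed.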
What decides your quota?
Google enforces default resource quotas (i.e. throttles how much of a resource can be used by a single Google project) based on the GCP billing reputation of the Cloud Billing account owner.
What does your quota cover?
In Terra, these limits apply per workspace (for workspaces created after September 27, 2021) or per Terra Billing project (for workspaces created before September 27, 2021).
If you (your Google ID) are new, you will have the default quota. As you use (and pay for) GCP resources, your quota will increase. You can also request an increase (see instructions in How to troubleshoot and fix stalled workflows).
Symptoms of bumping up against a resource quota
Quota limits are not always easy to diagnose! Below are some behaviors and error messages you may experience after launching a workflow analysis if your quota does not have enough resources (i.e., VM compute or disk capacity):
- Tasks in your workflow are slow to move from queued to running while they wait for quota to become available.
For example, if you request 1,000 tasks with eight CPUs each and your quota allows 24 CPUs at once, only three tasks can run at a time. Each remaining task stays queued until a running task finishes and frees up CPUs.
- A task in your workflow fails when it requests more resources than your quota allows.
For example, if you requested 60 CPUs in your task and your quota is capped at 24 CPUs at once, your workflow may fail to launch.
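The arithmetic behind both examples can be sketched in a few lines. This is a simplified model under stated assumptions: every task requests the same number of CPUs and takes about the same time, and only the CPU quota matters (disk and GPU quotas, preemption, and scheduling overhead are ignored). The function names are hypothetical, not part of Terra or Cromwell.

```python
import math


def max_concurrent_tasks(cpus_per_task, cpu_quota):
    """How many equally sized tasks fit under the CPU quota at once."""
    return cpu_quota // cpus_per_task


def schedule_summary(num_tasks, cpus_per_task, cpu_quota):
    """Simplified model: tasks run in 'waves' of `concurrent` tasks each.

    Assumes identical task sizes and durations, so wall-clock time is
    roughly waves x (time per task).
    """
    concurrent = max_concurrent_tasks(cpus_per_task, cpu_quota)
    if concurrent == 0:
        # One task alone needs more CPUs than the quota allows,
        # so it can never start.
        return {"concurrent": 0, "waves": None}
    return {"concurrent": concurrent, "waves": math.ceil(num_tasks / concurrent)}


print(schedule_summary(1000, 8, 24))  # 1,000 eight-CPU tasks, 24-CPU quota
print(schedule_summary(1, 60, 24))    # one 60-CPU task, 24-CPU quota
```

Under these assumptions, 1,000 eight-CPU tasks with a 24-CPU quota run three at a time in 334 waves, and a 60-CPU task never starts. Doubling the quota to 48 CPUs would allow six tasks at once and cut the run to 167 waves.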
Next steps: Check your quota and ask for an increase
Confirm whether a resource quota is keeping your analysis from running efficiently, and request more by following the instructions in How to troubleshoot and fix stalled workflows.
When to request more resource quota
If you are seeing errors
If you see quota errors or messages in your logs (for example, when your workflow fails because a task requested more resources than your quota allows), you will need to request a quota increase.
If you need to see results faster
In many cases, if you exceed your resource quota, your analysis will simply run more slowly. Depending on how soon you need results, that may or may not be acceptable. If your workflow is stalled and you need it to progress, you may want to request an increase.
Curious what is happening behind the scenes? See How the workflow system works.