Service Incident - January 19, 2024 (GCP Workflows)

Samantha (she/her)

Summary

We've received multiple reports of workflows failing with a 'Quota exceeded' error when trying to pull a GCR image. Our engineers are currently investigating this issue.

See the Timeline section for the latest troubleshooting and resolution updates, and the Impact section to understand how this issue could affect your use of the system.

Timeline

January 22, 2024 3:30 PM ET - Issue remediated - Since the changes were applied, no further 'Quota exceeded' errors have appeared in the logs, so our engineers have declared this incident remediated.

January 22, 2024 2:20 PM ET - Fix deployed - Our engineers uploaded an alternate Google Cloud SDK image to our own project and updated Cromwell to use that self-hosted image.

January 19, 2024 4:39 PM ET - Issue escalated - Because so many workflows were failing with the same error, our engineers declared this issue an incident. They are working with Google Support to investigate.

January 19, 2024 10:50 AM ET - Known Issue posted - After a few users reported a 'Quota exceeded' error in their failed workflows, we posted the issue and a workaround on our Known Issues board. Our engineers also opened a support case with Google.

Impact

Workflows may fail with the message below:

Execution failed: generic::unknown: pulling image: docker pull: 
running ["docker" "pull" "gcr.io/google.com/cloudsdktool/cloud-sdk:354.0.0-alpine"]: exit status 1 (standard error:
"Error response from daemon: Head \"https://gcr.io/v2/google.com/cloudsdktool/cloud-sdk/manifests/354.0.0-alpine\":
toomanyrequests: Quota exceeded for quota metric
'Requests per project in the US multi-region'
and limit 'Requests per project in the US multi-region per minute'
of service 'artifactregistry.googleapis.com'
for consumer 'project_number:32555940559'.\n")"

The consumer project number in this message (32555940559) does not belong to us, and it is the same for all affected users.

As a workaround, users can retry their failed jobs, either manually or automatically by adding the maxRetries attribute to the task's runtime section (for example, maxRetries: 3).
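
If you are retrying manually, it can help to pick out only the workflows that failed with this specific error. The Python sketch below is a minimal, hypothetical helper that assumes you already have each failed workflow's failure message as a string (how you fetch those messages is not covered here). The function names and workflow IDs are illustrative only, and the check is a simple substring match against the error text shown above, not an official error code.

```python
import re
from typing import Dict, List, Optional

# Substrings copied from the 'Quota exceeded' error shown in the Impact section.
QUOTA_MARKERS = ("toomanyrequests", "Quota exceeded for quota metric")

# Captures the consumer project number quoted near the end of the error.
CONSUMER_RE = re.compile(r"for consumer 'project_number:(\d+)'")


def is_gcr_quota_failure(message: str) -> bool:
    """Return True if a failure message looks like the image-pull quota error."""
    return "docker pull" in message and any(m in message for m in QUOTA_MARKERS)


def consumer_project(message: str) -> Optional[str]:
    """Return the consumer project number named in the error, if present."""
    match = CONSUMER_RE.search(message)
    return match.group(1) if match else None


def workflows_to_retry(failures: Dict[str, str]) -> List[str]:
    """Given {workflow_id: failure_message}, return the IDs hit by the quota error."""
    return sorted(wid for wid, msg in failures.items() if is_gcr_quota_failure(msg))


if __name__ == "__main__":
    # Abbreviated copy of the error message from the Impact section.
    sample = (
        'Execution failed: generic::unknown: pulling image: docker pull: '
        'running ["docker" "pull" "gcr.io/google.com/cloudsdktool/cloud-sdk:354.0.0-alpine"]: '
        "exit status 1 (standard error: toomanyrequests: Quota exceeded for quota metric "
        "'Requests per project in the US multi-region' "
        "for consumer 'project_number:32555940559'."
    )
    failures = {
        "wf-001": sample,  # hypothetical workflow IDs
        "wf-002": "Task failed: out of memory",
    }
    print(workflows_to_retry(failures))  # prints ['wf-001']
    print(consumer_project(sample))      # prints 32555940559
```

Matching on both the image pull and the quota markers keeps unrelated failures, such as out-of-memory errors, off the retry list, since resubmitting those would not help.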

For more information

Please follow this article for the most up-to-date information on this incident. If you would like to be notified of all service incidents or upcoming scheduled maintenance, click Follow on this page.