Summary
We've identified an issue causing some user workflows to fail with the following error:
PipelinesApiRequestHandler actor termination caught by manager
Some users have also reported that their submissions are stuck in Queued status.
See the Timeline section for the latest troubleshooting and resolution updates and the Impact section to understand how this could impact your use of the system.
Timeline
March 1, 2021 7:25 PM ET - Issue resolution - The long queue issue is confirmed to be self-correcting in Cromwell, and submissions should no longer be stuck in the queue by late March 1. The engineers will perform a retrospective on the situation and determine whether any improvements can be made to prevent a recurrence.
March 1, 2021 4:00 PM ET - Issue investigation - Some users are still seeing long queue times for their submissions. If you are experiencing this issue, please do not abort your submission. Our engineers are currently investigating the underlying cause.
March 1, 2021 1:00 PM ET - Issue resolution - The rollback is complete, and we are no longer expecting to see users running into this error message. The queue time issue is currently self-resolving, and jobs should start kicking off in normal time windows shortly.
March 1, 2021 12:37 PM ET - Update - As of 12:37 PM ET, the errors appear to have almost entirely stopped. We are continuing to monitor a handful of residual cases and will update this page once the issue is confirmed to be fully resolved.
March 1, 2021 12:16 PM ET - Issue remediation - Google has identified the underlying issue and is currently working to roll back to a working state. We are awaiting confirmation that the rollback is complete.
March 1, 2021 11:40 AM ET - Issue investigation - Our engineers are continuing to investigate the root cause of the issue and are in communication with Google.
March 1, 2021 11:05 AM ET - Issue discovered - One of our engineers received this error message for their workflow and alerted our Batch team for investigation. The Batch team started investigating right away.
Impact
Some users may see their workflows fail with the error message above. Users may also see longer queue times for workflow submissions.
For more information
Please follow this article to get the most up-to-date information on this incident. If you would like to be notified of all service incidents or upcoming scheduled maintenance, click Follow on this page.