The workflow execution cluster slowed down over the course of a couple of hours and eventually stopped processing workflows altogether for around half an hour on Wednesday night (9/23). The cluster was restarted, and most workflows have continued running as normal.
See the Timeline section for the latest troubleshooting and resolution updates and the Impact section to understand how this could impact your use of the system.
Timeline
September 23, 2020 9:22 PM ET - Issue remediation - Each of the impacted servers was restarted. Most workflows should continue running as normal.
September 23, 2020 8:28 PM ET - Issue discovered - Our engineering team discovered that Cromwell Runners 101-103 had gone down.
Impact
Workflow and job throughput may have been somewhat slower than normal, but this has not been examined yet. Users might see events in their timing diagrams related to the restarts, similar to the events that appear during a routine system upgrade.
If you are experiencing this issue, please reach out to email@example.com with your Terra email address and the billing project for the runtime, and we will be happy to fix your runtime for you.
For more information
Please follow this article to get the most up-to-date information on this incident. If you would like to be notified of all service incidents or upcoming scheduled maintenance, click Follow on this page.