Service Incident - September 23, 2020

Summary

The workflow execution cluster slowed down over the course of a couple of hours and eventually stopped processing workflows altogether for about half an hour on Wednesday night (9/23). The cluster was restarted, and most workflows have continued running as normal.

See the Timeline section for the latest troubleshooting and resolution updates and the Impact section to understand how this could impact your use of the system.

Timeline

September 23, 2020 9:22 PM ET - Issue remediation - Each of the impacted servers was restarted. Most workflows should continue running as normal.

September 23, 2020 8:28 PM ET - Issue discovered - Our engineering team discovered that Cromwell Runners 101-103 were going down.

Impact

Workflow and job throughput may have been somewhat slower than normal, but this has not yet been examined. Users might see events in their timing diagrams related to the restarts, similar to the events that appear during a normal system upgrade.

If you are experiencing this issue, please reach out to support@terra.bio with your Terra email address and the billing project for the runtime, and we will be happy to fix your runtime for you.

For more information

Please follow this article to get the most up-to-date information on this incident. If you would like to be notified of all service incidents or upcoming scheduled maintenance, click Follow on this page.
