Service Incident - May 9, 2019

Sushma Chaluvadi

Summary

Users may experience slow workflows due to outages of one or more internal services. Internal monitoring detected the issue at approximately 8:00 PM EDT on Wednesday, May 8, 2019, and it affects users running workflows on Terra/FireCloud. See the Timeline section for the latest troubleshooting and resolution updates, and the Impact section for how this could affect your use of the system.

Timeline

5:03 PM EDT | May 9, 2019

Final Confirmation: A fix will be released next week to resolve the logger crashes caused by large log files.

4:58 PM EDT | May 9, 2019

Issue Resolved: The team has isolated the issue. Users should refrain from using CNVNator, as it has been seen to generate large log files that cause the logger to crash. All other workflows should continue to work normally.

12:32 PM EDT | May 9, 2019

Fix Deployment: The team is continuing to develop and deploy a fix.

11:34 AM EDT | May 9, 2019

Fix Deployment: The team is currently working to confirm the issue and develop a fix.

11:30 AM EDT | May 9, 2019

Issue Discovered: The team has discovered that a very large stderr log is causing the logger to crash, resulting in slow workflow submissions.

9:00 AM EDT | May 9, 2019

Issue Investigation: The team has determined that large workflows may have destabilized the system, causing intermittent outages of multiple services.

8:00 PM EDT | May 8, 2019

Issue Discovered: Internal monitoring services alerted the team to outages in the execution engine, Cromwell.

Impact

Users may experience slowness when running workflows in Terra/FireCloud.

For more information

Please follow this article to get the most up-to-date information on this incident. If you would like to be notified of all service incidents or upcoming scheduled maintenance, click Follow on this page.
