Summary
On June 5, 2020, a member of our notebooks team identified a bug affecting Dataproc multi-node Spark clusters. When users who have created a multi-node Spark cluster stop their runtime in the Terra UI, some of the cluster workers may remain running instead of changing to Stopped status.
Multi-node Spark clusters are created when the Runtime type is set to "Configure as spark cluster". Most users do not use this runtime type.
Our notebooks team manually stopped the workers it found in this erroneous Running state and is actively monitoring for, and stopping, any others that appear.
See the Timeline section for the latest troubleshooting and resolution updates, and the Impact section to understand how this could affect your use of the system.
Timeline
June 9, 2020 4:36 PM ET - Issue resolution - A fix for the bug has been released.
June 8, 2020 3:52 PM ET - Issue remediation - The notebooks engineering team continues to refine the fix and to manually stop any cluster workers found in the erroneous state.
June 5, 2020 8:23 PM ET - Issue remediation - The notebooks engineer who discovered the issue has written fixes for the bug.
June 5, 2020 2:35 PM ET - Issue discovered - One of our notebooks engineers created a multi-node Spark cluster with 4 workers. After the engineer initiated a stop in the UI, 2 of the 4 workers were not stopped on Google's side. The engineer manually stopped any user cluster workers that were in this erroneous state.
Impact
If some of a user's cluster workers did not change to Stopped status when the runtime was paused, those workers remained in a Running state and may have resulted in unexpected additional charges. The notebooks engineering team is continuing to monitor for and manually stop any workers in this erroneous state while the fix is being worked on.
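As an illustration of the mismatch described above, here is a minimal Python sketch that flags workers still reporting a Running state after a stop was requested. The worker names, the dictionary layout, and the status strings "Running" and "Stopped" are assumptions made for this example; they mirror the wording of this article rather than any actual Terra or Dataproc API.

```python
# Hypothetical sketch: the input shape, worker names, and status strings are
# assumptions for illustration, not the actual Terra or Dataproc data model.

def find_unstopped_workers(workers):
    """Return the names of workers whose status is anything other than 'Stopped'."""
    return [w["name"] for w in workers if w["status"] != "Stopped"]


# Mirrors the June 5 scenario: 4 workers, 2 of which did not stop.
workers = [
    {"name": "worker-0", "status": "Stopped"},
    {"name": "worker-1", "status": "Running"},
    {"name": "worker-2", "status": "Stopped"},
    {"name": "worker-3", "status": "Running"},
]

unstopped = find_unstopped_workers(workers)
print(unstopped)
```

With the sample data above, the two workers still in a Running state are the ones that would need to be stopped manually.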
For more information
Please follow this article to get the most up-to-date information on this incident. If you would like to be notified of all service incidents or upcoming scheduled maintenance, click Follow on this page.
There is also a Known Issues post for this issue. Please follow the Known Issues board to receive email updates on new posts.