SecurityWeek

Disaster Recovery

Google Issues Post Mortem on Gmail, YouTube Outage

Google has blamed a bug in its global authentication system for last week’s outage that affected Gmail, Calendar, YouTube, Meet and multiple other Google services.

The 47-minute outage last Monday, which severely affected operations at workplaces and schools globally, was caused by a bug in an automated quota management system that powers the Google User ID Service.

In a root cause incident report, Google explained that the Google User ID Service maintains a unique identifier for every account and handles authentication credentials for OAuth tokens and cookies. This account data is stored in a distributed database, which uses Paxos protocols to coordinate updates.

For security reasons, this service is programmed to reject requests when it detects outdated data.
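The fail-closed behavior described above can be illustrated with a minimal Python sketch. It is not Google's implementation: the `AccountStore` class, the `last_synced_at` field, and the 30-second freshness limit are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


class StaleDataError(Exception):
    """Raised when account data is too old to trust for authentication."""


@dataclass
class AccountStore:
    # Hypothetical model: field names and the threshold are assumptions,
    # not details taken from Google's incident report.
    last_synced_at: float          # time (seconds) this replica last confirmed it was up to date
    max_staleness_s: float = 30.0  # assumed freshness limit

    def check_fresh(self, now: float) -> None:
        age = now - self.last_synced_at
        if age > self.max_staleness_s:
            raise StaleDataError(f"data is {age:.0f}s old; refusing to authenticate")


def authenticate(store: AccountStore, token_is_valid: bool, now: float) -> bool:
    # Fail closed: freshness is checked before the credential itself,
    # so outdated data rejects every request, valid or not.
    store.check_fresh(now)
    return token_is_valid
```

Under this design, a store that can no longer be updated eventually rejects all requests, including those carrying valid credentials, which is consistent with the broad authentication failures the report describes.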

Google said one of the automated tools it uses to manage the quota of resources allocated to its services contained a bug that caused errors in authentication results, leading to the service outage.

“As part of an ongoing migration of the User ID Service to a new quota system, a change was made in October to register the User ID Service with the new quota system, but parts of the previous quota system were left in place which incorrectly reported the usage for the User ID Service as 0. An existing grace period on enforcing quota restrictions delayed the impact, which eventually expired, triggering automated quota systems to decrease the quota allowed for the User ID service and triggering this incident,” the company explained.

“Existing safety checks exist to prevent many unintended quota changes, but at the time they did not cover the scenario of zero reported load for a single service,” Google added.
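The failure mode Google describes, and the kind of safety check that was missing, can be sketched in Python. This is an illustrative model under stated assumptions, not Google's quota system: `QuotaState`, `propose_quota`, `safe_quota`, the 1.5x headroom, and the 50% maximum cut are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QuotaState:
    # Hypothetical model; field names and values are illustrative only.
    service: str
    current_quota: float
    reported_usage: float
    grace_period_expired: bool


def propose_quota(state: QuotaState, headroom: float = 1.5) -> float:
    """Automated policy: size quota to reported usage once the grace period ends."""
    if not state.grace_period_expired:
        return state.current_quota
    return state.reported_usage * headroom


def safe_quota(state: QuotaState, max_cut_fraction: float = 0.5) -> float:
    """Apply the proposal only if it passes basic sanity checks."""
    proposed = propose_quota(state)
    if proposed >= state.current_quota:
        return proposed
    # Guard 1: zero reported load for a service that holds quota is more
    # likely a reporting fault than real idleness, so keep the quota.
    if state.reported_usage <= 0:
        return state.current_quota
    # Guard 2: limit how far a single automated change may cut quota.
    floor = state.current_quota * (1 - max_cut_fraction)
    return max(proposed, floor)
```

With a reported usage of 0 and an expired grace period, `propose_quota` returns 0 while `safe_quota` keeps the existing quota in place. The second guard is an extra assumption here, added to show how a cap on the size of any single reduction would limit the damage from other bad inputs.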

The problem “was immediately clear as the new quotas took effect.” At the height of the incident, Google could not verify that user requests were authenticated, and the company confirmed it was seeing 5xx errors on virtually all authenticated traffic.

“The majority of authenticated services experienced similar control plane impact: elevated error rates across all Google Cloud Platform and Google Workspace APIs and Consoles,” the company said.
