Connect with us

Hi, what are you looking for?

SecurityWeekSecurityWeek

Disaster Recovery

Google Issues Post Mortem on Gmail, YouTube Outage

Google has blamed a bug in its global authentication system for last week’s outage that affected Gmail, Calendar, YouTube, Meet and multiple other Google services.

Google has blamed a bug in its global authentication system for last week’s outage that affected Gmail, Calendar, YouTube, Meet and multiple other Google services.

The 47-minute outage last Monday, which severely affected operations at workplaces and schools globally, was caused by a bug in an automated quota management system that powers the Google User ID Service.

In a root cause incident report, Google explained that the Google User ID Service maintains a unique identifier for every account and handles authentication credentials for OAuth tokens and cookies.  This account data is stored in a distributed database, which uses Paxos protocols to coordinate updates. 

For security reasons, this service is programmed to reject requests when it detects outdated data.

Google said one of its automated tools used to manage the quota of various resources allocated for services contained a bug that caused error in authentication results, leading to the service outage.

“As part of an ongoing migration of the User ID Service to a new quota system, a change was made in October to register the User ID Service with the new quota system, but parts of the previous quota system were left in place which incorrectly reported the usage for the User ID Service as 0. An existing grace period on enforcing quota restrictions delayed the impact, which eventually expired, triggering automated quota systems to decrease the quota allowed for the User ID service and triggering this incident,” the company explained.

“Existing safety checks exist to prevent many unintended quota changes, but at the time they did not cover the scenario of zero reported load for a single service,” Google added.

The problem “was immediately clear as the new quotas took effect.”  At the height of the incident, Google could not verify that user requests were authenticated and the company confirmed it was seeing 5xx errors on virtually all authenticated traffic. 

Advertisement. Scroll to continue reading.

“The majority of authenticated services experienced similar control plane impact: elevated error rates across all Google Cloud Platform and Google Workspace APIs and Consoles, the company said.

Written By

Click to comment

Trending

Daily Briefing Newsletter

Subscribe to the SecurityWeek Email Briefing to stay informed on the latest threats, trends, and technology, along with insightful columns from industry experts.

Join the session as we discuss the challenges and best practices for cybersecurity leaders managing cloud identities.

Register

SecurityWeek’s Ransomware Resilience and Recovery Summit helps businesses to plan, prepare, and recover from a ransomware incident.

Register

People on the Move

Mike Dube has joined cloud security company Aqua Security as CRO.

Cody Barrow has been appointed as CEO of threat intelligence company EclecticIQ.

Shay Mowlem has been named CMO of runtime and application security company Contrast Security.

More People On The Move

Expert Insights

Related Content

Vulnerabilities

Less than a week after announcing that it would suspended service indefinitely due to a conflict with an (at the time) unnamed security researcher...

Cybercrime

A recently disclosed vBulletin vulnerability, which had a zero-day status for roughly two days last week, was exploited in a hacker attack targeting the...

Cybercrime

The changing nature of what we still generally call ransomware will continue through 2023, driven by three primary conditions.

Data Breaches

OpenAI has confirmed a ChatGPT data breach on the same day a security firm reported seeing the use of a component affected by an...

IoT Security

A group of seven security researchers have discovered numerous vulnerabilities in vehicles from 16 car makers, including bugs that allowed them to control car...

Vulnerabilities

A researcher at IOActive discovered that home security systems from SimpliSafe are plagued by a vulnerability that allows tech savvy burglars to remotely disable...

Risk Management

The supply chain threat is directly linked to attack surface management, but the supply chain must be known and understood before it can be...