Microsoft Details Cause of Recent Multi-Factor Authentication Outage

Microsoft has provided information on the root cause of the massive outage that last week impacted its Azure Active Directory authentication services across Europe, Asia, and the Americas.

The 14-hour outage impacted Microsoft Azure AD Multi-Factor Authentication (MFA) services on November 19 and prevented users of Office 365, Azure, Dynamics and other services from logging in if MFA was required. The event was mitigated on November 19, but details on what caused it were only provided now.

During their investigation, Microsoft’s engineers discovered three root causes, each one triggering the next. Gaps in telemetry and monitoring for the MFA services delayed the identification of these causes and extended the mitigation time.

The first two root causes, Microsoft explains, were introduced by a code update whose rollout began in some datacenters on November 13. They remained dormant until a certain traffic threshold was exceeded, which happened on November 19 in the Azure West Europe (EU) datacenters during morning peak traffic.

The first root cause was a latency issue in the Azure AD MFA frontend’s communication to its cache services. Triggered under high load once a certain traffic threshold was reached, the issue would render the services susceptible to the second root cause.

The second issue was a race condition in processing responses from the MFA backend server. It eventually led to recycles of the MFA frontend server processes, thus triggering additional latency, as well as the third root cause on the MFA backend.
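Microsoft has not published the affected code, so the following is only a generic sketch of how a race condition of this kind is usually avoided: several handler threads update shared state, and a lock makes each read-modify-write atomic. The names `ResponseCounter` and `run_handlers` are hypothetical and are not taken from the MFA service.

```python
import threading


class ResponseCounter:
    """Hypothetical shared state updated by several response handlers."""

    def __init__(self):
        self._lock = threading.Lock()
        self.handled = 0

    def record(self):
        # The read-modify-write runs under the lock, so two handlers
        # cannot interleave and lose an update.
        with self._lock:
            self.handled += 1


def run_handlers(num_threads, responses_per_thread):
    """Start handler threads and return the number of recorded responses."""
    counter = ResponseCounter()

    def worker():
        for _ in range(responses_per_thread):
            counter.record()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.handled
```

Without the lock, the final count could fall short of the number of responses processed, and the shortfall would tend to appear only under heavy concurrent load.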

Previously undetected, the third identified root cause led to accumulation of processes on the MFA backend. This eventually resulted in resource exhaustion on the backend, thus preventing it from processing further requests from the MFA frontend.
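The accumulation described above can be illustrated with a toy model. This is a minimal sketch under assumed numbers, not Microsoft's implementation: the function `simulate_backend`, its parameters, and the rates used are all hypothetical. It counts in-flight backend processes per tick and reports when the process limit first forces requests to be refused.

```python
def simulate_backend(arrivals_per_tick, completions_per_tick, process_limit, ticks):
    """Toy model of backend process accumulation.

    Returns (history, exhausted_at): the in-flight count after each tick,
    and the first tick at which some arrivals had to be refused (or None).
    """
    in_flight = 0
    history = []
    exhausted_at = None
    for tick in range(ticks):
        # New work is accepted only while there is headroom under the limit.
        accepted = min(arrivals_per_tick, process_limit - in_flight)
        if accepted < arrivals_per_tick and exhausted_at is None:
            exhausted_at = tick
        in_flight += accepted
        # Finished processes release their slots.
        in_flight -= min(completions_per_tick, in_flight)
        history.append(in_flight)
    return history, exhausted_at


# Balanced load: nothing accumulates.
print(simulate_backend(5, 5, 20, 10))
# Arrivals outpace completions: the backlog grows until the limit is hit.
print(simulate_backend(8, 5, 20, 10))
```

In the second call, the backlog grows by three processes per tick until the limit is reached at tick 5, after which the backend refuses part of every batch. This matches the general pattern of a backend that stops processing further frontend requests.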

The outage was essentially caused by a change recently rolled out to more effectively manage connections to the caching services, which “introduced more latency and a race-condition in the new connection management code, under heavy load,” Microsoft explains.

Because of this rollout, the MFA service slowed down its processing of requests, which first impacted the West EU datacenters. Microsoft’s engineers attempted to mitigate the issue in various ways, including changing traffic patterns in the EU datacenters and disabling auto-mitigation systems to reduce traffic volumes, but these steps eventually caused the same issues in the East US datacenters as well.

After discovering that backend resource limits were exhausted and that MFA messages were no longer being delivered to customers, engineers rolled back the recent deployment and added capacity. This mitigated the latency issue, but the service was fully restored only after the MFA backend servers were cycled.

“The initial diagnosis of these issues was difficult because the various events impacting the service were overlapping and did not manifest as separate issues. This was made more acute by the gaps in telemetry that would identify the backend server issue,” Microsoft says.

The outage was mitigated on November 19, but the incident was kept open for two more days to monitor and investigate any further problems. Microsoft also decided to roll out additional improvements to the Azure platform by December 2018 to prevent similar issues from happening.

The company will review its update deployment procedures, review the monitoring services to reduce detection time, review containment processes to avoid propagating an issue, and update the communications process to the Service Health Dashboard and monitoring tools to detect publishing issues immediately during incidents.

Written By

Ionut Arghire is an international correspondent for SecurityWeek.
