
Microsoft Details Cause of Recent Multi-Factor Authentication Outage

Microsoft has provided information on the root cause of the massive outage that last week impacted its Azure Active Directory authentication services across Europe, Asia, and the Americas.

The 14-hour outage impacted Microsoft Azure AD Multi-Factor Authentication (MFA) services on November 19 and prevented users of Office 365, Azure, Dynamics and other services from logging in if MFA was required. The event was mitigated on November 19, but details on what caused it were only provided now.

During their investigation, Microsoft’s engineers discovered three root causes, each leading to the next. However, gaps in telemetry and monitoring for the MFA services delayed the identification of these causes and extended the mitigation time.

The first two root causes, Microsoft explains, were introduced in a roll-out of a code update that began in some datacenters on November 13. They would be activated once a certain traffic threshold was exceeded, which happened on November 19 in the Azure West Europe (EU) datacenters due to morning peak traffic.

The first root cause was a latency issue in the Azure AD MFA frontend’s communication to its cache services. Triggered under high load once a certain traffic threshold was reached, the issue would render the services susceptible to the second root cause.

The second issue was a race condition in processing responses from the MFA backend server. It eventually led to recycles of the MFA frontend server processes, thus triggering additional latency, as well as the third root cause on the MFA backend.
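As an illustration of this class of bug, the following is a minimal sketch, not Microsoft's code: the `PendingRequests` class and its method names are hypothetical. It shows a common form of response-handling race, where a backend response and a timeout handler both try to resolve the same in-flight request. Doing the check and the removal under a single lock ensures that only one of them wins.

```python
import threading


class PendingRequests:
    """Hypothetical table of in-flight requests awaiting a backend response."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = set()
        self.resolved = []  # list of (request_id, outcome) tuples

    def register(self, request_id):
        with self._lock:
            self._pending.add(request_id)

    def _resolve(self, request_id, outcome):
        # The membership check and the removal happen under one lock.
        # Without the lock, two threads could both pass the check and
        # resolve the same request twice (a check-then-act race).
        with self._lock:
            if request_id not in self._pending:
                return False
            self._pending.remove(request_id)
            self.resolved.append((request_id, outcome))
            return True

    def on_response(self, request_id):
        return self._resolve(request_id, "response")

    def on_timeout(self, request_id):
        return self._resolve(request_id, "timeout")


def run_demo(n=200):
    """Fire a response and a timeout at every request from separate threads."""
    table = PendingRequests()
    for i in range(n):
        table.register(i)
    threads = []
    for i in range(n):
        threads.append(threading.Thread(target=table.on_response, args=(i,)))
        threads.append(threading.Thread(target=table.on_timeout, args=(i,)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return table
```

Calling `run_demo()` resolves each request exactly once, regardless of which thread reaches it first. The actual race in the MFA frontend may have had a different shape; the sketch only shows why unsynchronized response processing can misbehave under load.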

Previously undetected, the third identified root cause led to accumulation of processes on the MFA backend. This eventually resulted in resource exhaustion on the backend, thus preventing it from processing further requests from the MFA frontend.
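The accumulation described above can be pictured with a simple, hypothetical model: if work arrives at the backend slightly faster than it completes, the backlog grows steadily until a resource limit is hit. The function name, parameters, and numbers below are assumptions chosen for illustration and are not taken from Microsoft's report.

```python
def ticks_until_exhaustion(arrivals_per_tick, completions_per_tick, limit, max_ticks):
    """Return the first tick at which the backlog reaches `limit`, or None.

    A toy model: each tick, new work arrives and a fixed amount completes.
    """
    backlog = 0
    for tick in range(1, max_ticks + 1):
        backlog += arrivals_per_tick
        backlog -= min(backlog, completions_per_tick)
        if backlog >= limit:
            return tick
    return None


# Arrivals match capacity: the backlog never builds up.
steady = ticks_until_exhaustion(10, 10, limit=100, max_ticks=1000)

# Arrivals exceed capacity by 2 per tick: the backlog grows by 2 each tick.
overloaded = ticks_until_exhaustion(12, 10, limit=100, max_ticks=1000)
```

In this model `steady` is `None`, while `overloaded` is `50`, since a surplus of 2 per tick reaches the limit of 100 after 50 ticks. Even a small sustained imbalance, such as one caused by frontend recycles, is enough to exhaust a fixed limit eventually.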

The outage was essentially caused by a change recently rolled out to more effectively manage connections to the caching services, which “introduced more latency and a race-condition in the new connection management code, under heavy load,” Microsoft explains.

Because of this rollout, the MFA service processed requests more slowly, which first impacted the West EU datacenters. Microsoft’s engineers attempted to mitigate the issue in several ways, including changing traffic patterns in the EU datacenters and disabling auto-mitigation systems to reduce traffic volumes. These efforts eventually caused the same issues in the East US datacenters as well.

After discovering that backend resource limits were exhausted and that MFA messages were no longer being delivered to customers, engineers rolled back the recent deployment and added capacity, which mitigated the latency issue. The service was fully restored only after the MFA backend servers were cycled.

“The initial diagnosis of these issues was difficult because the various events impacting the service were overlapping and did not manifest as separate issues. This was made more acute by the gaps in telemetry that would identify the backend server issue,” Microsoft says.

The outage was mitigated on November 19, but the incident was kept open for two more days to monitor and investigate any further problems. Microsoft also decided to roll out additional improvements to the Azure platform by December 2018 to prevent similar issues from happening.

The company will review its update deployment procedures, review the monitoring services to reduce detection time, review containment processes to avoid propagating an issue, and update the communications process to the Service Health Dashboard and monitoring tools to detect publishing issues immediately during incidents.

Written By

Ionut Arghire is an international correspondent for SecurityWeek.
