Cloud Security

Storms in the Cloud: Lessons from the Amazon Cloud Outage

The Cloud, While Great, Doesn’t Absolve Companies from Taking Fundamental Precautions to Safeguard Their Systems

June 6, 2011

It started in the early hours of April 21st with a small router configuration error during a routine network upgrade affecting Elastic Block Store (EBS), the Amazon storage service used by applications that run on Amazon’s EC2 cloud service. Engineers accidentally switched traffic destined for a high-capacity production network onto a secondary, backup network. This second network was designed for redundancy and data replication, not to manage the large volume of production data suddenly flooding into it. As a result, many storage nodes were cut off from their replicas and became stuck in a loop as they searched for free space in which to re-mirror their data. Amazon described the cascading failure as a “re-mirroring storm.”

Major red-hot startups were hit by Amazon’s cloud storm: Dropbox, Foursquare, Quora, Reddit, HootSuite. All of them went down and stayed down, some for days. Amazon has pioneered the cheap, scalable and convenient hosting of Web-based services, thereby unleashing a flood of new companies whose entire business model is built around the Amazon cloud. The storm, however, showed the world that the cloud, while great, does not absolve companies from taking fundamental precautions to safeguard their systems online.

Cloud computing is in the ascendancy. Apple is set to announce its iCloud initiative this week, joining a fierce competition for users with Microsoft and Google as well as conventional Web hosting providers such as Rackspace. Analysts estimate that enterprises are preparing to spend tens of billions of dollars annually on software-as-a-service and infrastructure-as-a-service over the next few years. The cost efficiencies and scalability benefits that established companies and upstarts alike can realize by outsourcing their application infrastructure, storage and processing cycles to cloud-based infrastructure services are undeniable.

With all the hype that the cloud has been subject to over the last few years, those who believed that the cloud was the silver bullet were due a wake-up call. While the cloud allows for a rapid and easy start-up experience, storms in the cloud can shut a business down completely unless you take sensible measures to protect yourself.

The result of the outage was that some EC2 customers, largely those whose applications and data were stored in the affected Amazon Availability Zone, saw a day or more of downtime. Some lost data, and many more lost face with their customers. Some companies, however, managed to struggle through the incident with only minor availability problems. What did they do that other high-profile ventures did not?

The Cloud is Not a Disaster Recovery Plan

Moving to the cloud carries with it the promise of massive scalability, availability and redundancy, but it’s no substitute for an effective disaster recovery plan. Cloud services are architecturally complex and built on relatively new design concepts. It is inevitable that they will sometimes break. While Amazon’s post-incident report was fairly comprehensive and rightfully earned the company praise, some customers reported frustration with the lack of communication while their services were offline. During the downtime, they had no idea how long it was likely to last and felt the options for recovery were limited. Of course, it is likely that Amazon’s own engineers did not know how long the outage would last either, but that is little consolation for a company whose primary business is online.

In this regard, the move to the cloud actually increases the need for a well-tested disaster recovery plan, one that takes into account the lack of visibility you’re likely to have during an incident. The companies that managed to ride out the storm of the Amazon outage were those that had designed their infrastructure with failure in mind, knew where the potential weak spots were, and knew what they needed to do if any particular component of their services failed.

Backup. Backup. Backup.

The Amazon outage primarily hit the company’s storage services, and some customers did lose data permanently as a result. A sound, frequent off-site backup policy would have proved invaluable in maintaining uptime during the incident. A full 24 hours offline is an almost inconceivably long time for any company that does its primary business on the Web, yet some cloud users suffered that and longer. In several cases, Amazon customers had no option but to sit out most of the downtime in frustrated silence, waiting for Amazon’s engineers to resolve the situation, because they had no recent backup data they could fail over to, and no plan for doing so. Amazon offers customers protection against this type of problem through Availability Zones, but in this case the outage affected multiple Availability Zones within a single geographic region. Backing up regularly to Availability Zones in a different geographic region could have saved the day for some companies.
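One way to keep such a policy honest is to check it automatically. The sketch below assumes you already have a list of backup records with a region and a timestamp; the `Backup` type, the region names and the 24-hour threshold are illustrative assumptions, not part of any Amazon API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Backup:
    """Metadata for one backup copy; region names are hypothetical."""
    region: str
    taken_at: datetime


def audit_backups(backups, primary_region, now, max_age):
    """Return a list of problems; an empty list means the policy is met."""
    problems = []
    # Only copies held outside the primary region count as off-site.
    offsite = [b for b in backups if b.region != primary_region]
    if not offsite:
        problems.append("no backup outside the primary region")
        return problems
    newest = max(b.taken_at for b in offsite)
    if now - newest > max_age:
        problems.append(f"newest off-site backup is {now - newest} old")
    return problems


now = datetime(2011, 4, 21, 12, 0)
backups = [
    Backup("region-east", now - timedelta(hours=1)),  # fresh, but local
    Backup("region-west", now - timedelta(days=3)),   # off-site, but stale
]
problems = audit_backups(backups, "region-east", now, timedelta(hours=24))
print(problems)
```

In this example the only recent backup sits in the same region as production, so the audit reports the off-site copy as stale. Run on a schedule, a check like this flags the gap before an outage does.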

Does the Cloud Match Your Policies?

Financial accounting, consumer data privacy, medical confidentiality, and payment processing data security — whatever your regulatory compliance requirements, you may find that migrating your applications and data to the cloud will impact your risk profile. You should carefully review your privacy policies against the cloud provider’s infrastructure, and ask some specific questions:

• Where, physically and geographically, will my data be stored, and what is the legal jurisdiction of that location?

• Who has access to my data?

• Is my data stored and processed on shared infrastructure?

• When the cloud goes down, is my data at risk of loss or leakage?

• Is my data stored in an encrypted manner? If so, how are the private keys managed?

The utility paradigm offered by the cloud should not create a mindset of dangerous indifference to the underlying infrastructure in use. A disaster recovery plan is critical whether you process data in-house, or in the cloud, and backup is a vital component of any such plan.

Your disaster recovery, backup and policy review plans need to be audited and tested. The next major cloud outage will almost certainly not look like the last one. While enjoying the many benefits of cloud computing, you would do well to go back to some old-school IT policies and procedures that you probably threw out as “useless.”

With great advances in cloud computing come great responsibilities of redundancy and diversity. Those who fail to heed this essential truth are dancing with a disaster of their own making.

Written By Ram Mohan
