There’s No One Size Fits All Disaster Survival Plan: Every Incident Requires A Situationally Appropriate Response...
Last month, I unglued myself from the 24/7 coverage of “super storm” Sandy long enough to make a run to the grocery store to buy bread, milk, and eggs. I don’t know why I would need French toast when the power goes out, as the streets ran dark with water and sewage, and trees and power lines were crashing down randomly like Tinker Toys of the gods, but that’s what the grave meteorologists told me to do.
I filled up my car with gas at their behest, and considered buying a generator but couldn’t work out how to hook it into my power vent, without which my furnace doesn’t work. I could fire up the generator and plug in my iPhone, though—emergency phone service and a handheld TV all in one! Ah well, next time.
It all reminded me of the anxiety in the wake of the 2001 terrorist attacks. By all accounts, many of the big chain hardware stores sold out of duct tape after the government issued advice on being prepared for another attack. I didn’t hear of any stores running out of plastic sheeting though, which we were also urged to stock up on, and no one I know had any idea what they were supposed to do with those items and under what circumstances. The only context the administration offered was to have them on hand in the event of a chemical or biological attack.
The same mentality is often transferred into the context of business. I’ve seen firsthand, and have worked on, many business continuity plans (BCPs), and for the most part they sound reasonable. If the plan calls for stockpiling duct tape, there’s an associated event that triggers the protocol that calls for duct taping something.
But for every sane plan, I’m guessing there are a dozen impractical ones, judging by the number of interrupted services to customers in the news in the last few years:
● RIM’s Blackberry network was crippled for three days in October of 2011 when a core switch failed and the backup switch didn’t take over as previously tested;
● A hospital in London suffered a power cut that lasted several days. When the backup generator kicked in, it couldn’t handle the sudden load demand. The secondary and tertiary backup generators did the same, ultimately leaving the hospital without power;
● A bank in Canada lost power to a data center, resulting in all branches unable to provide services, including over the counter transactions, cash machines, and online banking, for nearly a full working day;
● A router component failure at the FAA, compounded by a software configuration problem, brought down a flight management system for nearly five hours, forcing air traffic controllers to rely on faxes and emails to communicate flight plans.
The failures are due to many factors. Pick from the following; feel free to choose as many as you like—it’s a la carte!
● Flawed forethought about the causes or severity of business disruption;
● Out-of-date plan--the business operations changed without the BCP being updated;
● Recovery priorities not well defined;
● Lack of training--what do I do with this duct tape now that this Sarin gas is seeping into the building?
● Critical communications not in place (related to priorities and training);
● Under-purchasing solutions--underpowered generators, weak points in technology redundancy;
● Over-purchasing solutions--being talked into massively redundant systems that are so complicated they don’t fail over when needed.
Note that I didn’t list lack of testing. This is a sticky problem. Most organizations that have the time to craft a solid BCP do make an effort to test it. However, live testing is a political hot potato, particularly in information security. The Y2K panic was a perfect illustration of how little we really know about our systems and the negative effect they might have if they spontaneously combusted.
Any critical technology touches a great many facets of an organization, and when all the stakeholders come to the table and are presented with, “we’re going to shut off the core router to test our failover capability,” it’s rare that no one will put up an objection. Rather, everyone becomes a meteorologist, forecasting the worst doomsday scenario once the router goes nighty-night.
And I’m counting the IT folks here as well. We’ve all spent the night in a data center in a nervous sweat trying to restore services to some business-critical information system or another, with an imaginary counter dangling over our heads in the shape of a sword, tallying the real money it’s costing every minute we don’t get IT operations back up and running smoothly. So even though the necks of IT teams are on the line when the plan falls apart in a real-world disaster scenario, they’re equally loath to create that very same situation without a disaster to blame.
There are organizations that perform regular and comprehensive business continuity exercises--although only 4% to 24%, depending on the study--and kudos to them because it’s expensive and doesn’t generate revenue. I know that there are executives hoping a disaster will happen in order to justify the investment. Notice I used the word “comprehensive”; point-testing does not set up an adequate failure scenario to provide the confidence that all systems will be resilient. Modern businesses are insanely complicated, with business processes and technology so intertwined that no one can truly understand the failure permutations. It really is butterfly wings.
But not every organization needs complete continuity; that’s what risk management is all about. You know what I did to prepare for Sandy? I took a shower.
See, I have no kids, I have a fireplace and firewood, I always have plenty of canned food, not to mention what’s in the freezer that would have to be eaten in short order, and I’m pretty sure our electric provider is on the ball after the spanking they got after last year’s Halloween snowstorm that left so many people without power for over a week in some cases. But I’m not going to use a source of drinking water stored in my hot water tank for bathing once in the full throes of an emergency (not to mention I’d rather not take an ice cold shower). Ipso facto, I wanted to be squeaky clean before my date with Sandy; it might have been my last chance for a while.
Yet everyone else was stripping the shelves at my local Stop & Shop grocery store. And of largely perishable items, I might add. What in Sandy’s name do you think is going to happen when your power goes out? I really have to get me one of those magic refrigerators.
My point is that not every organization needs to keep running in all disaster scenarios. Some just have plans to shut down for a few days. Maybe all you need is to make sure your data is backed up and available to be used elsewhere in case the event is more severe than even the Chicken Littles are predicting, as is the case in New Jersey and New York City, where the sky really did fall. As I write this, over a week after Hurricane Sandy, many still don’t have power, heat, or a place to take a shower.
There’s no one-size-fits-all disaster survival plan: a server compromise is vastly different from a full-scale nuclear attack, and both require situationally appropriate responses, as does the spectrum of scenarios in between. That means:
● Project real scenarios based upon past occurrences as well as what-ifs. Be cognizant of commonplace incidents, like short-term power outages, as well as remote ones, like a meteor strike. Contrary to what Hollywood may have us believe, apocalyptic events are not released on us twice a year.
● Determine the impact on operations: how long will it keep you out of business? What’s the monetary loss, recovery cost, reputation hit?
● Decide where the cost/return threshold is for purchasing, implementing, managing, and updating your business continuity resources; there’s a cutoff point for every organization, where it’s no longer feasible to even consider business continuity.
● Don’t forget to factor in the real needs of your customer base: at what point are your customers also impacted and won’t care if your goods and services are available? If you supply pet apparel, you’re not top of mind for customers--except for that one guy who desperately needs a raincoat for his Corgi.
● Infrastructure providers--utilities, banking, transportation--have a responsibility to the public to be available during disasters, even more so in the case of fire & safety, hospitals, and the government. They have to prepare for the Waterworld, Deep Impact, The Day After Tomorrow, and Zombie Apocalypse contingencies.
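The impact and cost/return steps above boil down to simple arithmetic, and a rough sketch may help. The Python below is only an illustration under loud assumptions: the scenario names, probabilities, dollar figures, and the `Scenario` class itself are all made up for the example, and the "expected annual loss" shortcut (yearly likelihood times impact) is just one common way to frame the threshold, not a method prescribed here.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """One disaster scenario. Every field is a hypothetical estimate."""
    name: str
    annual_probability: float  # chance of the event in a given year
    outage_days: float         # how long it keeps you out of business
    daily_loss: float          # money lost per day of outage
    recovery_cost: float       # one-time cost to get back on your feet
    mitigation_cost: float     # yearly cost of the continuity measure

    def expected_annual_loss(self) -> float:
        # Impact of one occurrence, weighted by how likely it is per year.
        impact = self.outage_days * self.daily_loss + self.recovery_cost
        return self.annual_probability * impact

    def worth_mitigating(self) -> bool:
        # The cutoff point: spend only if the expected loss exceeds the spend.
        return self.expected_annual_loss() > self.mitigation_cost


# Illustrative numbers only; plug in your own estimates.
scenarios = [
    Scenario("Short-term power outage", 0.5, 1, 20_000, 5_000, 8_000),
    Scenario("Regional storm", 0.1, 5, 20_000, 50_000, 10_000),
    Scenario("Meteor strike", 0.000001, 365, 20_000, 5_000_000, 250_000),
]

for s in scenarios:
    verdict = "mitigate" if s.worth_mitigating() else "accept the risk"
    print(f"{s.name}: expected loss ${s.expected_annual_loss():,.0f}/yr "
          f"vs ${s.mitigation_cost:,.0f}/yr -> {verdict}")
```

With these invented figures, the commonplace outage and the storm clear the threshold and the meteor strike does not. The sketch leaves out reputation damage and customer impact, which are harder to price, and it would not apply to infrastructure providers whose obligation to stay available is not purely a cost decision.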
What it comes down to is that disaster preparedness is something we should do as a normal course of conducting business and our personal lives; it should not be a reaction to an imminent crisis.