We’ve likely all heard the phrase “complexity is the enemy of security” many times. It’s an oft-used sound bite, but what can we learn from this concept to improve our respective security postures? Although there are many angles one could approach this concept from, I’d like to examine it from a security operations and incident response perspective.
Simplicity in Collection and Analysis
Most enterprises instrument their networks to collect many different, highly specialized forms of data. For example, an organization may collect netflow data, firewall logs, DNS logs, and a variety of other specialized data sources. This creates a stream of differing data types and formats that complicates and clouds the operational workflow. Unfortunately, the first question when performing analysis or incident response is often “Where do I go to get the data I need?” rather than “What questions do I need to ask of the data?”
In addition to the variety and complexity of these specialized forms of data, the volume of data they create often overwhelms enterprises. These huge quantities of data result in shorter retention periods and longer query times. This perfect storm of circumstances creates a very real operational challenge.
Fortunately, organizations can address this challenge by seeking out fewer, more generalized collection technologies that provide the required level of visibility with greatly reduced complexity and volume. Continuing with the above example, in lieu of many different highly specialized network data sources, an organization could consider a single layer 7 enriched metadata source.
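As a sketch of what this consolidation can look like in practice, consider normalizing records from each specialized source into one common, enriched schema, so that analysts ask questions of a single data set rather than hunting across tools. The record shapes and field names below are hypothetical, purely for illustration:

```python
# A minimal sketch of "fewer, more generalized sources": normalize records
# from specialized sources (DNS logs, firewall logs, etc.) into one enriched
# metadata schema. All field names here are hypothetical, not from any product.
from dataclasses import dataclass

@dataclass
class NetworkEvent:
    """One generalized, layer-7-enriched network metadata record."""
    timestamp: float
    src_ip: str
    dst_ip: str
    source_type: str   # which kind of telemetry produced this record
    detail: str        # enriched layer-7 detail (query name, action, ...)

def from_dns_log(rec: dict) -> NetworkEvent:
    # Hypothetical DNS log shape: {"ts", "client", "server", "query"}
    return NetworkEvent(rec["ts"], rec["client"], rec["server"], "dns", rec["query"])

def from_firewall_log(rec: dict) -> NetworkEvent:
    # Hypothetical firewall log shape: {"time", "src", "dst", "action"}
    return NetworkEvent(rec["time"], rec["src"], rec["dst"], "firewall", rec["action"])

def events_to(dst_ip: str, events: list) -> list:
    """Answer the question ("who talked to this host?") without caring
    which specialized source each record originally came from."""
    return [e for e in events if e.dst_ip == dst_ip]
```

With a normalizer per source feeding one store, the analyst's first question can finally be about the data, not about where to find it.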
Simplicity in Detection
Wikipedia defines an Indicator of Compromise (IOC) as “an artifact observed on a network or in an operating system that with high confidence indicates a computer intrusion.” Associated contextual information usually accompanies the artifact and helps an organization properly leverage the IOC. Context most often includes, among other things, the attack stage to which an indicator is relevant. Attack stages can be broken up into three main families, each of which contains one or more attack stages:
• Pre-infection: reconnaissance, exploit, re-direct
• Infection: payload delivery
• Post-infection: command and control, update, drop, staging, exfiltration
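To make the role of this context concrete, here is a minimal sketch of an IOC carried alongside its attack stage, using the stage families listed above. The structure is illustrative only, not any particular threat intelligence format:

```python
# Illustrative sketch: an IOC that carries its attack-stage context, mapped
# to the three stage families described above. Structure is hypothetical.
from dataclasses import dataclass

STAGE_FAMILIES = {
    "reconnaissance": "pre-infection",
    "exploit": "pre-infection",
    "re-direct": "pre-infection",
    "payload delivery": "infection",
    "command and control": "post-infection",
    "update": "post-infection",
    "drop": "post-infection",
    "staging": "post-infection",
    "exfiltration": "post-infection",
}

@dataclass
class IOC:
    value: str   # the observed artifact, e.g. a domain or file hash
    stage: str   # the attack stage this indicator is relevant to

    @property
    def family(self) -> str:
        """Which of the three stage families this indicator belongs to."""
        return STAGE_FAMILIES[self.stage]
```

Carrying the stage alongside the artifact is what later lets an organization reason about which indicators are worth alerting on.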
It is well known that many organizations struggle with excessive amounts of false positives and low signal-to-noise ratios in their alert queues. There are several different angles from which an organization can approach this problem, and in fact, I have previously written about some of them. Another such approach, which can be used in combination with the others, is to go for the “money shot”.
At some point, when an organization wants to watch for and alert on a given attack, intrusion, or activity of concern, that organization will need to select one or more IOCs for this purpose. Going for the “money shot” involves selecting the highest fidelity, most reliable, least false-positive prone IOC or IOCs for a given attack, intrusion, or activity of concern. For example, if we look at a typical web-based re-direct attack, it may involve the following stages:
• Compromise of a legitimate third party site to re-direct to a malicious exploit site
• Exploitation of the system from the malicious exploit site
• Delivery of the malicious code
• Command and control, along with other post-infection activity
Although it is possible to use IOCs from all four of the above attack stages, using IOCs from the first three stages presents some challenges:
• Compromised legitimate third party sites likely number in the millions, meaning we would need millions of IOCs to identify just this one attack at this stage. Further, there is no guarantee that the attempted re-direct would succeed (e.g., if it were blocked by the proxy). An unsuccessful re-direct means that there was no attempt to exploit. In other words, for our purposes, a false positive.
• Exploits don’t always succeed, and as such, alerting on attempted exploits can often generate thousands upon thousands of false positives.
• If we see a malicious payload being delivered, that is certainly of concern. But what if the malicious payload does not successfully transfer, install, execute, and/or persist? We have little insight into whether a system is infected, unless of course, we see command and control or other post-infection activity.
Command and control (C2) and other post-infection activity, on the other hand, occur only after a successful infection. That means that if we can distill a high fidelity, reliable IOC for this attack stage, we can identify malicious code infections immediately after they happen, with a very low false positive rate. Obviously, preventing an attack is always preferable, but as we all know, this is not always possible. The next best option is timely and reliable detection.
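As a sketch of this selection logic, the hypothetical example below keeps only the highest-fidelity post-infection indicator from a candidate set. The indicator values and fidelity scores are invented for illustration:

```python
# Sketch of "going for the money shot": from the candidate IOCs for one
# intrusion, keep only the highest-fidelity post-infection indicator(s),
# since those fire only after a real infection and carry the fewest false
# positives. All values and fidelity scores below are illustrative.

def money_shot(iocs: list) -> list:
    """Select the highest-fidelity post-infection IOC(s) for alerting."""
    post = [i for i in iocs if i["family"] == "post-infection"]
    if not post:
        return []
    best = max(i["fidelity"] for i in post)
    return [i for i in post if i["fidelity"] == best]

# Candidate IOCs for the web-based re-direct attack described above:
candidates = [
    {"value": "compromised-blog.example", "family": "pre-infection", "fidelity": 0.2},
    {"value": "exploit-kit.example",      "family": "pre-infection", "fidelity": 0.4},
    {"value": "payload.example/drop.bin", "family": "infection",     "fidelity": 0.6},
    {"value": "c2.example:8443",          "family": "post-infection", "fidelity": 0.95},
]
selected = money_shot(candidates)
```

Alerting only on the C2 indicator trades millions of low-fidelity pre-infection signatures for one reliable signal that a system is actually infected.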
Simplicity in O&M
When people began moving from the cities to the suburbs in the post-war United States in the 1950s, new infrastructure was built to serve the shifting population. The infrastructure served its population well for 50 years or so, until the 2000s, when the physical lifetime of water mains, electric power lines, and other infrastructure was reached. What people quickly realized is that although money and resources had been allocated to build and deploy infrastructure, money and resources had not been allocated to operate and maintain the infrastructure for the long term. In other words, O&M would be required to repair or replace the aging infrastructure, but the resources for that O&M would have to be found elsewhere.
Similarly, in the information security realm, as new business needs arise, new security technologies are often deployed to address them. Enterprises often forget to include O&M when calculating total cost. Another way to think of this is that each new security technology requires people to properly deploy, operate, and maintain it. If head count were increased each time a new security technology was deployed, the model would work quite well. However, as those of us in the security world know, head count seldom grows in parallel with new business needs. This presents a big challenge to the enterprise.
O&M cost (including the human resources required to properly deploy, maintain, and operate technology) is important to keep in mind throughout the technology lifecycle. It represents a large part of the overall cost of a technology, yet it is often overlooked or underestimated. In an effort to lower overall O&M costs, and building on the collection and analysis discussion above, it pays to take a moment to think about the purpose of each technology. Is this specific technology a highly specialized technology for a highly specialized purpose? Could I potentially retain the functionality and visibility provided by several specialized technologies through the use of a single, more generalized technology?
If the answer to these two questions is yes, it pays to think about consolidating security technologies through an exercise I like to call “shrinking the rack”. Shrinking the rack can be a great option, provided it doesn’t negatively affect security operations. Fewer specialized security technologies mean fewer resources required to properly deploy, maintain, and operate them. That, in turn, means lower overall O&M costs. Lower O&M costs are always a powerful motivating factor to consider.
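A back-of-the-envelope comparison can help frame that conversation. In the sketch below, every figure (license costs, analyst hours, hourly rate) is a hypothetical placeholder rather than real pricing data:

```python
# Rough sketch of the "shrinking the rack" math: yearly O&M of several
# specialized tools vs. one generalized replacement. Every number below is
# a hypothetical placeholder, not real pricing or staffing data.

def annual_om_cost(tools: list) -> int:
    """Total yearly O&M: license renewal plus analyst time, per tool."""
    return sum(t["license"] + t["analyst_hours"] * t["hourly_rate"] for t in tools)

specialized = [
    {"license": 20_000, "analyst_hours": 300, "hourly_rate": 75},  # e.g. netflow tool
    {"license": 15_000, "analyst_hours": 250, "hourly_rate": 75},  # e.g. DNS analytics
    {"license": 18_000, "analyst_hours": 280, "hourly_rate": 75},  # e.g. firewall log mgr
]
generalized = [
    {"license": 40_000, "analyst_hours": 400, "hourly_rate": 75},  # single metadata platform
]

savings = annual_om_cost(specialized) - annual_om_cost(generalized)
```

The point is not the specific numbers but the habit of pricing the people and upkeep, not just the purchase, before the rack grows again.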
The concept of simplicity is one that we can apply directly to security operations and incident response. This piece touches on just some of the variety of lessons we can learn from this topic. Although the phrase “complexity is the enemy of security” is a popular sound bite, if we dig a level deeper, we see that there is a great deal we can learn from the concept.