Root Cause Analysis: Stop Playing Whack-a-Mole

Security Incident Root Cause Analysis

What Can We Do Once We Identify the Root Cause? We Can Work to Address It.

Recently, a piece of Point-of-Sale (POS) malware, Backoff POS, has become big news. I read several different write-ups on the malware, including the US-CERT alert (TA14-212A) that was released in late July. In reviewing the different write-ups, I found a good deal of information regarding post-infection Indicators of Compromise (IOCs) to help organizations assess whether or not they have been compromised by Backoff POS. The information I saw was great, and it is a good thing that organizations were able to receive such detailed IOC information. But I must admit that I was quite surprised by what I didn’t see in any of the write-ups I reviewed. Allow me to explain.

As a practitioner, I am often asked by customers how they can best mitigate or reduce the risk presented by a variety of threats. Point-of-Sale malware is one of those threats, for obvious reasons. The damage to an organization (monetary, public relations, or otherwise) from a breach involving the theft of payment card data can be enormous. I get many questions when I meet with customers, but questions on mitigating or reducing risk are by far the most difficult. These questions require an intimate knowledge of specific threat vectors. In other words, for a given risk or threat, I need to know how that threat can get into my organization in order to try to keep it out.

Given this, I was startled by how little information was available on the initial delivery mechanism or initial infection vector into the organization for Backoff POS.  All of the information I had access to was either about the malware itself, or its behavior on the network following infection. That’s all great information and should absolutely be fully leveraged, but it is all reactive information. Put another way, that information does not help the proactive defender to mitigate or reduce the risk presented by Backoff.

My intent here is not to pick on those who researched and analyzed Backoff or to belittle their work, which I thought was excellent. Rather, I want to raise awareness regarding something that we as a community do not do enough of: root cause analysis. In other words, the question of “how or why are we getting infected?” is an important one in my opinion, but one that often receives too little attention. Instead, as a community, we seem to accept as our fate the need to play whack-a-mole. Allow me to elaborate.

On any given day, an organization will detect or receive notification regarding any number of infected systems on the network. The organization will then perform incident response accordingly, as we might expect. Those of us who have worked in the field of incident response for a while recognize this as a routine part of our day, just like drinking our morning coffee. As part of our incident response, we will improve and tighten our controls to prevent what happened today from happening tomorrow. Seems like a good approach, right? Yes, absolutely, except for the fact that tomorrow, the attackers will be onto something else that we probably don’t have controls in place for.

If we take a step back, we see that from this perspective, incident response can begin to feel a bit like the arcade game whack-a-mole. Kill 12 infected systems today and their associated infection vectors, and tomorrow, 15 more will pop up. I’m not suggesting that we abandon this – incident response absolutely needs to be performed for systems we know are infected. Rather, I’m suggesting that we think about treating the cause of the infections, rather than the symptoms.  If we can treat the cause of the infections, we will have far fewer symptoms to treat.

Getting to the root cause involves a level of understanding beyond that of simply identifying that a system is infected. We need to understand what specifically enabled or facilitated the infection. It’s important to remember that root cause and infection vector are two different things. Identifying the infection vector allows us to know how the malicious payload was delivered. Identifying the root cause allows us to understand why the malicious payload succeeded in infecting the system. There is a subtle difference there.

Consider the all-too-common example of a drive-by re-direct attack delivering an exploit to a vulnerable version of Java. The infection vector tells us that an unsuspecting user (the innocent bystander) was re-directed to a malicious site that delivered an exploit. If we block the malicious site, there will be another one (or another 1,000) tomorrow. The root cause, on the other hand, tells us that the version of Java on the infected system was vulnerable, and it is upon this that the attackers preyed.
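To make the distinction concrete, here is a minimal Python sketch. The incident records, field names, host names, and domains are all hypothetical, invented purely for illustration; the only point is that many distinct infection vectors can collapse into a single root cause.

```python
from collections import defaultdict

# Hypothetical incident records (illustrative data only, not from any real case).
incidents = [
    {"host": "ws-01", "vector": "redirect via bad-site-a.example", "root_cause": "outdated Java"},
    {"host": "ws-02", "vector": "redirect via bad-site-b.example", "root_cause": "outdated Java"},
    {"host": "ws-03", "vector": "redirect via bad-site-c.example", "root_cause": "outdated Java"},
    {"host": "ws-04", "vector": "redirect via bad-site-d.example", "root_cause": "unpatched browser"},
]

def vectors_by_root_cause(records):
    """Group the distinct infection vectors observed under each root cause."""
    grouped = defaultdict(set)
    for rec in records:
        grouped[rec["root_cause"]].add(rec["vector"])
    return dict(grouped)

grouped = vectors_by_root_cause(incidents)
for cause, vectors in sorted(grouped.items()):
    print(f"{cause}: {len(vectors)} distinct vector(s)")
```

Blocking each vector in this toy data set would take four separate actions, while addressing the two root causes would take two, and would also cover sites that have not appeared yet.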

So how can we identify the root cause of infection? In order to identify root cause, we need to re-construct exactly what transpired during the infection to fully understand the sequence of events. In order to fully understand the sequence of events, we need to precisely extract only the relevant network traffic data and endpoint data. In order to precisely extract only the relevant data, we need to issue precise, targeted, and incisive queries across that data. In other words, we need to perform forensics to re-construct and fully understand what occurred.
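As a rough illustration of what a precise, targeted query might look like, the Python sketch below pulls only the events for one host in a short window leading up to detection and orders them into a timeline. The event records, field names, and the ten-minute lookback are assumptions made for this example; in practice the data would come from whatever network and endpoint sources an organization actually collects.

```python
from datetime import datetime, timedelta

# Hypothetical, simplified events; real data would come from proxy, DNS, or endpoint logs.
events = [
    {"time": datetime(2014, 8, 1, 9, 58), "host": "ws-01", "source": "proxy", "detail": "GET news.example/"},
    {"time": datetime(2014, 8, 1, 9, 59), "host": "ws-01", "source": "proxy", "detail": "302 redirect to bad-site.example"},
    {"time": datetime(2014, 8, 1, 10, 0), "host": "ws-01", "source": "endpoint", "detail": "java.exe spawned unknown.exe"},
    {"time": datetime(2014, 8, 1, 10, 0), "host": "ws-07", "source": "proxy", "detail": "GET intranet.example/"},
    {"time": datetime(2014, 8, 1, 14, 30), "host": "ws-01", "source": "proxy", "detail": "GET mail.example/"},
]

def timeline(records, host, detected_at, lookback=timedelta(minutes=10)):
    """Return events for one host in the window leading up to detection, oldest first."""
    start = detected_at - lookback
    relevant = [
        r for r in records
        if r["host"] == host and start <= r["time"] <= detected_at
    ]
    return sorted(relevant, key=lambda r: r["time"])

steps = timeline(events, "ws-01", datetime(2014, 8, 1, 10, 0))
for r in steps:
    print(r["time"].isoformat(), r["source"], r["detail"])
```

The filter discards traffic from other hosts and from outside the window, leaving a short sequence (page visit, redirect, process spawn) that an analyst can read from top to bottom.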

What can we do once we identify the root cause? We can work to address it. For example, if vulnerable versions of Java are the root cause of 80% of our malicious code infections, we can work with IT to understand why we are running a vulnerable version of Java and correct that. Think of the ramifications here: By performing forensics to identify root cause and subsequently addressing the root cause, we could potentially achieve a five-fold decrease in malicious code infections. How do I know this?  I’ve seen it happen with my own eyes inside an enterprise.
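The five-fold figure follows from simple arithmetic: if one root cause accounts for 80% of infections and is eliminated, 20% remain, and 100 / 20 = 5. The short Python sketch below works this through. It assumes the root cause is removed completely and that the rate of all other infections stays the same, which is an idealization; the function name is hypothetical.

```python
def fold_decrease(percent_from_root_cause):
    """Fold reduction in infections if one root cause is fully eliminated.

    Assumes (for illustration) that the cause is removed entirely and that
    infections from all other causes continue at the same rate.
    """
    if not 0 <= percent_from_root_cause < 100:
        raise ValueError("percent must be at least 0 and less than 100")
    remaining = 100 - percent_from_root_cause  # share of infections left over
    return 100 / remaining

print(fold_decrease(80))  # 5.0
```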

As an added benefit, when there are fewer commodity malicious code infections to respond to, we can focus on other questions that are often overlooked because of lack of time. For example, we might want to analyze our data looking for more sophisticated threats, or perhaps understand if we have particularly unusual traffic on our network that requires additional investigation. There is no shortage of good ways to invest newly liberated human resources.

Root cause analysis is a great thing, unless you like playing whack-a-mole, that is.

Written By

Joshua Goldfarb (Twitter: @ananalytical) is currently a Fraud Solutions Architect - EMEA and APCJ at F5. Previously, Josh served as VP, CTO - Emerging Technologies at FireEye and as Chief Security Officer for nPulse Technologies until its acquisition by FireEye. Prior to joining nPulse, Josh worked as an independent consultant, applying his analytical methodology to help enterprises build and enhance their network traffic analysis, security operations, and incident response capabilities to improve their information security postures. He has consulted and advised numerous clients in both the public and private sectors at strategic and tactical levels. Earlier in his career, Josh served as the Chief of Analysis for the United States Computer Emergency Readiness Team (US-CERT) where he built from the ground up and subsequently ran the network, endpoint, and malware analysis/forensics capabilities for US-CERT.
