Machine learning has become the most popular new theme in security. Seemingly every vendor is adopting it, either to keep up or to make their product stand out in a crowded market. This creates confusion, because the term itself is often misunderstood and the implications of its use vary. Not only does “machine learning” mean different things to different people, but different vendors also apply it in different ways. All of this makes it difficult for buyers to separate hype from reality and to know what value machine learning is actually bringing them.
To help clear up some of the confusion, let’s start by clarifying what machine learning in security is NOT:
1. It is not a form of protection. One of the biggest misconceptions is that machine learning is some kind of new product or feature that keeps companies safe. In fact, machine learning doesn’t provide protection itself; rather, it informs how the protection operates by enabling faster, more accurate, broader and more in-depth analysis of threat data. Faced with the overwhelming volume of security threat data being generated, it makes sense of that data with a level of efficiency that human analysts could never achieve.
2. It is not a quick fix for outdated approaches. Most AV solutions are using machine learning to analyze file attributes to determine whether a file is malicious. But this is basically what AV has been doing for years — scanning new file attributes to compare them against known malware attributes to make a determination. And therein lies the problem: the reliance on static, known file attributes means that no amount of machine learning can stop unknown or fileless threats (with no file, there’s nothing to scan).
3. It is not necessarily always getting smarter. Like any analytics tool, machine learning-based security solutions are only as good as the data available: the proverbial “garbage in, garbage out.” Effective protection depends on a large volume of high-quality, frequently updated data and on the right set of features, or attributes, to train on. The model must be regularly retrained using timely, relevant, high-fidelity data.
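To make the second point concrete, here is a minimal sketch of a purely static check. It is not any vendor's implementation: a SHA-256 fingerprint stands in for the richer file attributes a real model would use, and the names `KNOWN_BAD` and `static_verdict` are hypothetical.

```python
import hashlib

# Hypothetical store of fingerprints for already-known malware (illustrative only).
KNOWN_BAD = {hashlib.sha256(b"known malicious payload").hexdigest()}


def static_verdict(file_bytes):
    """Judge a sample using only its static content.

    Returns None when there is no file at all, which is the fileless case.
    """
    if file_bytes is None:
        return None  # nothing on disk, so there is nothing to scan
    digest = hashlib.sha256(file_bytes).hexdigest()
    # Only samples seen before can be flagged; anything new stays "unknown".
    return "malicious" if digest in KNOWN_BAD else "unknown"
```

A previously catalogued sample is flagged, a slightly altered variant comes back as "unknown", and a fileless attack yields no verdict at all.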
While machine learning has certainly improved endpoint security, we’re clearly not quite there yet.
In order to be effective at stopping today’s most sophisticated malware, like fileless attacks, CPU-level exploits, and script and macro-based threats (as well as those threats yet to come), machine learning for endpoint security must be responsive to this current climate. It must be able to discern good software from the bad in real time, across all threat parameters and all system configurations.
So, what will it take for machine learning to deliver on the hype and power a truly transformative new wave of endpoint security?
1. It must analyze file behavior, not just attributes. Basing security decisions on file attributes only works when 1) there’s a file to analyze and 2) those attributes have been previously identified and embedded into the model. This leaves a huge void when it comes to detecting and stopping new, unknown variants, fileless and script/macro-based attacks. New, more responsive uses of machine learning in endpoint security can analyze runtime behavior—the system calls and commands of programs as they execute—enabling these solutions to also identify and block malicious activity as soon as it starts, providing broader, more dependable protection.
2. It must be informed by a timely, rigorously retrained model. Most vendors update their model every few months, but given the pace at which new threats emerge and the frequency of updates to legitimate software, this is not nearly enough. Truly responsive solutions retrain and update their model far more often, as frequently as every 24 hours, so that it can actually keep pace with today’s threat landscape.
3. It must account for goodware. Just as thousands of new malware variants threaten endpoints daily, legitimate software is also constantly changing with updates and unique integrations. Conventional security solutions that operate based on file attributes often struggle to consistently disambiguate good applications from malware or beneficial processes from malicious attacks. This results in a high rate of false positives and forces users to maintain whitelists and blacklists while waiting months for updated models. Responsive solutions overcome this problem by ingesting and building a model based not only on current malware data but also up-to-the-minute data on known-good software. This provides a more adaptive, agile model that ensures greater accuracy and coverage, while drastically reducing false positives and user friction.
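The three requirements above can be sketched together in a toy model. This is an illustration under stated assumptions, not a production detector: the event names, the bigram log-odds scoring, and the `BehaviorModel` class are all hypothetical, chosen only to show behavior-based scoring, training on both malware and goodware, and a freshness check against the 24-hour cadence mentioned above.

```python
import math
from collections import Counter
from datetime import datetime, timedelta


def bigrams(trace):
    """Pairs of consecutive runtime events in a trace."""
    return list(zip(trace, trace[1:]))


class BehaviorModel:
    """Toy scorer trained on both malware and goodware behavior traces."""

    def __init__(self, malware_traces, goodware_traces, trained_at):
        self.bad = Counter(bg for t in malware_traces for bg in bigrams(t))
        self.good = Counter(bg for t in goodware_traces for bg in bigrams(t))
        # Distinct bigrams seen in training, plus one slot for unseen bigrams.
        self.vocab = len(set(self.bad) | set(self.good)) + 1
        self.trained_at = trained_at

    def score(self, trace):
        """Sum of add-one-smoothed log-odds; positive leans malicious."""
        bad_total = sum(self.bad.values()) + self.vocab
        good_total = sum(self.good.values()) + self.vocab
        total = 0.0
        for bg in bigrams(trace):
            p_bad = (self.bad[bg] + 1) / bad_total
            p_good = (self.good[bg] + 1) / good_total
            total += math.log(p_bad / p_good)
        return total

    def is_stale(self, now, max_age=timedelta(hours=24)):
        """True when the model is older than the retraining window."""
        return now - self.trained_at > max_age


# Hypothetical traces; real systems would record actual system calls and commands.
malware = [["open_doc", "run_macro", "spawn_powershell", "download", "exec_in_memory"]]
goodware = [
    ["open_doc", "run_macro", "save_file"],
    ["check_update", "download", "verify_signature", "install"],
]
model = BehaviorModel(malware, goodware, trained_at=datetime(2024, 1, 1))
```

Because the model has seen legitimate macro use and legitimate downloads, those events alone do not push a score toward malicious; what matters is the sequence in which they occur. A positive `score` leans malicious, a negative one leans benign, and `is_stale` signals when retraining is overdue.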
There’s no doubt that machine learning has revolutionized endpoint security and will continue to do so. But it’s important to understand exactly how this technology works, including its limitations. With that understanding, companies can ask the right questions and get the accurate, comprehensive, forward-looking coverage they need in the face of rapidly evolving and increasingly sophisticated threats.