Despite advances in cybersecurity technology, the time required to detect a breach has increased, from an average of 201 days in 2016 to an average of 206 days just a year later, according to the 2017 Ponemon Cost of Data Breach Study. And while organizations are getting better at discovering data breaches on their own, 53 percent of breaches in 2017 were discovered by an external source, meaning organizations had no idea their data had been compromised. Part of the problem is that many organizations have no easy way to automatically correlate and analyze all of the data being collected by the various security solutions deployed across the network. That problem is compounded by the fact that many of these tools operate in isolation. As a result, IT teams have to hand-correlate data collected from different sources, looking for a needle in a haystack. The opportunity for human error is high, and log files simply scroll by too quickly for anyone to glean actionable information from them.
Filling the Gap with Machine Learning
That is why many organizations, both end users and vendors, are turning to machine learning to fill the gap. Machine learning is a branch of artificial intelligence that uses algorithms to enable systems to become more accurate at predicting outcomes by analyzing data, identifying patterns, and then making decisions with minimal human intervention. This method leverages the speed, efficiency, and accuracy of technology not only to find patterns of behavior and indicators of compromise that humans would otherwise miss, but to respond to those threats in near real time.
Over time, often using some form of human-assisted learning (sometimes referred to as a centaur model), these machine-learning systems become increasingly efficient and accurate. This means that devices that leverage machine learning can keep pace with advances made by cybercriminals with nominal additional investment.
The big question is, how far out are security solutions enhanced with machine learning? Like many issues, the answer is complicated.
Fortunately, there are tools on the market right now that have actual machine learning capabilities built into them. But there are thousands of vendors in the security space, and since machine learning is now a buzzword, far too many claim to have machine learning capabilities when, in fact, they don't. It's a classic case of needing to read the fine print, as in the previous sentence: what's the difference between machine learning and machine learning 'capabilities'? Or a solution that claims to 'use' or 'leverage' machine learning? The distinctions can be important.
Another part of the challenge is that, at least in the minds of many consumers, there is confusion about what machine learning even is. It's much like the trend of every security vendor claiming that its product or service is cloud-enabled: the phrase means little unless you have a clear definition in mind.
Machine Learning in Use Today
A handful of the top security research companies have developed machine learning and other AI capabilities to sort through massive amounts of data, uncover new threats, and identify new malware variants. Developing these systems, however, often requires massive investment. An artificial neural network can contain billions of nodes, and engineers can spend months carefully hand-training a system to identify anomalous behaviors and then categorize them as threats.
Most organizations don’t have the resources to develop such a system on their own. That’s why they need to leverage solutions that can automate the correlation and detection process; otherwise, they may not find out about a breach until months later, and then only through some third party.
Web Application Firewalls (WAFs) are among the security technologies that have successfully embraced machine learning. Historically, WAFs have relied on an observational method of threat detection called application learning (AL). Application learning automates the building of profiles of the structure and usage of web-based applications. Once enough information has been collected, AL builds policies based on what it has monitored. Subsequent user activity that doesn’t adhere to those policies triggers an anomaly, at which point action is taken.
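The application-learning idea can be sketched in a few lines. This is a hypothetical, simplified illustration, not any vendor's implementation: it profiles the length and character class of each request parameter during a learning phase, then flags anything outside what was observed.

```python
# Hypothetical sketch of WAF application learning (AL):
# profile observed traffic first, then flag deviations.
from collections import defaultdict


class ALProfile:
    """Builds per-parameter profiles during a learning phase, then
    flags any request whose parameters fall outside what was observed."""

    def __init__(self):
        # (endpoint, param) -> observed length range and character classes
        self.profiles = defaultdict(lambda: {"min_len": float("inf"),
                                             "max_len": 0,
                                             "charsets": set()})

    @staticmethod
    def _charset(value):
        if value.isdigit():
            return "digits"
        if value.isalnum():
            return "alnum"
        return "other"

    def observe(self, endpoint, params):
        """Learning phase: record what legitimate traffic looks like."""
        for name, value in params.items():
            p = self.profiles[(endpoint, name)]
            p["min_len"] = min(p["min_len"], len(value))
            p["max_len"] = max(p["max_len"], len(value))
            p["charsets"].add(self._charset(value))

    def is_anomalous(self, endpoint, params):
        # Purely observational: anything not seen during learning
        # is an anomaly, benign or not.
        for name, value in params.items():
            key = (endpoint, name)
            if key not in self.profiles:
                return True
            p = self.profiles[key]
            if not (p["min_len"] <= len(value) <= p["max_len"]):
                return True
            if self._charset(value) not in p["charsets"]:
                return True
        return False


# Learning phase on two legitimate logins, then enforcement.
waf = ALProfile()
waf.observe("/login", {"user": "alice"})
waf.observe("/login", {"user": "bob"})
```

A request like `waf.is_anomalous("/login", {"user": "x" * 500})` would then be flagged because its length falls outside the learned range, while a value resembling the training traffic would pass.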
The fundamental problem is that AL is purely observational: it flags anomalies based on what it has previously witnessed and has no intelligence to determine whether an anomaly is benign or malicious. This generally results in a lot of false positives, so managing these tools ends up being very resource intensive.
Several forms of machine learning are changing all of that. At the simplest level, ML uses a statistical model to determine whether an HTTP request varies significantly from those previously observed. Only if the request has strayed too far from what is considered “normal” is it flagged as an anomaly. However, rather than blocking that anomaly outright, the WAF sends it for additional analysis to determine whether it is a threat or simply a benign variance (such as a typo). This analysis layer also employs machine learning, running the flagged anomaly through pre-trained, actively learning threat models to determine whether or not it is a threat. If it is, the WAF can take traditional actions such as logging, alerting, and blocking requests, but now with accuracy that approaches 100 percent and actually improves over time.
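The two-layer approach described above can be sketched as follows. This is a toy illustration under stated assumptions: layer 1 is a simple statistical model (a running mean and standard deviation of a request-length feature, with a z-score threshold), and layer 2 stands in for a pre-trained threat model with an illustrative rule; the names, features, and thresholds are all assumptions, not any product's actual design.

```python
# Hypothetical two-layer detection sketch:
# layer 1 flags statistical outliers; only those reach layer 2,
# which decides threat vs. benign variance.
import math


class StatisticalAnomalyDetector:
    """Layer 1: flag requests whose feature z-score strays too far
    from the running baseline of previously observed traffic."""

    def __init__(self, threshold=3.0):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations (Welford)
        self.threshold = threshold

    def update(self, value):
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)

    def is_outlier(self, value):
        if self.n < 2:
            return False
        std = math.sqrt(self.m2 / (self.n - 1))
        if std == 0:
            return value != self.mean
        return abs(value - self.mean) / std > self.threshold


def classify_flagged(request):
    """Layer 2 stand-in: a real WAF would run a pre-trained threat
    model here; this toy rule separates an injection-like pattern
    (threat) from a harmless variance such as a typo (benign)."""
    threat_markers = ("' or ", "<script", "../")
    return "threat" if any(m in request.lower() for m in threat_markers) else "benign"


# Baseline built from previously observed requests (feature: length).
detector = StatisticalAnomalyDetector()
for req in ["user=alice", "user=bob", "user=carol", "user=dave"]:
    detector.update(len(req))

suspicious = "user=' OR 1=1 --"
if detector.is_outlier(len(suspicious)):
    # Only statistical outliers are passed to the analysis layer.
    print(classify_flagged(suspicious))
```

Note the division of labor: the cheap statistical layer sees every request, while the more expensive threat models only analyze the small fraction flagged as anomalous, which is what keeps the false-positive burden manageable.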
A machine learning-based approach to threat detection is set to revolutionize the security industry. Actual machine learning technology has already been incorporated into a handful of traditional and cloud-based solutions, with more on the way. However, as usual during times of market transition, it is essential to understand exactly what is meant by machine learning so you can quickly differentiate between solutions that actually provide the technology you need to stay ahead in the cyber arms race and those that are simply capitalizing on market hype.