Data Science and Machine Learning Can Help Enterprises Keep Pace with Faster and More Automated Cyber Threats
In a recent column, I discussed the importance of separating the hype from reality around data science and cybersecurity.
This time around, I want to dive into where data science is headed and how you can begin incorporating it into your security practice. To do so, I’ll introduce a little theory and then reinforce that theory with some real-world examples from information security.
Learning from experience
Last week, AlphaGo, the brainchild of Google’s DeepMind, beat one of the world’s top human players at the game of Go. This is significant for a few reasons.
First, Go is a particularly complex game: according to, ahem…Google, it is a googol times more complex than chess. This complexity led most AI experts to believe that beating a top human player was more than a decade away.
However, what is potentially more important is how AlphaGo’s approach differed from previous approaches to AI. The team at DeepMind focused on developing generalized learning.
Instead of being programmed with the solution to a complex problem, AlphaGo was designed to learn from data, continuously adapting as it gains experience.
Going beyond the assembly line
This concept of iterative learning matters because machine learning is driving a similar shift in information security. For many years, security products behaved like an assembly line when detecting threats.
A known process would be developed and then replicated at scale. As part of this process, products were programmed with the patterns of known threats, and the solution would dutifully scan the mountains of traffic in search of a match.
However, much like earlier attempts to teach computers to play games, this rigid, pre-programmed approach is easily defeated when facing an intelligent and adaptable adversary.
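To make the “assembly line” concrete, here is a minimal sketch of signature-style matching. It assumes that a product’s patterns are simply a set of known-bad domain names. The list and the domains are hypothetical and use the reserved .example TLD.

```python
# Hypothetical signature list: exact domain names already known to be malicious.
KNOWN_BAD_DOMAINS = {"evil-c2.example", "malware-update.example"}


def signature_match(dns_queries):
    """Return only the queried domains that exactly match a known signature."""
    return [q for q in dns_queries if q.lower() in KNOWN_BAD_DOMAINS]


# An attacker who moves from evil-c2.example to evil-c3.example slips past the list.
print(signature_match(["evil-c2.example", "evil-c3.example", "news.example"]))
# prints ['evil-c2.example']
```

The matcher only knows the answers it was given. Any domain that isn’t already on the list passes unnoticed, including a freshly registered one.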
Data science and machine learning turn this model on its head. Instead of needing to be given the answer, software can now learn it from the data. Today’s machine learning models assess large groups of threats to find the subtle traits they have in common: hidden connections that are not obvious to human analysts.
For example, attackers constantly move their command-and-control servers to new domains and cycle through new IP addresses to stay ahead of reputation lists. But in the midst of this constant flux, machine learning models can successfully detect the underlying patterns of command-and-control behavior.
Models have been able to identify command-and-control traffic with high accuracy in the absence of any reputation data or signature. This approach has likewise been extended to finding specific malware communications, such as malware receiving instructions or receiving an executable update. And much like AlphaGo, these models can be continually retrained on new data sets to pick up new trends or behaviors in attack traffic.
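The idea of learning from behavior rather than from domains or IPs can be sketched with a very simple supervised model: a nearest-centroid classifier. It is swapped in here purely for illustration and is not the model any particular product uses. The two features and the training examples are hypothetical and made up. Because neither feature involves a domain or IP address, rotating to a new domain does not change the verdict.

```python
import math
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical behavioral features for one flow, deliberately excluding domain and IP:
# (coefficient of variation of the gaps between connections, bytes-out / bytes-in ratio)
Features = Tuple[float, float]


def train_centroids(examples: List[Tuple[Features, str]]) -> Dict[str, Features]:
    """Average the feature vectors of each label (a nearest-centroid classifier)."""
    by_label: Dict[str, List[Features]] = {}
    for feats, label in examples:
        by_label.setdefault(label, []).append(feats)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def classify(feats: Features, centroids: Dict[str, Features]) -> str:
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(feats, centroids[label]))


# Made-up training data for illustration only: beacons are very regular, browsing is bursty.
training = [
    ((0.05, 0.9), "c2"), ((0.10, 0.7), "c2"), ((0.08, 0.8), "c2"),
    ((1.30, 0.10), "benign"), ((0.90, 0.15), "benign"), ((1.50, 0.05), "benign"),
]
centroids = train_centroids(training)
print(classify((0.08, 0.7), centroids))  # a regular beacon to a never-seen domain; prints c2
```

Retraining on new data simply means calling `train_centroids` again with the enlarged example list. Real detectors use far richer features and models.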
Just as important, these techniques are highly portable, so security models can also be trained in a local environment. These “unsupervised” machine learning models analyze and baseline local traffic to expose deviations. This type of analysis is instrumental in revealing devices that are behaving strangely because malware has compromised them.
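Here is a minimal sketch of “baseline, then flag deviations,” using a per-device z-score. This is a simple statistical stand-in for the unsupervised models described above, not a reproduction of them. It needs no labeled attacks, only each device’s own history. The metric, device names, `z_threshold`, and `min_sigma` are all hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def build_baseline(history: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Learn each device's normal level as (mean, population std dev) from past observations."""
    return {device: (mean(values), pstdev(values)) for device, values in history.items()}


def find_deviations(today: Dict[str, float],
                    baseline: Dict[str, Tuple[float, float]],
                    z_threshold: float = 3.0,
                    min_sigma: float = 1.0) -> List[str]:
    """Return devices whose value today is more than z_threshold std devs above their own baseline."""
    flagged = []
    for device, value in today.items():
        if device not in baseline:
            continue  # no history yet, so nothing to compare against
        mu, sigma = baseline[device]
        sigma = max(sigma, min_sigma)  # floor keeps perfectly steady devices from dividing by ~0
        if (value - mu) / sigma > z_threshold:
            flagged.append(device)
    return flagged


# Hypothetical metric: distinct external destinations contacted per day.
history = {"printer-01": [4, 5, 4, 6, 5], "laptop-17": [40, 55, 38, 60, 47]}
baseline = build_baseline(history)
print(find_deviations({"printer-01": 90, "laptop-17": 52}, baseline))  # prints ['printer-01']
```

The same absolute jump means very different things for different devices. A printer that suddenly talks to 90 destinations stands out against its own history, while a laptop at 52 is well within its normal range.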
Turning machines against machines
AI is often cast as the villain in pop culture, which is ironic because machines have proven to be some of the best resources for detecting and fighting other machines. This is a significant development because machine-based security is required to keep up with the rate and scale of machine-based threats.
As a case in point, malware often uses Domain Generation Algorithms (DGAs) to stay on the move and ahead of signatures. The constant avalanche of new domains makes it impractical for human analysts to track and evaluate each one manually.
However, data science models are proving to be adept at recognizing these machine-generated domains. This has allowed organizations to protect themselves from malware while also lifting a huge burden from human analysts.
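One well-known way to recognize machine-generated names is a character-bigram (Markov) model trained on legitimate domains. Names whose letter transitions look unlike real words receive a low average log-probability. The sketch below assumes the input is a single domain label without its TLD. The tiny training list and the alphabet size are illustrative assumptions, not a production model.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

# Possible "next" characters: a-z, 0-9, '-', and the end marker '$' (an assumed label alphabet).
VOCAB_SIZE = 38

Model = Tuple[Counter, Counter]


def train_bigram_model(benign_labels: Iterable[str]) -> Model:
    """Count character bigrams in known-good domain labels, with ^/$ marking start and end."""
    pair_counts: Counter = Counter()
    prefix_counts: Counter = Counter()
    for label in benign_labels:
        padded = "^" + label.lower() + "$"
        for a, b in zip(padded, padded[1:]):
            pair_counts[(a, b)] += 1
            prefix_counts[a] += 1
    return pair_counts, prefix_counts


def avg_log_prob(label: str, model: Model) -> float:
    """Average log P(next char | current char); machine-generated labels tend to score lower."""
    pair_counts, prefix_counts = model
    padded = "^" + label.lower() + "$"
    pairs = list(zip(padded, padded[1:]))
    total = 0.0
    for a, b in pairs:
        # Add-one (Laplace) smoothing gives unseen bigrams a small nonzero probability.
        p = (pair_counts[(a, b)] + 1) / (prefix_counts[a] + VOCAB_SIZE)
        total += math.log(p)
    return total / len(pairs)


# Illustrative training set; a real model would use a large list of popular, legitimate domains.
BENIGN = ["google", "facebook", "amazon", "wikipedia", "youtube",
          "twitter", "linkedin", "netflix", "microsoft", "apple"]
model = train_bigram_model(BENIGN)
for label in ["google", "xkqzjvwtpq"]:
    print(label, round(avg_log_prob(label, model), 2))
```

In practice, labels that score below a threshold tuned on held-out data would be flagged for review. Any threshold you pick for this toy model is a placeholder.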
Conversely, finding a human masquerading as a machine is equally essential. Remote Access Tools (RATs) are indispensable components of targeted attacks because they give remote human attackers real-time control over internal devices.
To circumvent firewall rules, RATs connect from the inside out to the external human attacker, so the session looks like a normal end user connecting to an external server. In reality, the roles are reversed: the bot is on the inside and the human is on the outside.
But by applying data science to the connection, threat detection solutions can see what’s really going on and expose the presence of a remote attacker. One way to do this is to watch the cadence of the conversation and distinguish automated behavior from human behavior.
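Here is a minimal sketch of one cadence heuristic. It is our own illustration, not the method of any particular product. In normal browsing, the person on the inside breaks each silence and the server answers. In a RAT session the roles are reversed: after each human-scale pause, the outside party speaks first and the inside host answers almost instantly. The event format, `idle_gap`, `min_pauses`, and the 0.5 cutoff are hypothetical choices.

```python
from typing import Dict, List, Tuple

# One observed message: (timestamp in seconds, direction), where "out" means the internal
# host sent data to the external server and "in" means the external server sent data inward.
Event = Tuple[float, str]


def who_speaks_after_pauses(events: List[Event], idle_gap: float = 1.0) -> Dict[str, int]:
    """Count which side starts talking again after each human-scale pause."""
    counts = {"in": 0, "out": 0}
    ordered = sorted(events)
    for (prev_t, _), (t, direction) in zip(ordered, ordered[1:]):
        if t - prev_t >= idle_gap:
            counts[direction] += 1
    return counts


def looks_remote_controlled(events: List[Event], idle_gap: float = 1.0,
                            min_pauses: int = 3) -> bool:
    """Flag sessions where the outside party usually breaks the silence, i.e. the internal
    host answers quickly like a bot while the external side pauses like a human."""
    counts = who_speaks_after_pauses(events, idle_gap)
    total = counts["in"] + counts["out"]
    if total < min_pauses:
        return False  # too little conversation to judge
    return counts["in"] / total > 0.5


browsing = [(0.0, "out"), (0.05, "in"), (3.0, "out"), (3.1, "in"),
            (7.2, "out"), (7.3, "in"), (12.0, "out"), (12.05, "in")]
rat_session = [(0.0, "in"), (0.02, "out"), (4.5, "in"), (4.55, "out"),
               (9.0, "in"), (9.1, "out"), (15.3, "in"), (15.32, "out")]
print(looks_remote_controlled(browsing), looks_remote_controlled(rat_session))  # prints False True
```

A real detector would combine many more timing features, such as the spread of the pauses and the size of each turn. Still, even this crude signal captures the “roles reversed” idea.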
These examples show how data science and machine learning are changing the world of information security. And while AlphaGo and other AI developments may win board games, in cybersecurity the stakes are high and we are playing for keeps.
Today, data science and machine learning are essential everyday tools that can help you keep pace with faster and more automated threats. As their individual techniques continue to evolve at a rapid clip, it’s important for us as security professionals to understand how to use these technologies in everyday practice.