
The Practical Side of Data Science to Security

Data Science and Machine Learning Can Help Enterprises Keep Pace with Faster and More Automated Cyber Threats


In a recent column, I discussed the importance of separating the hype from reality around data science and cybersecurity.

This time around, I want to dive into where data science is headed and how you can begin incorporating it into your security practice. To do so, I’ll introduce a little theory and then reinforce that theory with some real-world examples from information security.

Learning from experience

Last week, AlphaGo, the brainchild of Google’s DeepMind, defeated one of the world’s top human players of the game Go. This is significant for a few reasons.

First, Go is a particularly complex game: a googol times more complex than chess, according to, ahem…Google. This complexity led most AI experts to believe that beating a top human player was more than a decade away.

However, what may matter more is how AlphaGo’s approach differed from previous approaches to AI. The team at DeepMind focused on developing generalized learning.


Rather than being programmed with the solution to a complex problem, AlphaGo was designed to learn from data, continuously adapting as it gains experience.

Going beyond the assembly line

This idea of iterative learning matters because machine learning is driving a similar shift in information security. For many years, security products behaved like an assembly line when detecting threats.

A known process would be developed and then replicated at scale. As part of this process, products were programmed with the patterns of known threats, and the solution would dutifully scan the mountains of traffic in search of a match.
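The following minimal Python sketch makes the assembly-line model concrete by showing exact-match signature scanning. The signature names and byte patterns are hypothetical placeholders. Real engines use far richer rule languages, but the weakness is the same: a small change to the attacker’s traffic no longer matches.

```python
# Hypothetical signature set: byte patterns taken from previously seen threats.
KNOWN_SIGNATURES: dict[str, bytes] = {
    "example-dropper": b"\x4d\x5a\x90\x00EVIL",
    "example-c2-checkin": b"GET /gate.php?id=",
}


def match_signatures(payload: bytes, signatures: dict[str, bytes]) -> list[str]:
    """Return the names of every signature that appears verbatim in the payload."""
    return [name for name, pattern in signatures.items() if pattern in payload]


if __name__ == "__main__":
    traffic = [
        b"GET /index.html HTTP/1.1",
        b"GET /gate.php?id=42 HTTP/1.1",    # matches the known pattern
        b"GET /gate2.php?uid=42 HTTP/1.1",  # trivially altered, so it slips past
    ]
    for packet in traffic:
        print(packet, match_signatures(packet, KNOWN_SIGNATURES))
```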

However, much like earlier attempts to teach computers to play games, this rigid, pre-programmed approach is easily defeated when facing an intelligent and adaptable adversary.

Data science and machine learning turn this model on its head. Instead of needing to have the answer, software can now learn from the data. Today’s machine learning models assess large groups of threats to find the subtle traits they have in common. These are hidden connections that human analysts would not readily spot.

For example, attackers constantly move their command-and-control servers to new domains and cycle through new IP addresses to stay ahead of reputation lists. But in the midst of this constant flux, machine learning models can successfully detect the underlying patterns of command-and-control behavior.

Models have been able to identify command-and-control traffic with high accuracy without relying on any reputation list or signature. The same approach has been extended to finding specific malware communications, such as malware receiving instructions or downloading an executable update. And much like AlphaGo, these models can be continually retrained on new data sets to pick up new trends or behaviors in attack traffic.
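Timing is one behavioral trait that survives a change of domain or IP address. Many command-and-control implants check in on a schedule, so the gaps between their connections are unusually regular. The sketch below scores that regularity with the coefficient of variation of the gaps between connections. It looks only at timestamps, never at the destination’s name or reputation.

This is a single hand-picked feature, not a trained model. The 0.1 threshold and six-event minimum are assumptions chosen for illustration, and grouping connections per host/destination pair is assumed to happen upstream. Real implants often add random jitter specifically to defeat this kind of check.

```python
from statistics import mean, pstdev


def interval_cv(timestamps: list[float]) -> float:
    """Coefficient of variation (stdev / mean) of the gaps between connections."""
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if len(gaps) < 2 or mean(gaps) == 0:
        return float("inf")  # too little data to call anything regular
    return pstdev(gaps) / mean(gaps)


def looks_like_beacon(timestamps: list[float], max_cv: float = 0.1, min_events: int = 6) -> bool:
    """Flag a host/destination pair whose connections arrive on a near-fixed schedule."""
    return len(timestamps) >= min_events and interval_cv(timestamps) <= max_cv


if __name__ == "__main__":
    implant = [0, 60, 120.5, 180, 239.8, 300, 360.2]  # ~60 s check-ins with slight jitter
    browsing = [0, 3, 4, 40, 41, 300, 302]            # bursty human activity
    print(looks_like_beacon(implant), looks_like_beacon(browsing))
```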

Just as important, these techniques are highly portable, so security models can be trained in a local environment. These “unsupervised” machine learning models analyze and baseline local traffic to expose deviations. This type of analysis is instrumental in revealing devices that have been compromised by malware and are behaving strangely.
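An unsupervised baseline needs no labeled attacks. It only learns what is normal for each device in the local environment. This sketch is a deliberately simple statistical stand-in for such models, not a description of any particular product. It learns each device’s mean and spread of daily outbound traffic, measured in megabytes (an assumed unit). It then flags any device that lands more than three spreads from its own norm. The device names, the threshold, and the one-megabyte floor on the spread are all hypothetical.

```python
from statistics import mean, pstdev


def build_baseline(history: dict[str, list[float]]) -> dict[str, tuple[float, float]]:
    """Learn each device's own normal level and spread from past observations."""
    return {device: (mean(values), pstdev(values)) for device, values in history.items()}


def find_deviations(
    baseline: dict[str, tuple[float, float]],
    today: dict[str, float],
    threshold: float = 3.0,
    min_sigma: float = 1.0,
) -> list[str]:
    """Return devices whose value today sits more than `threshold` spreads from their norm."""
    flagged = []
    for device, value in today.items():
        if device not in baseline:
            flagged.append(device)  # a device with no history is itself a deviation
            continue
        mu, sigma = baseline[device]
        sigma = max(sigma, min_sigma)  # floor the spread so near-constant devices aren't over-flagged
        if abs(value - mu) / sigma > threshold:
            flagged.append(device)
    return flagged


if __name__ == "__main__":
    history = {
        "printer-01": [12, 10, 11, 13, 12],  # MB sent per day
        "laptop-07": [200, 180, 220, 210, 190],
    }
    baseline = build_baseline(history)
    print(find_deviations(baseline, {"printer-01": 480, "laptop-07": 205}))
```

Baselining each device against its own history, rather than against a global norm, means a printer suddenly sending hundreds of megabytes stands out even though a laptop routinely sends more.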

Turning machines against machines

AI is often cast as the villain in pop culture, which is ironic because machines have proven to be some of the best resources for detecting and fighting other machines. This is a significant development because machine-based security is required to keep up with the rate and scale of machine-based threats.

As a case in point, malware often uses Domain Generation Algorithms (DGAs) to stay on the move and ahead of signatures. The constant avalanche of new domains makes it impractical for human analysts to manually track and evaluate each one.

However, data science models are proving to be adept at recognizing these machine-generated domains. This has allowed organizations to protect themselves from malware while also lifting a huge burden from human analysts.
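Domains produced by many DGAs tend to be long, character-diverse, and hard to pronounce. The toy heuristic below scores a domain’s registered label on three traits: length, Shannon entropy, and the longest run of consonants. The thresholds are assumptions, and the label extraction naively ignores multi-part suffixes such as co.uk. Production detectors typically train a classifier on features like these, or on the raw character sequence, rather than relying on fixed cutoffs. Dictionary-based DGAs, which concatenate real words, would slip past this check.

```python
import math
import re
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Bits of entropy per character of text."""
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def longest_consonant_run(text: str) -> int:
    """Length of the longest stretch of consonants (y is treated as a vowel here)."""
    runs = re.findall(r"[bcdfghjklmnpqrstvwxz]+", text)
    return max((len(run) for run in runs), default=0)


def registered_label(domain: str) -> str:
    # Naive: takes the label left of the last dot, so "mail.example.co.uk" yields "co".
    labels = domain.lower().rstrip(".").split(".")
    return labels[-2] if len(labels) >= 2 else labels[0]


def looks_generated(domain: str) -> bool:
    """Hypothetical rule: long, high-entropy, and hard to pronounce."""
    name = registered_label(domain)
    return (
        len(name) >= 10
        and shannon_entropy(name) >= 3.0
        and longest_consonant_run(name) >= 5
    )


if __name__ == "__main__":
    for d in ("google.com", "stackoverflow.com", "xkqzjvbtrwplmhd.net"):
        print(d, looks_generated(d))
```

Entropy alone would misfire here: "stackoverflow" scores about 3.5 bits per character, close to the random-looking example. The consonant-run test is what separates a pronounceable name from machine-generated noise.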

Conversely, the task of finding a human masquerading as a machine is equally essential. Remote Access Tools (RATs) are indispensable components of targeted attacks because they give remote human attackers real-time control over internal devices.

To circumvent firewall rules, RATs initiate connections from the inside out to the external attacker, so the session looks like a normal end user connecting to an external server. In reality, the roles are reversed: the bot is on the inside and the human is on the outside.

But by applying data science to the connection, threat detection solutions can see what’s really going on and expose the presence of a remote attacker. This can be done by watching the cadence of the conversation and distinguishing automated behavior from human behavior.
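The cadence idea can be sketched in the same spirit. The code below assumes a session has already been reduced to timestamped messages, each marked inbound (external host to internal device) or outbound. It groups messages into conversational turns separated by pauses. It then asks two questions:

1. Does the external side start most turns, as an operator issuing commands would?
2. Are the pauses between turns irregular, as human think time tends to be?

The one-second pause, the 75 percent share, and the 0.5 irregularity cutoff are illustrative assumptions, not values from any product.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Message:
    ts: float       # seconds since the session began
    inbound: bool   # True when sent from the external host to the internal device


def turn_starts(msgs: list[Message], pause: float = 1.0) -> list[Message]:
    """Messages that open a new turn, i.e. follow a silence longer than `pause` seconds."""
    starts, prev_ts = [], None
    for msg in sorted(msgs, key=lambda m: m.ts):
        if prev_ts is None or msg.ts - prev_ts > pause:
            starts.append(msg)
        prev_ts = msg.ts
    return starts


def looks_like_remote_human(msgs: list[Message], pause: float = 1.0, min_turns: int = 4) -> bool:
    starts = turn_starts(msgs, pause)
    if len(starts) < min_turns:
        return False
    # Who drives the conversation? In ordinary outbound traffic the internal client does.
    inbound_share = sum(m.inbound for m in starts) / len(starts)
    # Human think time is uneven; scheduled automation is not.
    gaps = [b.ts - a.ts for a, b in zip(starts, starts[1:])]
    irregularity = pstdev(gaps) / mean(gaps)
    return inbound_share >= 0.75 and irregularity > 0.5


if __name__ == "__main__":
    rat, web = [], []
    for t in (0.0, 4.2, 19.0, 23.5, 61.0):  # operator types; implant replies 50 ms later
        rat += [Message(t, True), Message(t + 0.05, False)]
    for t in (0.0, 5.0, 30.0, 33.0):        # internal browser requests; server replies
        web += [Message(t, False), Message(t + 0.1, True)]
    print(looks_like_remote_human(rat), looks_like_remote_human(web))
```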

These examples show how data science and machine learning are changing the world of information security. And while AlphaGo and other AI developments may win board games, cybersecurity is a high-stakes game played for keeps.

Today, data science and machine learning are essential tools that can help you keep pace with faster and more automated threats. As their techniques continue to evolve at a rapid clip, it’s important for us as security professionals to understand how we can use these technologies in everyday practice.
