SecurityWeek

Beyond the Hype of Data Science

With RSA Conference on the horizon, odds are that if you make it to the exhibit floor, you will hear a lot about data science and machine learning.

Security vendors old and new are touting the powers of data science to solve security problems. And while these technologies have real value, the terms are rapidly becoming empty marketing buzzwords.

To keep our collective heads above water, it is important to understand the realities behind these technologies so we can separate the truth from the hype and make well-informed security decisions.

A quick intro to data science and why it matters

The world of data science can be hard to navigate, not simply because it involves lots of hard math, but also because it spans an enormously broad set of disciplines. Data science is concerned with the many ways that knowledge can be extracted from data, and it draws on mathematics, statistics, machine learning, and a variety of analytics, to name a few.

A subset of data science, machine learning enables software to iteratively learn from data and adapt without being explicitly programmed. For example, machine learning can reveal low-level traits that command-and-control messages have in common, or signal an impending data theft when unusual employee behavior occurs. These characteristics might be unknown beforehand, but machine-learning models can recognize these signs from the data.

These examples illustrate critical concepts that make data science and machine learning important to security professionals.

First, the intelligence we extract from very large data sets tends to be fairly long-lived. Instead of chasing every URL a command-and-control server uses, we can learn its core underlying behavior and recognize it wherever it goes. This allows our security detections to stay well ahead of attackers.
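As a rough illustration of learning behavior rather than chasing URLs, the sketch below scores how regular a host's outbound connection timing is. The choice of timing regularity as the trait, the function name beacon_score, and the sample timestamps are all assumptions made for this example; the article does not name a specific feature, and a real model would learn from many traits at once.

```python
from statistics import mean, pstdev

def beacon_score(timestamps):
    """Return the coefficient of variation of the gaps between connections.

    Values near 0 mean the connections are evenly spaced (machine-like).
    Larger values mean irregular, bursty timing. Returns None when there
    are too few events to judge.
    """
    if len(timestamps) < 3:
        return None
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average_gap = mean(gaps)
    if average_gap == 0:
        return None
    return pstdev(gaps) / average_gap

# Hypothetical connection times in seconds. The destination URL is never used,
# so the score is unchanged if the server moves to a new address.
regular = [0, 60, 121, 180, 241, 300]
bursty = [0, 5, 7, 200, 203, 900]

print(beacon_score(regular))
print(beacon_score(bursty))
```

The evenly spaced series produces a much lower score than the bursty one, which is the kind of long-lived signal that survives a change of URL.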

Second, machine learning extends intelligence to the local environment. An intelligence feed will never be able to tell you when one of your employees starts behaving abnormally. It’s the sort of thing that must be learned locally, and is often the essential context needed to find a live threat.
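A minimal sketch of what "learned locally" can mean in practice: compare an employee's activity today against that same employee's own history. The z-score test, the threshold of 3.0, the function name is_unusual, and the upload figures are assumptions chosen for illustration, not a method described in the article.

```python
from statistics import mean, stdev

def is_unusual(history, today, threshold=3.0):
    """Flag today's value if it is more than `threshold` sample standard
    deviations away from the mean of this user's own history."""
    if len(history) < 2:
        return False  # not enough local data to form a baseline
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        return today != baseline
    return abs(today - baseline) / spread > threshold

# Hypothetical megabytes uploaded per day by one employee.
history = [40, 55, 48, 52, 45, 50, 47]

print(is_unusual(history, 51))
print(is_unusual(history, 900))
```

No external feed could supply the baseline here, because it only exists in the organization's own data.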

All data is not equal

Data science models inherently depend on the quality of data they consume. The better the data, the more you will be able to learn. An entire industry has been spawned by analyzing logs and events generated by other systems. While this approach may help connect the dots between observed events, it rarely finds hidden threats that go undetected in the first place.

By nature, logs are a secondary source of data that briefly summarize an event. Information that is not contained in the log is lost and unavailable for further analysis. Equally important, logs are only as good as the systems that generated them. If an upstream firewall or security device fails to detect a threat, there will be no log.

This is a fundamental issue. It is the job of a cyber security solution to detect threats that slip by standard layers of defense. Data science and machine learning can be applied to any data source, not just log data. Direct analysis of traffic, files or devices allows us to detect what was previously invisible.

Focus on answers, not data

Having looked at the inputs to a data science detection model, we can now turn our attention to what these models actually deliver. And this is where things can get a little dicey if you’re not careful.

While the promises may sound enticing, the vast majority of security and analytics solutions require a significant amount of human effort and attention in order to deliver value. Needless to say, most security organizations don’t have the luxury of extra time or staff.

As a precaution, be sure to evaluate whether a prospective solution makes life easier or harder on your staff. Many products generate mountains of anomalies that require a human analyst to investigate, and this bottleneck will severely limit the real-world value you get from the product.

It is critically important for security products to actually deliver high-confidence detections and answers. Of course, analysts will always need solid evidence to validate that a threat is real. But this should be an effort of verification and not require the analyst to do the heavy lifting of intensive analysis and diagnosis.
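One way to picture the difference between a pile of anomalies and a short list of answers is to surface a host only when several different kinds of anomaly agree. The sketch below assumes that corroboration rule for illustration only; the function name corroborated_hosts, the anomaly labels, and the host names are hypothetical, and real products may weigh evidence very differently.

```python
from collections import defaultdict

def corroborated_hosts(anomalies, min_kinds=2):
    """Return (host, kinds) pairs for hosts that show at least `min_kinds`
    distinct kinds of anomaly, ordered by number of kinds, then host name."""
    kinds_by_host = defaultdict(set)
    for host, kind in anomalies:
        kinds_by_host[host].add(kind)
    flagged = [
        (host, sorted(kinds))
        for host, kinds in kinds_by_host.items()
        if len(kinds) >= min_kinds
    ]
    return sorted(flagged, key=lambda item: (-len(item[1]), item[0]))

# Hypothetical raw anomalies as (host, kind) pairs.
anomalies = [
    ("host-a", "odd_login_hour"),
    ("host-b", "odd_login_hour"),
    ("host-b", "large_upload"),
    ("host-b", "new_external_destination"),
    ("host-c", "large_upload"),
    ("host-b", "large_upload"),
]

for host, kinds in corroborated_hosts(anomalies):
    print(host, kinds)
```

Six raw anomalies collapse to a single host for the analyst to verify, with the supporting evidence attached.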

These are, of course, not the only factors to consider when evaluating data science and machine learning solutions. However, they can provide some context to cut through the hype and find the data science solutions that are most likely to deliver real value.
