
Hunting the Snark with Machine Learning, Artificial Intelligence, and Cognitive Computing

Detecting Cyber Threats with Machine Learning, Artificial Intelligence, and Cognitive Computing

How do you find the unknown unknown that may have breached your perimeter defenses and might be doing something you cannot see? Answer: you move from deterministic solutions to a probabilistic approach and let your computer do the work.

This is the purpose of new developments in machine learning (ML), artificial intelligence (AI), and cognitive computing (CC) within cyber security. Each approach is made possible by cloud computing’s ability to deliver new levels of computing power and large-scale data collection and analysis, and each has huge potential for the future of cyber security.

Basic Definitions

AI and ML have accepted definitions, although it should be noted that many users and even some vendors use the terms interchangeably. CC is the newest development and is still subject to some variation in definition.

Machine Learning. ML can be described as ‘rules-on-steroids’. The volume of data and the complexity of business have made it impossible to manually generate and check all the rules needed to detect a subtle intrusion. ML is the process of getting the system itself to generate new, complex rules and to modify existing ones.

There are two sub-categories of ML: supervised learning and unsupervised learning. In supervised learning, analysts help the machine generate the correct rules, typically by labeling examples. Unsupervised learning comes from automatically recognizing anomalous behavior and automatically adjusting the rules. The machine doesn’t learn in the traditional sense; it merely responds in accordance with its underlying algorithms.
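To make the distinction concrete, here is a minimal sketch. It assumes a single hypothetical feature (megabytes sent per hour by one host) and two deliberately simple techniques, a z-score test and a learned threshold; real products use far richer features and models. The unsupervised part learns a baseline from unlabeled history and flags large deviations; the supervised part picks a decision threshold from analyst-labeled examples.

```python
from statistics import mean, stdev

def unsupervised_flags(history, new_values, z_limit=3.0):
    """Flag values that deviate strongly from an unlabeled baseline (z-score test)."""
    mu, sigma = mean(history), stdev(history)
    return [abs(v - mu) / sigma > z_limit for v in new_values]

def supervised_threshold(labeled):
    """Choose the threshold that best separates analyst-labeled examples.

    `labeled` is a list of (value, is_malicious) pairs supplied by an analyst.
    """
    best_t, best_correct = None, -1
    for t, _ in labeled:  # candidate thresholds: the observed values themselves
        correct = sum((v >= t) == bad for v, bad in labeled)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t

# Hypothetical hourly outbound traffic (MB) for one host.
history = [10, 12, 11, 9, 10, 13, 11, 10]
print(unsupervised_flags(history, [11, 40]))  # [False, True]

labeled = [(10, False), (12, False), (35, True), (50, True)]
print(supervised_threshold(labeled))  # 35
```

Neither function "learns" in a human sense: both simply compute statistics over the data they are given, which is the point the paragraph above makes.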


Artificial Intelligence. AI is an attempt to mimic human decision-making processes; that is, to get the system to make its own decisions. At the heart of AI are the algorithms that work on the data in order to generate a decision. AI is hard, and we are only at the very beginning of its development.

Cognitive Computing. CC is the most recent development. It is often defined as “the simulation of human thought processes in a computerized model. Using self-learning algorithms that use data mining, pattern recognition and natural language processing, the computer can mimic the way the human brain works.” The presentation of data output in a visualized format is often considered an important part of CC.

However, whichever term is used, they all share a common purpose: to develop accurate, probability-based methods for rapidly detecting any subtle but malicious presence. Any one solution is likely to incorporate elements of the other approaches. All of this is happening against the backdrop of a growing shortage of skilled security analysts. It is an open question whether the intent is to eliminate the need for expert analysts, or simply to reduce dependence on them.

Probability Machines in Use

Whichever description is applied to the methodology, the ultimate purpose is the same: to use probability scores to differentiate between benign and malicious activity. But they all suffer from one major difficulty: it is very hard to define, or label, enough ‘anomalous facts’ on which to base decisions. In the jargon, these are labels, and cyber security is a ‘thin label space’: out of the billions of actions that happen on networks every day, only a very few are malicious.
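Some back-of-the-envelope arithmetic shows why a thin label space is so hard. The numbers below are illustrative assumptions, not figures from any vendor: one billion events a day, of which 100 are malicious, and a detector with an apparently tiny 0.1% false-positive rate.

```python
events_per_day = 1_000_000_000   # assumed daily event volume
malicious = 100                  # assumed number of truly malicious events

# A "detector" that labels everything benign is almost perfectly accurate,
# yet it catches none of the 100 attacks.
always_benign_accuracy = (events_per_day - malicious) / events_per_day
print(f"{always_benign_accuracy:.7%}")  # 99.9999900%

# A detector with a seemingly tiny 0.1% false-positive rate still buries
# the real attacks under a flood of false alarms.
false_positive_rate = 0.001
false_alarms = (events_per_day - malicious) * false_positive_rate
print(round(false_alarms))  # 1000000
```

Under these assumptions there are roughly 10,000 false alarms for every real attack, which is why raw accuracy is a poor yardstick and why labeled examples of genuine attacks are so valuable.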

Most vendors seeking to automate security analytics differentiate their products by how they find, classify and label malicious events.

PatternEx

PatternEx, a company newly out of stealth mode, combines rare event modeling and active contextual modeling. It starts by running all the system data it can get through its own algorithms, looking for ‘outliers’: events that are either rare or unusual. Analysts then manually label those outliers as benign or malicious, and so the machine learns.

But, “Where I think everyone will ultimately have to go,” suggests Travis Reed, CMO at PatternEx, “is to move into a more real-time learning of the system. Some people call it active learning – we call it active contextual modeling. That's where you can take the data that's coming in, look at the data and with human involvement be able to train the entire system to be able to recognize that attack. Once that attack has been learned, it can go through all of your data and go find other instances or related events of similar attacks – thus protecting all of your points of ingress and egress in real-time.”
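PatternEx’s actual algorithms are not public in this article, so the following is only a generic sketch of the loop described above: score events by rarity, show the rarest few to an analyst, then use each confirmed attack to find related events across all the data. The event fields, the `analyst_label` stand-in for the human, and the notion of "related" (same destination) are all hypothetical.

```python
from collections import Counter

def rarity_scores(events):
    """Score each event by how rare its (action, dest) pair is: 1 / frequency."""
    counts = Counter((e["action"], e["dest"]) for e in events)
    return [1.0 / counts[(e["action"], e["dest"])] for e in events]

def active_learning_round(events, analyst_label, budget=2):
    """One round: show the `budget` rarest events to an analyst, then use the
    confirmed attacks to flag related events across all of the data."""
    scores = rarity_scores(events)
    ranked = sorted(range(len(events)), key=lambda i: scores[i], reverse=True)
    confirmed = [events[i] for i in ranked[:budget] if analyst_label(events[i])]
    # Hypothetical notion of "related": same destination as a confirmed attack.
    bad_dests = {e["dest"] for e in confirmed}
    return [e for e in events if e["dest"] in bad_dests]

def analyst_label(event):
    """Stand-in for the human analyst's verdict (True means malicious)."""
    return event["action"] == "upload"

events = [{"host": f"h{i}", "action": "login", "dest": "mail"} for i in range(5)]
events += [
    {"host": "h5", "action": "upload", "dest": "203.0.113.9"},  # rare, malicious
    {"host": "h6", "action": "backup", "dest": "nas"},          # rare, benign
    {"host": "h7", "action": "dns", "dest": "203.0.113.9"},     # never reviewed
    {"host": "h8", "action": "dns", "dest": "203.0.113.9"},     # never reviewed
]
print([e["host"] for e in active_learning_round(events, analyst_label)])
# ['h5', 'h7', 'h8']
```

The analyst only reviews two events, yet h7 and h8 are caught because they are related to the confirmed attack: a toy version of "go find other instances or related events".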

PatternEx claims that this approach is effective, a claim verified by a peer-reviewed research paper presented at the IEEE International Conference on Big Data Security in April. In the research, the system was tested on 3.6 billion pieces of log data and detected 85% of attacks while reducing false positives by a factor of five.

Vectra Networks

Vectra Networks is a company using both supervised and unsupervised machine learning in its products. Gunter Ollmann, Vectra’s CSO, explains that the unsupervised learning element comes from first baselining the network’s usual behavior. The system then monitors for any abnormal behavior on the network. It can be aided, he adds, by ‘hints’ (manually labeled events), but basically it works on its own. It is very good at spotting low-and-slow exfiltration and even subtle behavioral anomalies, but poor at labeling and classifying them. A human analyst is still needed to evaluate what it finds.
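Low-and-slow exfiltration is hard to see event by event, because no single transfer is large. A minimal sketch, assuming hypothetical hourly outbound volumes and an illustrative rule that is not Vectra’s method, shows why comparing a whole window against the baseline catches what per-hour checks miss.

```python
def low_and_slow_alert(baseline_hourly, recent_hourly,
                       per_hour_limit=2.0, window_limit=1.5):
    """Compare recent traffic with a learned baseline in two ways.

    Returns (per_hour, window):
      per_hour: one flag per hour, True if that hour alone exceeds
                per_hour_limit times the baseline average.
      window:   True if the total over the whole window exceeds
                window_limit times what the baseline predicts.
    """
    avg = sum(baseline_hourly) / len(baseline_hourly)
    per_hour = [v > per_hour_limit * avg for v in recent_hourly]
    window = sum(recent_hourly) > window_limit * avg * len(recent_hourly)
    return per_hour, window

# Hypothetical outbound megabytes per hour: a normal day, then 12 hours
# of modestly elevated traffic that never spikes.
per_hour, window = low_and_slow_alert([100] * 24, [180] * 12)
print(any(per_hour), window)  # False True
```

No individual hour looks alarming, but the cumulative excess over the window does, which is the kind of subtle anomaly a baseline can surface and a human then has to explain.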

Vectra’s supervised learning element is about building and training mathematical models specific to classes of threats. Ollmann calls these ‘n-dimensional signature models’ – matrices of signatures with probabilities. He used Vectra’s own new ransomware detection solution to describe how this is done. “I take as many ransomware samples as I can get and run each one in the cloud. From that I capture all of the network activity of every piece of malware, giving me an archive of say 2000 packet captures associated with the behavior of all known ransomware. Then I take a similar volume of packet captures that are not ransomware but could be confused with ransomware behavior – such as data backup and archiving.”

At this point he has a positive class (ransomware) and a negative class of similar but non-malicious behaviors, as well as a pool of hundreds of millions of packet capture samples from actual network activity. He then turns to a specialist machine that builds models using all the standard ML techniques. These models are run against the data and the results are analyzed.

Since the good and bad behaviors are already known, the analysts can tell which ML model does the best job of recognizing ransomware traits. Ultimately, there is a single model with the best possible ransomware recognition characteristics. “This model is pushed out to beta-testing customers, and I monitor how the model works with real data,” explained Ollmann. “The beta test results are passed back to the cloud and combined with the original data. The process iterates until a new and effective model has matured and it is eventually pushed out to all customers.”
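The selection step can be sketched generically: score several candidate models on samples whose labels are already known, then keep the best. In this sketch the "models" are hypothetical rules over two made-up features (the fraction of file-share operations that overwrite files, and payload entropy in bits per byte), and F1 score is an assumed selection metric; the article does not say which features or metric Vectra uses.

```python
def f1(model, samples):
    """F1 score of a yes/no model on (features, is_ransomware) samples."""
    tp = sum(1 for x, y in samples if model(x) and y)
    fp = sum(1 for x, y in samples if model(x) and not y)
    fn = sum(1 for x, y in samples if not model(x) and y)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)

candidates = {
    "many_overwrites": lambda x: x["overwrite_ratio"] > 0.5,
    "high_entropy": lambda x: x["entropy"] > 7.0,
    "both": lambda x: x["overwrite_ratio"] > 0.5 and x["entropy"] > 7.0,
}

# Positive class: ransomware. Negative class: look-alikes that could be
# confused with it, such as bulk file sync or archive uploads.
samples = [
    ({"overwrite_ratio": 0.9, "entropy": 7.9}, True),   # ransomware
    ({"overwrite_ratio": 0.8, "entropy": 7.6}, True),   # ransomware
    ({"overwrite_ratio": 0.9, "entropy": 4.2}, False),  # bulk sync: overwrites, plain data
    ({"overwrite_ratio": 0.1, "entropy": 7.8}, False),  # archive upload: compressed data
]
best = max(candidates, key=lambda name: f1(candidates[name], samples))
print(best)  # both
```

Each single-feature rule is fooled by one of the look-alikes, which is exactly why a negative class of confusable but benign behavior is worth collecting.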

The difference between this approach and earlier approaches is that new data and new ransomware behaviors are continually fed into the model (the model is recalculated tens of thousands of times each day) and continuously pushed out to customers. The machine keeps learning.

However, it is important to note that while the machine has learned to do all of the number-crunching, the intelligence still comes from the analysts who label the data as good or bad.

Ollmann believes that the big difference between machine-learning threat detection and older IPS and SIEM technologies lies in its accuracy. It doesn’t issue an alert on the basis of a single signature, but collects related activities until it reaches a sufficient probability score to issue an accurate alert. In this way the model actually mimics the work of an expert analyst who manually collates different pieces of evidence before deciding whether the alert should be issued.
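One common way to turn several weak pieces of evidence into a single probability score is to add their log-likelihood ratios to the prior log-odds, a naive-Bayes-style combination that assumes the pieces of evidence are independent. This is an illustrative assumption rather than a description of Vectra’s models, and the prior, likelihood ratios, indicator names and alert threshold are all made up.

```python
import math

def posterior(prior, likelihood_ratios):
    """Combine independent evidence with Bayes' rule in log-odds form."""
    log_odds = math.log(prior / (1 - prior))
    for lr in likelihood_ratios:
        log_odds += math.log(lr)
    return 1 / (1 + math.exp(-log_odds))

PRIOR = 0.001   # assumed base rate of compromise for a host
ALERT_AT = 0.9  # assumed alerting threshold

# Hypothetical related activities on one host, each with a likelihood ratio
# (how much more likely it is on a compromised host than on a clean one).
evidence = []
for name, lr in [("odd DNS lookups", 20),
                 ("new admin share access", 30),
                 ("outbound burst", 40)]:
    evidence.append(lr)
    p = posterior(PRIOR, evidence)
    print(f"{name}: p={p:.3f} alert={p >= ALERT_AT}")
```

With these numbers the score stays well below the threshold after one or two indicators and crosses it only after the third, mirroring how an analyst collates evidence before raising an alert.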

IBM Watson

In May 2016 IBM announced a new project: Watson for Security. The idea is to apply IBM’s cognitive computing technology, Watson, to help analysts achieve and maintain the strongest possible network security.

Watson differs from the majority of artificial intelligence engines. Traditionally, computers have been built to analyze structured data, such as the data that comes from multiple system logs. However, some 80% of our security knowledge is found in unstructured documents (research papers, security blogs and conference proceedings) that cannot be processed by traditional means.

The purpose of Watson is to be able to absorb and understand all of that unstructured data so that it can process and respond to unstructured queries. Ultimately a network specialist will be able to query, ‘how do I protect against xyz zero-day exploit?’ or even ‘what are the current zero-day threats?’; and Watson will respond with instructions gleaned and processed from previously ingested research papers and blogs.

It suffers from the same difficulty as all AI projects: how to input enough correctly labeled data for the subsequent processing to return results with an acceptable probability of accuracy.

Caleb Barlow, vice president at IBM Security, explained the process. “We teach it how to read, just like we teach a child to read. Just as we say, this is a noun, this is a verb, this is an adjective, so we teach Watson, this is an attack, this is a victim, this is a target, this is a threat actor, this is malware and so on; and then ‘this is the relationship between those different concepts’.”

This is initially a highly labor-intensive manual process of annotating the individual documents; but by the end of this process, Watson is able to understand and automatically ingest new documents.
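Barlow’s description amounts to annotating text with typed entities and the relationships between them. Here is a minimal sketch of what one annotated sentence might look like as data. The entity types follow his examples, but the schema, field names, sample sentence and `tools_of` query are hypothetical and are not IBM’s actual format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entity:
    text: str
    kind: str  # e.g. "threat_actor", "malware", "target", "victim"

@dataclass(frozen=True)
class Relation:
    subject: Entity
    predicate: str
    obj: Entity

sentence = "GroupX used DropperY to attack Acme Bank."
actor = Entity("GroupX", "threat_actor")
malware = Entity("DropperY", "malware")
victim = Entity("Acme Bank", "victim")
# Every annotated span should appear in the source text.
assert all(e.text in sentence for e in (actor, malware, victim))

annotations = [
    Relation(actor, "uses", malware),
    Relation(malware, "targets", victim),
]

def tools_of(actor_name, relations):
    """Once documents are annotated, simple questions become queries."""
    return [r.obj.text for r in relations
            if r.subject.text == actor_name and r.predicate == "uses"]

print(tools_of("GroupX", annotations))  # ['DropperY']
```

The labor-intensive part is producing annotations like these by hand; the payoff is a model that can eventually produce them for new documents on its own.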

There will be mistakes. Some documents will provide wrong information, but here the traditional concepts of crowd wisdom and reputation come into play. First, some sources will acquire a higher reputation than others and be given greater credence during processing. Even more importantly, over time Watson will become a single repository of collective crowd wisdom.
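The reputation idea can be sketched as a weighted vote in which each source’s claim counts in proportion to its reputation. The sources, weights and claims below are hypothetical, and this is one simple scheme, not how Watson actually resolves conflicting information.

```python
from collections import defaultdict

def weighted_consensus(claims, reputation, default=0.1):
    """Return the answer with the most total reputation behind it.

    claims: list of (source, answer) pairs.
    reputation: source -> weight; unknown sources get `default`.
    """
    totals = defaultdict(float)
    for source, answer in claims:
        totals[answer] += reputation.get(source, default)
    return max(totals, key=totals.get)

reputation = {"vendor_advisory": 0.9, "research_paper": 0.8, "random_blog": 0.2}
claims = [
    ("vendor_advisory", "apply vendor patch"),
    ("random_blog", "disable the service"),
    ("unknown_forum_1", "disable the service"),
    ("unknown_forum_2", "disable the service"),
]
print(weighted_consensus(claims, reputation))  # apply vendor patch
```

Three low-reputation sources are outvoted by one trusted source; as reputations are refined over time, the consensus should improve with them.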

“Our goal,” said Barlow, “is to be in customer trials by the end of the year.” But he doesn’t think of Watson for Security as a traditional product – it is more like a continuous project that will simply get more and more efficient and accurate over time.

Accenture

In February 2016 Accenture introduced its Cyber Intelligence Platform combining elements of artificial intelligence, machine learning and cognitive computing. Vikram Desai, managing director at Accenture Analytics, described the process. “Accenture takes the metadata of computer transactions coming off routers, and analyzes that – just as intelligence agencies have done for years. Using machine learning and artificial intelligence,” he said, “we can determine what are the normal patterns and behaviors of any particular network.”

This creates a baseline of normality within huge volumes of traffic. “A typical corporate network might have anything from 40 to 60 billion transactions per day. By understanding what is normal, we can see those behaviors that are not normal. This could mean that it is ‘OK but just unusual’, or it could more likely mean ‘this is not good’.”

The baseline is built in stages: unsupervised machine learning first, then operational analytics, followed by supervised learning with the network operators. “Over time,” he explained, “this ensures that no initial malicious activity gets incorporated into the 'normal behavior' baseline.”
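The point about keeping malicious activity out of the baseline can be sketched as follows: build a frequency baseline of (source, destination, port) flows, then drop any flows that operators have labeled malicious before treating the baseline as normal. The flow tuples, the `min_count` threshold and the labels are hypothetical; the article does not describe Accenture’s actual algorithms.

```python
from collections import Counter

def build_baseline(flows, operator_flagged, min_count=3):
    """Flows seen at least `min_count` times, minus those operators flagged."""
    counts = Counter(flows)
    return {f for f, n in counts.items()
            if n >= min_count and f not in operator_flagged}

def is_unusual(flow, baseline):
    return flow not in baseline

history = (
    [("10.0.0.5", "mail", 443)] * 50
    + [("10.0.0.7", "203.0.113.9", 8080)] * 5  # beaconing already under way
)
flagged = {("10.0.0.7", "203.0.113.9", 8080)}   # operators confirmed it malicious

baseline = build_baseline(history, flagged)
print(is_unusual(("10.0.0.7", "203.0.113.9", 8080), baseline))  # True
```

Without the operator feedback, the beaconing flow would occur often enough to be absorbed into "normal" and would never be highlighted again.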

The system can then highlight, in real time, any network behaviors that deviate from normal operations. But the Accenture platform doesn’t stop there. “Finally,” he added, “a cognitive element helps with the visual interpretation of the output. We present a visualization of what the attack looks like. A tabular format would not be so easily or rapidly understandable. This helps the analyst to distinguish between ‘unusual but OK’ behaviors and ‘unusual and not OK’ behaviors.”

Google’s Common Sense Project

On June 26, 2016, Google announced Google Research Europe, based in its Zurich offices. It will concentrate on three areas: machine intelligence, natural language processing and understanding, and machine perception. What captured the most attention, however, was research head Emmanuel Mogenet's comment to journalists that a key area would be teaching computers 'common sense'.

'Commonsense reasoning' is not a new concept in artificial intelligence, but it is new to hear that Google has a research center dedicated to it. It differs from traditional artificial intelligence in that it seeks to teach a machine to make human-like presumptions and judgments, whereas traditional AI seeks to teach the machine to make human-like decisions. This is no simple task. While IBM's Watson seeks to ingest all human knowledge, a common sense machine will need to ingest all human understanding.

A recent paper, 'Concrete Problems in AI Safety' (21 June 2016), produced by Google scientists together with researchers from Stanford and Berkeley, examines the problem of 'accidents' in artificial intelligence systems. The paper gives a simple example: "Suppose a designer wants an RL [reinforcement learning] agent to achieve some goal, like moving a box from one side of a room to the other. Sometimes the most effective way to achieve the goal involves doing something unrelated and bad to the rest of the environment, like knocking over a vase of water that is in its path."

Later, it states, "Put differently, objective functions that formalize 'perform task X' may frequently give undesired results, because what the designer really should have formalized is closer to 'perform task X subject to common-sense constraints on the environment,' or perhaps 'perform task X but avoid side effects to the extent possible.' Furthermore, there is reason to expect side effects to be negative on average, since they tend to disrupt the wider environment away from a status quo state that may reflect human preferences."
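The box-and-vase example can be turned into a toy calculation. The sketch scores two hypothetical routes first by task reward alone, then by task reward minus a penalty for each change to the environment the task did not require, which is roughly the idea the paper discusses under the heading of impact regularizers. The step counts, reward formula and penalty weight are arbitrary assumptions.

```python
# Each hypothetical route: steps taken and unrequired changes to the room.
routes = {
    "straight through": {"steps": 5, "side_effects": 1},  # knocks over the vase
    "around the vase": {"steps": 7, "side_effects": 0},
}

def task_reward(route):
    """Reward for moving the box: shorter routes score higher."""
    return 10 - route["steps"]

def regularized_reward(route, penalty=5):
    """Task reward minus a penalty per side effect on the environment."""
    return task_reward(route) - penalty * route["side_effects"]

best_plain = max(routes, key=lambda r: task_reward(routes[r]))
best_safe = max(routes, key=lambda r: regularized_reward(routes[r]))
print(best_plain)  # straight through
print(best_safe)   # around the vase
```

The hard part, and the reason the problem calls for common sense, is that a real agent is not handed a tidy `side_effects` count: it has to predict for itself which changes to the world would matter to a human.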

Solving this problem will require teaching a machine to make predictive judgments about the effects of its actions; that is, to undertake commonsense reasoning. Beyond generalizations, Google has not yet explained how it will go about developing this capability. "A four-year-old child learns about the world through their senses so they know that cows don't fly without being told this," said Mogenet. "Computers need to understand some obvious things about the world so we want to build a common-sense database."

Summary

The ultimate potential of self-healing networks is hugely attractive, and the high-level route to achieving it is easily understood. First you teach the system, through supervised and unsupervised machine learning, to understand everything that is going on; then you apply artificial intelligence or cognitive computing to make human-style decisions about what to do in any detected situation.

But not everyone is a full convert. Simon Crosby, CTO at Bromium, believes that machine learning is of huge benefit in helping analysts make decisions, but that this is probably as far as it goes. “Machine learning, especially unsupervised machine learning,” he believes, “will inevitably generate false positives. Those false positives will have to be analyzed by human technicians.” In his view, you will never be able to trust the machine to operate autonomously.

“It is good at helping to detect existing breaches,” he added, “but it is bad at preventing breaches.” And that is the crux of the problem. It shifts the focus from preventing a breach to detecting one, which in turn means shifting the focus from secure design to detection regardless of design. “By definition,” says Crosby, “if you're looking for breach detection you're in trouble – the breach has already happened.”

Kevin Townsend is a Senior Contributor at SecurityWeek. He has been writing about high tech issues since before the birth of Microsoft. For the last 15 years he has specialized in information security; and has had many thousands of articles published in dozens of different magazines – from The Times and the Financial Times to current and long-gone computer magazines.