Testing Security Products: Third-Party Standards vs. In-House Testing

For the last decade, the Anti-Malware Testing Standards Organization (AMTSO) has owned the endpoint security testing standards space. It has done much good, bringing order and consistency to a complex arena.

Now, in the last two months, AMTSO has been joined by a new network product testing standard (NetSecOPEN), and a new product testing approach (the MITRE ATT&CK matrix).

AMTSO

AMTSO was founded in 2008, “to improve the business conditions related to the development, use, testing and rating of anti-malware products and solutions.” Membership is open to anybody dedicated to this purpose.

It currently has 61 members noted on its website. The vast majority are what may be called 1st-gen anti-malware companies. A smaller number of the 2nd-gen (AKA next-gen) endpoint protection companies have also joined. The remaining members are largely product testing labs. No end users are represented.

The good done by AMTSO over the last ten years should not be underestimated. It brought order to product testing where there was none, and helped eliminate the more excessive marketing hype around product performance.

Its relevance, however, has been sorely tested over the last few years by the arrival of 2nd-gen endpoint protection products that rely primarily on machine-learning detection rather than the blacklist (known-bad) prevention employed by 1st-gen products.

NetSecOPEN

NetSecOPEN launched its first standard (for testing the performance of 2nd-gen firewalls) on December 11, 2018, having been founded in May 2017. In many ways it is like AMTSO, but with a focus on network security product performance rather than endpoint security performance. It comprises a full-time executive director (Brian Monkman, out of ICSA Labs); and a board drawn from its members (the chairman is Jurrie van den Breekel, VP business development and product management at Spirent Communications).

NetSecOPEN’s overriding purpose is to do for network testing what AMTSO did for endpoint testing. Monkman told SecurityWeek that in founding NetSecOPEN, he was hoping to bring order to network security performance testing. “Because it really is, right now, the wild west out there. You can have one lab testing performance and network security products and another lab testing performance; and it sounds like they're talking the same language because they're using the same words – but when you peel back the cover, there really isn't an apples-to-apples comparison between the testing that is conducted across labs.”

This is precisely the problem tackled by AMTSO over endpoint products, and largely solved for 1st-gen products.

There is, however, one major difference between AMTSO and NetSecOPEN: the latter is making great attempts to be open. “Ultimately,” said Monkman, “our goal here is to collaborate with test labs, with security product vendors, with test tool vendors and with enterprises to create an open and transparent set of testing standards that can be used across the industry. We're not going to stop at next gen firewalls, but we're also going to be moving into other areas and other technologies as we move forward.”

Van den Breekel also suggested that NetSecOPEN’s process is different to AMTSO’s process. “AMTSO,” he said, “looks at security efficacy which is a much more nebulous area because there's 101 opinions on how to do that. That's why they don't specify in great detail how you should do it, but more the process of how you should go about it.”

It’s worth briefly mentioning that network testing is inherently simpler than endpoint testing. For a network you can place a tap on the wire, examine every byte as it flows through, and draw a conclusion on what you see. Endpoints, however, are complex computers with many different things happening in parallel. You cannot just look at one of the processes and come to a conclusion on malicious or benign.

“In NetSecOPEN,” continued van den Breekel, “we're solving a more practical problem than AMTSO. Almost every enterprise knows that the efficacy ratings provided by AMTSO tests are not what they're going to get when they start using the product. In NSO we're trying to certify a set of results that come a whole lot closer to what one may expect in a real network – and that makes the RFP process, trying to level the playing field a little bit between all the vendors, a lot easier. It's not uncommon for the publicized NGF throughput number to be 5 times higher on the data sheet than it will actually perform in your network with all the security features turned on. But with NetSecOPEN, you have more accurate data upfront and it helps guide what additional testing you may want to do yourself.”

Problems with the AMTSO and NetSecOPEN approach

Whatever you think of AMTSO and NetSecOPEN, they are, and are likely to remain, fundamentally closed industry groups. Both claim to have open membership, and while this may be true, it is primarily the security vendors that have the incentive to join and exert influence. The tests and testing methodologies will be geared towards the existing membership.

Since early AMTSO membership was dominated, and is still largely dominated, by 1st-gen anti-malware vendors, it is understandable and inevitable that testing methodologies revolved around testing 1st-gen capabilities.

But technology changes and evolves. It has already done so for endpoint protection systems, and will inevitably, eventually, do so for network products. The problems that this evolution can cause for testing standards have already been seen with the emergence of endpoint protection systems that rely almost entirely on machine learning detection.

AMTSO tests were not geared for this type of product and, allegedly, produced results favoring the membership majority of 1st gen vendors. The newcomers reacted aggressively, over-hyping their own products, denigrating the old guard, and sometimes operating their own tests. AMTSO has attempted to address their grievances, and the endpoint testing market is certainly calmer – but it would be wrong to assume that 2nd-gen vendors are entirely happy.

Scott Lundgren, CTO at Carbon Black (itself a member of AMTSO), explains: “Since AMTSO is an industry group, it is in effect the industry's attempt to self-regulate and self-police. In my view, the critical question here is, ‘is the industry capable of doing this?’ From a personal and philosophical standpoint, my opinion is, ‘No’. That’s not limited to endpoints or even security, but self-policing and self-regulating doesn’t generally work. In this specific, the fundamental question is, ‘does the industry group end up tilting towards protecting its own?’; and my perspective would be ‘Yes, yes it does’.”

This same problem and same criticism will inevitably apply to NetSecOPEN as soon as new vendors appear with a technologically new approach to the problem.

MITRE ATT&CK

MITRE ATT&CK offers an alternative approach to evaluating endpoint protection products. It was not originally designed for comparative evaluations, but that is perhaps a natural extension of its capabilities. 

The name is an acronym for Adversarial Tactics, Techniques and Common Knowledge. It is a knowledge base of attack methods used by different adversary groups. At its highest level it is a matrix of individual adversary techniques, from initial access to data exfiltration, with each cell providing access to more detailed information.

For example, the matrix cell ‘PowerShell’ under ‘Execution’ links to a list of threat groups (such as APT3) that are known to employ, or have employed, PowerShell during an attack. Each threat group name then links to further information on that group, including the way in which the threat group has used the technique. For example, “APT3 has used PowerShell on victim systems to download and run payloads after exploitation.[9]” The reference provides a link to the source of the information, which in this case is the FireEye report titled ‘Operation Double Tap’.
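The technique-to-group-to-reference chain described above can be pictured as a simple nested mapping. The following Python sketch is purely illustrative: the tactic, technique, group and report names come from the article, but the data layout is hypothetical (MITRE publishes the real knowledge base as STIX objects, not this structure):

```python
# Illustrative sketch of how an ATT&CK matrix cell links a technique to
# threat groups and source references. Layout is hypothetical, not MITRE's
# actual schema.
attack_matrix = {
    "Execution": {
        "PowerShell": {
            "groups": {
                "APT3": {
                    "usage": "Used PowerShell on victim systems to "
                             "download and run payloads after exploitation.",
                    "reference": "FireEye, 'Operation Double Tap'",
                },
            },
        },
    },
}

def groups_using(tactic, technique, matrix=attack_matrix):
    """Return the threat groups known to employ a given technique."""
    return sorted(matrix.get(tactic, {}).get(technique, {}).get("groups", {}))

print(groups_using("Execution", "PowerShell"))  # ['APT3']
```

An enterprise walking the real matrix this way can ask, for any adversary of interest, exactly which techniques its defenses need to detect.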

Any enterprise can use this matrix to evaluate the extent to which its security is able to detect, and perhaps block, the different tactics and techniques used by different attackers. A potential limitation is that while the matrix can help enterprises recognize weaknesses in their security posture, it does not help them choose a product to address those weaknesses.

Perhaps partly for this reason, MITRE ATT&CK decided to evaluate security products against the matrix to help enterprises select the best product for their purposes. The route chosen by MITRE ATT&CK was to select a single threat actor, use the matrix to provide an emulation of that actor’s tactics and techniques, and then evaluate the performance of multiple endpoint protection products against that emulation.

“MITRE ATT&CK serves as an effective way to describe adversary behavior using a common language,” Blake Strom, ATT&CK lead, explained to SecurityWeek. “That language can be used to derive test behaviors and relate them to defensive capabilities. This is less about the type of product and how it works under the hood, and more about what threat behaviors the capability addresses. We have started our evaluations with a focus on the enterprise network, post-exploit detection. Preventions, protections, responses, deception, and many other technology types can also benefit from third party evaluation of how well they address the techniques described in ATT&CK.”

APT3 was chosen as the aggressor template, and vendors were invited to take part. “Companies pay a fee to MITRE for the evaluation, understanding that all results will be publicly released,” explained Frank Duff, lead engineer for the ATT&CK evaluations. “This program supports MITRE’s mission of providing objective insight and improving the cyber community’s overall security posture.”

Seven vendors (Carbon Black, CrowdStrike, CounterTack, Endgame, Microsoft, RSA, and SentinelOne) took part in the first evaluation this year. The results were announced on November 29, 2018. These results are not like traditional tests: they do not allow the vendors to claim, ‘we stop 99.9% of all known badnesses’. Instead, the raw results are posted on a MITRE website unadulterated and uninterpreted. They merely say that this product detected this aggressor technique in this way – or did not.

The process takes 2 or 3 days, following a 2-week set-up period in which each vendor installs its product in identical Azure environments. The vendor and MITRE sit down together to work their way through the emulation.

Potential problems with the MITRE ATT&CK methodology

The MITRE ATT&CK process is not perfect, and there are two specific problems. The first is that it can be ‘gamed’ by unscrupulous vendors. This can happen – consider the Qihoo and Tencent incidents with AMTSO-style tests.

With MITRE ATT&CK there are two possible methods. The first is to tweak the product ahead of the test. In this first test, it was pre-announced that the subject would be an APT3 emulation. Since the MITRE ATT&CK matrix is freely available to everyone, vendors could pre-generate their own emulation and make sure their product detects everything necessary.

Over time, this could be prevented by MITRE developing multiple emulations and not pre-announcing which one will be used. If the vendor then used the matrix to ensure it could detect everything, then MITRE will have succeeded, and the vendor’s product will be improved.

The second method of gaming the system is more intractable. Most of these products are cloud-based detection engines, and most vendors also have a managed service offering – which means they have a team of highly skilled experts watching over the detection engine in the cloud. It is feasible that this team could monitor what was happening in the evaluation and tweak the engine to ensure detection in almost real-time.

If this were done, there would be no way for MITRE to recognize it – and the result would be a test of which vendor has the best team of analysts rather than a test of the capabilities of the actual product.

“I believe this is a fundamental weakness in the approach,” Scott Lundgren, CTO at Carbon Black (one of the tested vendors) told SecurityWeek. “It is something that will have to be addressed and accounted for – but there's no immediately obvious way to do so.”

The second problem with the MITRE ATT&CK methodology is common to all existing testing standards, including AMTSO and NetSecOPEN. They are all run by third parties using third-party rules in a laboratory environment. This is not what most corporate users want. Corporate users want to be able to test the products themselves, to know that they are suitable for their own unique IT and security environments.

Here Lundgren has his own suggestion. It involves combining use of the MITRE ATT&CK matrix with two additional open source projects: Atomic Red Team and OSquery. 

Bringing testing in-house

The value of third-party testing/evaluation is high for vendors (who use the results for marketing purposes), but low for corporate users (who need to see how each product works in their own unique environment). This is difficult: security teams do not always know where their weaknesses lie – which is the argument behind pentesting and red teaming.

The MITRE ATT&CK reference framework shows the individual techniques used by different attackers across the kill chain. Bringing detailed testing in-house would require two additional factors: the ability to test detection of these attacker techniques; and, if detection fails, the ability to locate what – in the entirety of the security set-up – is failing.

Two additional open source projects can help: Atomic Red Team and OSquery.

Atomic Red Team

Atomic Red Team is an open source project that develops ‘micro’ (or ‘atomic’) tests that map onto the MITRE ATT&CK reference matrix. It is maintained by Red Canary, an MSP out of Denver, Colorado, and housed on GitHub. Each test is small, easy to use, and fast in operation. Since they map to the MITRE ATT&CK matrix, each test can be run in-house in the knowledge that it is relevant to existing attacker techniques.

Because they are easy to use, existing security teams can run them without needing to wait for a specialist red team or expensive external pentester. Because they are small and fast in operation, tests can be chosen at a moment’s notice either randomly, in response to a currently active attacker group, or in some form of structured sequence. In the last case, it brings the enterprise close to the coveted concept of continuous pentesting.
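A minimal in-house harness along these lines can be sketched in a few lines of Python. The ATT&CK technique IDs below are real identifiers (T1059.001 is PowerShell execution, T1105 is ingress tool transfer), but the test commands are harmless placeholders – a real harness would run Atomic Red Team's published tests and then query the organization's own security tooling for alerts:

```python
import subprocess

# Hypothetical mapping of ATT&CK technique IDs to atomic test commands.
# The echo commands are stand-ins; real atomic tests exercise the actual
# attacker technique.
ATOMIC_TESTS = {
    "T1059.001": ["echo", "simulated PowerShell execution"],
    "T1105":     ["echo", "simulated ingress tool transfer"],
}

def run_test(technique_id):
    """Execute one atomic test and report whether it completed."""
    cmd = ATOMIC_TESTS[technique_id]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0

# Run every test; a real harness would now ask the EDR which of these
# generated an alert, and flag the techniques that went undetected.
results = {tid: run_test(tid) for tid in ATOMIC_TESTS}
for tid, ran in results.items():
    print(f"{tid}: test {'ran' if ran else 'failed to run'}")
```

Scheduling such a loop randomly or continuously is what brings an enterprise toward the ‘continuous pentesting’ idea mentioned above.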

This process of self-testing against the MITRE ATT&CK matrix immediately offers potential improvements on third-party testing. While it doesn’t test for artefacts (such as malicious files), it does test for the existence of techniques that might indicate an intruder, and/or highlights where a specific attacker tactic is not being detected. Third-party testing might indicate the potential performance of a security product in laboratory conditions, but the MITRE ATT&CK and Atomic Red Team combination will provide actual detection performance against each enterprise’s overall security implementation.

This means that potential security purchases can be brought in for meaningful examination, rather than just to see how they run; and existing products can be improved through demonstrable feedback to the vendor.

There will be occasions where detection fails, but the tester isn’t sure where or what fails, nor what can be done to improve things. “Atomic Red Team provides a testing framework that maps onto ATT&CK,” explains Carbon Black’s Lundgren. “But if the tests show that the entirety of your security isn't working, where did it break? Troubleshooting that becomes complicated.”

What would help, he adds, “is a set of reference detections. There are professionals doing this already, but for now their work is scattered. There is no central repository where such reference detections can be found.”

Lundgren’s own proposal is to bring OSquery to the party. It should be said that this hasn’t been done yet; but there is no reason it cannot be, nor any reason that individual enterprises cannot make some use of the idea.

OSquery

OSquery is an open source project originating from Facebook that effectively exposes multiple operating systems (Windows, MacOS, Linux and FreeBSD) as a high-performance relational database. This allows the user to write SQL queries to explore the operating system across the entire endpoint fleet simultaneously.

“Think of it as an endpoint telemetry agent with an SQL interface,” explained Lundgren. “You can say, ‘Show me a list of all running processes across my entire fleet’, or all my open sockets, and join those tables arbitrarily.”
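The kind of fleet-wide question Lundgren describes is expressed in osquery's SQL dialect; `processes` and `process_open_sockets` are real osquery tables. The Python below parses a made-up sample of the JSON that `osqueryi --json` emits, purely to illustrate the shape of the workflow – in practice the query would be dispatched to the whole fleet through an osquery management layer:

```python
import json

# An osquery SQL query joining running processes to their open sockets.
QUERY = """
SELECT p.name, p.pid, s.remote_address, s.remote_port
FROM processes p
JOIN process_open_sockets s USING (pid);
"""

# Fabricated sample of osqueryi --json output, for illustration only.
sample_output = '''[
  {"name": "sshd", "pid": "812",  "remote_address": "10.0.0.5",      "remote_port": "51712"},
  {"name": "curl", "pid": "1430", "remote_address": "93.184.216.34", "remote_port": "443"}
]'''

rows = json.loads(sample_output)
# Filter the joined rows, e.g. for outbound HTTPS connections.
suspicious = [r for r in rows if int(r["remote_port"]) == 443]
print([r["name"] for r in suspicious])  # ['curl']
```

Because the results come back as ordinary rows, reference detections could be shared as nothing more than agreed-upon queries.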

The OSquery project could fulfil the requirement for a central repository of reference detections. It is open source, extensible and freely available. It can be used with the Atomic Red Team detections mapped against the MITRE ATT&CK matrix of attacker tactics and techniques.

The future for product testing

While this has obvious potential for finding weaknesses and improving the overall enterprise security posture, it can also be used to ‘test’ potential third-party products in-house and in-situ. Furthermore, it has the potential to allow existing product customers to provide very detailed feedback to the vendor on how a product needs to be enhanced or improved for specific detections. Both vendor and customer will benefit.

SMBs might never have the resources to adopt such an approach to security product testing, so the benefit of third-party product testing – despite its limitations – will remain. It may be that some of the existing test laboratories will add the MITRE ATT&CK matrix approach to their own repertoire, or that new companies will offer it as a service. Larger organizations, however, now have the opportunity to undertake serious product testing in-house.

“Having open source testing that can also be done by the enterprise is the essential way forward for the industry,” says Lundgren.

Underlying it all is the MITRE ATT&CK matrix – and there is much more to come from MITRE.

“MITRE will continue to refine and expand ATT&CK as needed to ensure it is representative of current threats,” said Strom. “Several specific efforts underway include expansion beyond data exfiltration to include destructive attacks against enterprise networks, refining how we describe variations of individual techniques, and covering adversary behaviors against network infrastructure devices. The feedback we receive from organizations using ATT&CK and those who are responding to incidents help us make it a better resource for the broader security community.”

Kevin Townsend is a Senior Contributor at SecurityWeek. He has been writing about high tech issues since before the birth of Microsoft. For the last 15 years he has specialized in information security; and has had many thousands of articles published in dozens of different magazines – from The Times and the Financial Times to current and long-gone computer magazines.