After a brief respite, the animosity between the incumbent anti-virus vendors and the newcomer machine learning (ML) endpoint protection vendors has returned, and the focus is still on testing.
On Monday this week, Ars Technica published an article with one new element: a test using 48 Cylance-provided malware samples showed 100% detection by Cylance, but somewhat less from competing products. It turned out that nine of the samples were harmless. This “led the engineer [conducting the tests],” wrote Ars, “to believe Cylance was using the test to close the sale by providing files that other products wouldn’t detect — that is, bogus malware only [Cylance] would catch.”
On Tuesday, Cylance’s vice president of product testing and industry relations, Chad Skipper, blogged about the Ars article and the ‘harmless’ samples. He explained that Cylance doesn’t simply use known malware for tests, but alters the samples with the MPRESS and VMProtect packers so they effectively become unknown malware. Sometimes, however, the packing doesn’t fully work, and actually renders the original malware harmless. Testing with repacked samples, he suggests, is closer to the real-life situation faced by end users.
Not all the questions raised by the Ars article are fully explained by Skipper. “Of the nine files in question,” writes Ars, “testing by the customer, by Ars, and by other independent researchers showed that only two actually contained malware.” Skipper responded, “We don’t give empty files on purpose — it’s just not what we do.”
Nevertheless, if seven of the 48 samples were incorrectly detected as malware by Cylance, that’s a pretty high false positive rate of just over 14.5%, a rate that would not have been detected had the engineer not looked more closely at the testing results.
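For readers who want to check that figure, the arithmetic can be sketched in a few lines of Python. The variable names are illustrative only, and the calculation assumes, as the paragraph above does, that the rate is the share of harmless files across the whole 48-sample set.

```python
# Figures reported in the Ars Technica article.
total_samples = 48        # samples supplied by Cylance
questioned_files = 9      # files the engineer found suspect
contained_malware = 2     # of those nine, files that really held malware

# Files flagged as malware that turned out to be harmless.
harmless_flagged = questioned_files - contained_malware

# Share of the whole sample set that was incorrectly flagged.
false_positive_share = harmless_flagged / total_samples

print(f"{harmless_flagged} of {total_samples} = {false_positive_share:.2%}")
```

This prints `7 of 48 = 14.58%`, which matches the "just over 14.5%" quoted above.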
This has led to some suggestions that Cylance is gaming the system. “It’s unbelievable that businesses today can’t trust the people who they rely on to keep them secure,” commented Mike Viscuso, CTO and co-founder of endpoint security firm Carbon Black. “The actions Cylance has taken puts their customers and our national security at risk.”
“Not sure if it can be called cheating,” said Luis Corrons, technical director at PandaLabs, a competitor in the endpoint security space; “anyway it is clear to me that ethics are not an obstacle for Cylance to get new customers. They do not allow testers to do comparative testing of their solution unless they impose their methodology, therefore there is a lack of independent testing to validate their marketing claims, so they ask their prospects to do their own tests, and they give them a preselected set of ‘malware’.” He added that if he were to do something similar at Panda, “I would be fired.”
Cylance claims that the majority of independent third-party tests are biased in favor of the incumbent vendors that use malware signature databases (as well as other techniques, including their own use of machine learning). Those vendors in turn suggest that some (not all) ML-based vendors seek to bias the testing in their own favor, and threaten lawsuits if they do not get their own way. The threat became reality earlier this year when CrowdStrike sued testing firm NSS Labs.
One of Skipper’s arguments is that other vendors use the Anti-Malware Testing Standards Organization’s (AMTSO) Real-Time Threat List (RTTL). This list is largely known by the vendors, and consequently does not provide a genuine test.
While this may be true for some vendors’ own tests, it is not generally true for third-party testing. Lists such as RTTL and the WildList are mostly used for product certification, but not for comparative testing. Independent researcher David Harley explained, “They’re of considerably less use for comparative testing, as the testing industry has always been aware. After all, the point of comparative testing is to differentiate between products. A test restricted to malware which is already known to vendors (or a substantial majority thereof) is not going to show enormous differences.”
This was confirmed by an independent third-party tester who asked not to be named. He described four methods of acquiring malware samples: from a vendor; from VirusTotal; from a third-party source such as a large corporation; and lastly, by monitoring the threat landscape and acquiring threats and attack methods independently. He uses the last method, as, he believes, do the majority of test labs.
“Tests that use malware gathered using the first three approaches could put Cylance at a disadvantage versus vendors that suck in lots of files and generate signatures,” he told SecurityWeek. “But I’m not sure that it’s fair to say that all vendors do that. It seems a bit old-fashioned and error-prone. I also don’t think it makes the tests unfair. It simply highlights the inconvenient fact that there are loads of threats and Cylance’s approach is not perfect because it doesn’t provide full coverage. Sure, it is at a disadvantage — but one of its own making, not because the testing is wrong.”
Harley agrees with this basic viewpoint. “If comparative testing was about the exclusive use of cooperatively verified lists, it would still be more accurate than using samples supplied by a single vendor and containing a high percentage of garbage files.”
John Shaw, VP product management at Sophos, also a player in the next-gen endpoint security market, pointed out that the Cylance arguments against the third-party testing industry could more accurately be aimed at Cylance itself. “The leading testing organizations,” he told SecurityWeek, “are working to improve their ability to test products in more representative ‘real world’ environment, using massively used techniques like infecting legitimate websites, and exploits against legitimate software. To do this at scale is hard and the industry still has a long way to go. Clearly for an individual customer to try and run a statistically significant test that simulates the real world is close to impossible, even with unlimited time.” (Sophos published a stinging rebuke of Cylance’s product comparison methods last summer.)
This doesn’t mean it’s impossible to self-test — just very, very hard. “With testing,” said Viscuso, “it’s important to go beyond malware samples and test how the product handles real-world attacks. Malware samples alone are going to demonstrate one thing — how well the product can stop the particular malware samples in your sample set. You’re interested in stopping attacks, not just malware. Real world attackers don’t rely on packed executables. They use documents, PowerShell, Python, Java, built-in OS tools, anything they can leverage to get the job done. To test the solution against real-world attack techniques, use a penetration testing framework such as Metasploit. Construct payloads with Veil-Evasion and use the techniques seen in real attacks. PowerShell Empire is also a great way to build PowerShell command lines and macro-enabled documents that go beyond executable malware samples.”
It should be said that several vendors, including ML-based vendors and test laboratories, declined to comment: the issue is bitter and divisive. Among those that did respond to SecurityWeek, the consensus is clear. Almost all agree that comparative third-party testing is difficult, but not impossible. And all but one agree that in rejecting independent testing, Cylance has replaced it with something far worse and potentially misleading. The exception is NSS Labs. “I don’t think Cylance did anything wrong,” said Vikram Phatak, CEO of NSS Labs. “Their execution appears to have been problematic, but not their approach.”