Attackers Explore How to Defeat Machine Learning (ML)-Based Defenses and Use ML in Their Own Attacks
Artificial intelligence (AI), or more usually machine learning (ML), is the new kid on the block. It has become de rigueur for any new product or new version of an existing product to tout its AI/ML credentials. But the technology that was originally sold as the answer to cybercrime is now being questioned: is it a silver bullet or just a temporary advantage to the defenders?
Adam Kujawa, director of Malwarebytes Labs, has been considering the potential for bad actors to usurp machine learning for their own purposes. His report looks at some of the methods by which cybercriminals could use ML as an offensive weapon against industry, focusing on three areas: poisoning defenders’ ML algorithms, DeepFakes, and artificially intelligent malware.
There are two fundamentals to machine learning: the algorithms that teach the machine how and what to learn, and a large amount of data (big data) to learn from. And there are two fundamental methodologies: unsupervised learning (which is effectively pure AI, where the machine teaches itself without reference to direct human intervention), and supervised learning (where the learning process is guided by human experts).
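The difference between the two methodologies can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the single numeric feature, the sample values, the "normal" and "suspicious" labels, and the function names fit_supervised and fit_unsupervised are all invented for the example and do not come from any product or from Kujawa's report. The supervised version averages values per human-supplied label; the unsupervised version is a simple two-cluster k-means that finds groups without being told what they mean.

```python
def fit_supervised(samples):
    """Supervised: a human expert supplies a label for every training value."""
    sums, counts = {}, {}
    for value, label in samples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    # One average ("center") per human-defined label.
    return {label: sums[label] / counts[label] for label in sums}


def fit_unsupervised(values, iterations=10):
    """Unsupervised: two-cluster k-means on one feature, with no labels."""
    centers = [min(values), max(values)]  # simple starting guess
    for _ in range(iterations):
        groups = ([], [])
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers


if __name__ == "__main__":
    # Hypothetical feature values, for example events per minute from a host.
    labeled = [(1.0, "normal"), (1.2, "normal"), (0.8, "normal"),
               (9.0, "suspicious"), (9.5, "suspicious"), (8.5, "suspicious")]
    print("supervised centers:  ", fit_supervised(labeled))
    print("unsupervised centers:", fit_unsupervised([v for v, _ in labeled]))
```

Both approaches separate the same two groups here, but only the supervised model knows which group a human considers suspicious. The unsupervised one simply reports that two groups exist, which is why whatever dominates its data shapes what it treats as normal.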
Algorithms are improving all the time, but suffer from one major weakness: they are subject to the conscious or subconscious bias of the designer. This is a bigger problem outside of cyber — where ML decisions can have life or death, freedom or prison implications — but will nevertheless be apparent in cybersecurity applications. Unsupervised machine learning will more rapidly evolve along the designers’ prejudices.
Poisoned data is a bigger problem; and again, unsupervised ML will respond to the poison faster. The danger has already been shown in the ML-based Twitter chatbot (Tay) developed and rapidly withdrawn by Microsoft. “A Twitter bot based on unsupervised machine learning,” says Kujawa, “had to be taken offline rather quickly when it started imitating unbecoming behavior that it ‘learned’ from other Twitter users. This was almost a perfect showcase of how easily machine learning can be corrupted when left without human supervision.”
The problem with all AI is that it cannot understand social context. Twitter is replete with bad language, extreme views, hate and false news that to humans (unless we already share those views) are easily recognizable. This is a data pool, already poisoned by its very nature, from which Microsoft’s bot learned bad behavior as normal. “‘Tay’ went from ‘humans are super cool’ to full nazi in <24 hrs,” commented @geraldmellor.
The same principle can be used by cybercriminals to subvert the data pool used by products to learn the patterns of suspect behavior. “Threat actors,” warns Kujawa, “could also dirty the sample for machine learning, flagging legitimate packages as malware, and training the platform to churn out false positives.” The higher the concentration of false positives, the greater the likelihood that the security team will ignore alerts in its triaging process.
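The effect Kujawa describes can be reproduced on a toy scale. The following Python sketch is an assumption-laden illustration, not a model of any real security product: the data is synthetic, the two numeric features are made up, the classifier is a deliberately simple nearest-centroid model, and names such as run_demo and poison_count are hypothetical. It trains the same classifier twice, once on clean data and once after an attacker has submitted benign-looking samples labeled as malware, and compares how often each model flags genuinely benign samples.

```python
import random


def centroid(points):
    """Mean of a list of equal-length feature tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def train(samples):
    """Fit a nearest-centroid model from (features, label) pairs."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(points) for label, points in by_label.items()}


def predict(model, features):
    """Return the label whose centroid is closest to the features."""
    def squared_distance(label):
        return sum((a - b) ** 2 for a, b in zip(features, model[label]))
    return min(model, key=squared_distance)


def false_positive_rate(model, benign_samples):
    """Fraction of genuinely benign samples the model flags as malware."""
    flagged = sum(1 for f in benign_samples if predict(model, f) == "malware")
    return flagged / len(benign_samples)


def draw(rng, center, count):
    """Draw synthetic two-feature samples around a center (made-up data)."""
    return [(rng.gauss(center, 1.0), rng.gauss(center, 1.0)) for _ in range(count)]


def run_demo(seed=0, poison_count=600):
    """Return (clean, poisoned) false positive rates on held-out benign data."""
    rng = random.Random(seed)
    benign = [(f, "benign") for f in draw(rng, 0.0, 200)]
    malware = [(f, "malware") for f in draw(rng, 3.0, 200)]
    # The attacker submits benign-looking samples mislabeled as malware.
    poison = [(f, "malware") for f in draw(rng, 0.0, poison_count)]
    benign_test = draw(rng, 0.0, 1000)

    clean_model = train(benign + malware)
    poisoned_model = train(benign + malware + poison)
    return (false_positive_rate(clean_model, benign_test),
            false_positive_rate(poisoned_model, benign_test))


if __name__ == "__main__":
    clean, poisoned = run_demo()
    print(f"false positive rate, clean training set:    {clean:.3f}")
    print(f"false positive rate, poisoned training set: {poisoned:.3f}")
```

The mislabeled samples pull the model's idea of "malware" toward ordinary software, so the poisoned model flags far more legitimate samples than the clean one. Real detection models are much more complex, but the mechanism is the same: the model can only be as trustworthy as the labels it learns from.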
A second concern for Kujawa is the use of ML in social engineering. Deep, automated AI-enhanced social media scanning could rapidly build profiles of targets and their employees for compelling spear-phishing campaigns. But perhaps the most newsworthy current development is the evolution of what is called the DeepFake video.
DeepFake videos can be generated by using AI to match a target’s facial imagery to words spoken by a voice imitator. In the future, the voice itself might also be generated by AI. A recent example appeared to show Mark Zuckerberg delivering a deeply cynical message. “Imagine this for a second,” he appears to say. “One man, with total control of billions of people’s stolen data, all their secrets, their lives, their futures. I owe it all to Spectre. Spectre showed me that whoever controls the data, controls the future.”
One danger is that this technology could be married to BEC attacks — already a phenomenally successful and attractive attack for criminals (the latest FBI report says that $1.3 billion was lost through BEC and EAC attacks during 2018). “Now imagine getting a video call from your boss telling you she needs you to wire cash to an account for a business trip that the company will later reimburse,” says Kujawa.
The latest Verizon DBIR strongly suggests that criminals are moving to the well-trusted and easier methods of earning their income. The marriage of ML and social engineering is less a possibility than an inevitability.
The evolution of ML-enhanced malware is also inevitable. For example, Kujawa writes, “Imagine worms that are capable of avoiding detection by learning from each detection event. If such a family of worms is able to figure out what got them detected, they will avoid that behavior or characteristic in the next infection attempt.”
IBM has already shown the potential with its DeepLocker project. DeepLocker was a research project to examine what could be done with AI-enhanced malware. “DeepLocker,” IBM told SecurityWeek, “uses AI to hide any malicious payload invisibly within a benign, popular application — for example, any popular web conferencing application. With DeepLocker we can embed a malicious payload and hide it within the videoconferencing application. Using AI,” it added, “the conditions to unlock the malicious behavior will be almost impossible to reverse engineer.” The result is malware that is completely invisible until it detects its precise target, at which point it detonates.
The machine learning genie is out of the bottle. It currently gives the advantage to the defenders, but this is primarily because they are the latest users. This may not always be the case. There is an old maxim: developing better security creates better attackers. Attackers are already exploring how to defeat ML-based defenses and how to use ML in their own attacks. Defenders must recognize that ML was never a silver bullet, but merely a temporary advantage in the never-ending game of leapfrog between defense and attack.
Despite this, Kujawa is not ultimately pessimistic. The nature of ML used by attackers is that it must necessarily be largely unsupervised in operation. This is its weakness. “Our advantage over AI continues to be the sophistication of human thought patterns and creativity,” he states; “therefore, human-powered intelligence paired with AI and other technologies will still win out over systems or attacks that rely on AI alone.”