The Georgia Institute of Technology (Georgia Tech) has been awarded a $17.3 million contract to develop a scientific method for cyber attack attribution.
Dubbed Rhamnousia, after the ancient Greek spirit of divine retribution, the project will use machine learning (ML) to identify the groups behind different cyber-attacks.
“We owe it to the people of this country to objectively reason about the actors attacking systems, stealing intellectual property and tampering with our data,” said Manos Antonakakis, an assistant professor in Georgia Tech’s School of Electrical and Computer Engineering. “We want to take away the potential deniability that these attack groups now have.”
Michael Farrell, chief scientist of the Cyber Technology and Information Security Laboratory, added: “Deterrence is virtually impossible if you’re unable to identify the adversary. Attribution is the linchpin for deterrence in cyberspace, and the U.S. government is in need of a repeatable and releasable way forward.”
But attribution is a contentious issue with major ramifications. Opinions range from impossible through partially possible to absolutely possible. Incorrect attribution could cause major international incidents, and could hasten the formal recognition of ‘cyber war’. But the likely outcome of accurate attribution – with its inherent threat of cyber, economic or military sanctions – would be a deterrent effect on potential state-sponsored aggression.
Is accurate attribution possible?
Luis Corrons, technical director at PandaLabs, believes it is. “There are thousands of flags that can be taken into account from a single file, not to mention when we feed into the system all the other information we have about an attack.” Corrons believes that ML algorithms will provide the ability to ‘fingerprint’ attackers.
Ely Kahn, co-founder of Sqrrl and former Director of Cybersecurity at the White House agrees. He believes attribution can be achieved by any combination of three methods: offensively (“Compromise a staging or C2 box and watch the traffic on it and where it goes – only the US government can legally do this in the US”); attacker error (“Sometimes an attacker mistakenly leaves a trail that can be traced”); and probabilistically (“Examine the attacker code or TTPs and look for patterns and/or markers that can be aligned to attacker groups with some degree of certainty”). It is the last approach that will be the cornerstone of the Georgia Tech project.
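The probabilistic approach Kahn describes can be sketched in a few lines. The group names and TTP labels below are entirely hypothetical (they are not from the Georgia Tech project); this is a minimal illustration of scoring overlap between observed indicators and known group profiles, not a description of any real attribution engine.

```python
from collections import Counter

# Hypothetical TTP profiles for known attacker groups (illustrative only).
GROUP_PROFILES = {
    "ActorX": {"spearphishing", "powershell", "dns_tunneling", "mimikatz"},
    "ActorY": {"watering_hole", "custom_rat", "ftp_exfil"},
}

def attribution_scores(observed_ttps):
    """Score each known group by Jaccard overlap with the observed TTPs.

    Returns a probability-like score in [0, 1] per group -- a degree of
    similarity, not proof of identity.
    """
    observed = set(observed_ttps)
    scores = {}
    for group, profile in GROUP_PROFILES.items():
        overlap = len(observed & profile)
        union = len(observed | profile)
        scores[group] = overlap / union if union else 0.0
    return scores

# An incident exhibiting three of ActorX's four known techniques
# scores high for ActorX and zero for ActorY.
scores = attribution_scores(["spearphishing", "powershell", "mimikatz"])
```

A real system would weight indicators by rarity (a custom implant is far more telling than a commodity tool like PowerShell), which is where the machine learning comes in.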
Morey Haber, VP of technology at BeyondTrust, is another supporter. “Attribution is statistically a realistic possibility by using code samples and methods of attack to determine the source and owner of the attack,” he suggests.
But Ilia Kolochenko, CEO at High-Tech Bridge, is not so sure. “Cyber attribution is a great idea; however, I doubt that it’s achievable with $17 million. Billions were invested in hundreds of cybersecurity startups this year, but we still continually fail to identify cybercriminals.”
“Attribution of threat actors on the internet is a difficult task,” warns Brian Bartholomew, a senior security researcher at Kaspersky Lab. There are “many issues that make this something that cannot be relied on at an automated level.”
In general, few cyber security experts believe that attribution is impossible; but many feel that it is ultimately unreliable. Notably, two separate researchers (David Harley from ESET and Sean Sullivan from F-Secure) suggest that attribution is as much an art as a science.
“Attribution in the cyber realm is as much art as it is science,” said Sullivan. “Depending on the campaign and the clues available, you can often make general conclusions. But there’s no science to it. Your analysis could always be wrong due to lack of additional evidence. I would not depend on attribution conclusions like I would on physics.”
“Complex relationships between individual samples make the classification of malware into families and variants more of an art than a science,” commented Harley.
It is also noticeable that confidence diminishes when retribution is linked to attribution. “While it may be scientifically possible to increase the likelihood that a trained machine can determine common patterns,” said Scott Fulton, a technical fellow with BeyondTrust, “it is legally going to be difficult because there is no direct proof and it might not hold up in a court of law.”
The accuracy of machine learning
Machine learning output does not deal in binary yes/no proofs – it delivers probability scores. It develops those scores through the interaction of algorithms and data, where the algorithms look for patterns and relationships within the data, and the machine learns from the results in an iterative process.
The efficiency of the process is dependent on the quality of the algorithms, while the accuracy of the output is dependent on the accuracy of the data it learns from. Both are subject to human involvement and human error. Indeed, it is generally accepted that the algorithms themselves are potentially subject to the subconscious bias of the developer. More concerning, however, is that if the data is wrong, the output will be wrong.
“This is one of the most critical steps in the process,” says Corrons. “If the data used to create the model is wrong, the predictions it will make won’t be reliable.”
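Corrons’ point can be made concrete with a toy classifier. The sketch below (all sample names are invented for illustration) trains a tiny Bernoulli naive Bayes model on analyst-labeled incidents and outputs a probability per group, as described above. Note the dependency: the “truth” the model learns is only the prior analysts’ labels, so mislabeled training data silently corrupts every later prediction.

```python
from collections import defaultdict

# Toy training data: (observed features, analyst-assigned label).
# If these labels are wrong, everything downstream is wrong too.
TRAINING = [
    ({"packer_upx", "c2_domain_a"}, "ActorX"),
    ({"packer_upx", "c2_domain_a", "utc8_timestamps"}, "ActorX"),
    ({"packer_custom", "c2_domain_b"}, "ActorY"),
]

def predict_proba(sample, training, vocab):
    """Bernoulli naive Bayes with Laplace smoothing.

    Returns {label: probability} -- a score, not a yes/no proof.
    """
    by_label = defaultdict(list)
    for feats, label in training:
        by_label[label].append(feats)
    raw = {}
    for label, rows in by_label.items():
        p = len(rows) / len(training)                  # class prior
        for feat in vocab:
            present = sum(feat in r for r in rows)
            p_feat = (present + 1) / (len(rows) + 2)   # Laplace smoothing
            p *= p_feat if feat in sample else (1 - p_feat)
        raw[label] = p
    total = sum(raw.values())
    return {label: p / total for label, p in raw.items()}

vocab = set().union(*(feats for feats, _ in TRAINING))
# A new incident matching ActorX's usual tooling scores heavily toward ActorX.
probs = predict_proba({"packer_upx", "c2_domain_a"}, TRAINING, vocab)
```

Swap the labels in `TRAINING` and the same evidence points, with the same confidence, at the wrong actor: the model has no way to detect that its ground truth was poisoned.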
“Machine learning is usually only as good as the human who designs the algorithms and selects training data sets,” warns High-Tech Bridge’s Kolochenko. He also cautions that it may not be possible to have absolute faith in the data. “Moreover,” he adds, “professional Black Hats already use machine learning and big data in their activities to create sophisticated deception or smoke-screen systems.”
BeyondTrust’s Haber raises the problem of “new threat actors that have never been seen before. That would create new entries for future correlation;” but would provide no current positive statistical match.
“The biggest issue I see in this from the start is the source of the data to feed the system,” comments Kaspersky’s Bartholomew. “There are many vendors attributing attacks to many different groups, and in most cases, those groups do not align one-to-one. What one place calls Actor X may end up being what another organization calls Actor Y and Actor Z. Handling this grouping will be very difficult, unless they adopt definitions from one source (i.e., the government). But if they go this route, then you have the issue of limiting your scope to one source and we go round and round.”
Mike Anders, cyber intelligence investigator at Shadow Blade Technologies, is less concerned because ‘intelligence’ is never 100% accurate anyway. “In all Intelligence work probabilities are always less than 100% until after the fact, and even then they may still be wrong! Waiting for 100% attribution for anything is too often used as an excuse for inaction. The lack of complete intelligence will always be used to that end. Being able to make a decision even with less than 100% confidence and with the understanding that it might all be based on misinformation or inadequate data separates leaders and decision-makers who are good at what they do from those who are just posers and chair warmers!”
Misinformation, misdirection and false flags
Several experts are concerned about different actors deliberately misleading the attribution engine.
“There are many actors out there actively using deception tactics to point fingers at someone else or confuse investigators,” warns Bartholomew. “It’s not impossible to distinguish false flag operations in all cases,” he added; “however, there are some actors out there that are very good at making purposely placed bread crumbs look legitimate. This trend is becoming more and more popular in recent years and I think it’s only going to get worse.”
“There are deliberate false flag attacks aimed at encouraging misattribution,” adds Harley, “and attempts to mislead in other respects – programmatic detail, altered timestamps and so on.”
“Black Hats can easily compromise several FBI machines via dozens of VPNs in different countries and conduct targeted attacks from the FBI’s IPs,” warns Kolochenko. “Such cases are technically and politically uninvestigable. We can obviously guess who is behind the attack, but we won’t get solid technical evidence, except if attackers make a mistake and expose themselves somehow.”
Corrons believes a good attribution engine will make false flag operations more difficult, but not impossible, “unless the attackers know the model used to do the attribution. For example, here the DoD will have access to all the information; so they could make false flag operations that could fool the system – for example, to convince a President that nation X has launched an attack against them.”
One possible effect of trusted attribution could be the inevitability of actual cyber war. If a damaging cyber-attack is pinned to a specific foreign nation, the victim government will be increasingly compelled to respond openly.
“This is the main problem here,” suggests Bartholomew. “While creating new technology to help the process of attribution is great, we need to be sure to not rely on this for ‘proof’; and simply use it as another tool in the toolbox to contribute to the intelligence.”
Corrons takes the view that cyber war is inevitable, but that accurate attribution might make it less so. This is the deterrent effect of attribution. “Cyberwar is inevitable… it is inevitable because it is very cheap and attribution is difficult. ‘Solving’ attribution, even if just partially, could make nations think twice before perpetrating an attack (to avoid the risk of being uncovered).”
Sqrrl’s Kahn points out that retaliation need neither automatically be cyber, nor automatically ‘war’. “If a cyber-attack causes significant damage in the US, the US government will consider all available options to deliver a proportionate response – which could include diplomatic, kinetic, and/or cyber actions (both covert and overt).”
And Mike Anders points out that a government would never base such a decision on just one source of intelligence. “Cyber war is a choice,” he says. “Like all decisions about war, it is never based on one thing or even one event. Or at least, it should not be. You need to remember: data when processed produces information and information when analyzed produces intelligence.
“Attribution,” he continued, “is a component of intelligence production even in cyber. Attribution is more than an artifact but it is not the end of the process. Going to war requires an all-source appreciation of the intelligence and not just what comes out of a Black Box. That is why the human cyber analyst is crucial to the decision-making process. Robots can kill, but you still want a human in the loop. Why would that be different in cyber intelligence analysis?”
Should business worry about attribution?
The consensus is that accurate automated attribution should be possible to a degree; but it cannot be guaranteed 100%. There will always be a danger of false input and weak analysis that will need to be considered. Given these caveats, should business (as opposed to government) even be interested in attribution? Luis Corrons thinks it should, but often isn’t.
Anders has no doubts. “Business should care,” he says, “but the government needs to step in and help by stating clearly, categorically, and unequivocally what can be done legally by a commercial enterprise under attack with respect to ‘Active Defense in Cyber’. Acknowledging specifically that Active Defense is a range of options and is not synonymous with ‘Hacking Back’ or similar popular notions about the conduct of offensive as opposed to strictly defensive operations in cyber. That will require action by the Department of Justice, the Federal Bureau of Investigation, and Congress. After which, you are likely to see less chin-music about needing 100% attribution!
“Knowing the truth does set one free even if it is hard to get at. Attribution is a means to an end and having more of it is good and not having 100% of it is in the very nature of the thing called Cyber! Sometimes you just gotta suck it up!”