'Trojan Source' Attack Abuses Unicode to Inject Vulnerabilities Into Code

Researchers from the University of Cambridge have identified a new attack method that abuses Unicode to stealthily inject vulnerabilities into code.

Dubbed Trojan Source, the attack impacts many of the compilers, interpreters, code editors, and code repository frontend services used by software developers.

Unicode is the standard for the consistent encoding, representation, and handling of text in most of the world’s writing systems. Some languages are written left-to-right (e.g. English) and others right-to-left (e.g. Hebrew and Arabic). For cases where the two directions need to be mixed, Unicode provides a feature named the Bidirectional (Bidi) Algorithm. One example is a single word from a right-to-left language embedded in a sentence written in a left-to-right language.
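To make the idea of text direction concrete, the short Python sketch below prints the bidirectional class that Unicode assigns to a few characters. It relies only on the standard library's `unicodedata.bidirectional`; the helper name `bidi_classes` is made up for this illustration.

```python
import unicodedata

def bidi_classes(text: str):
    """Return the Unicode bidirectional class of each character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]

# Latin "a", Hebrew alef (U+05D0), Arabic alef (U+0627)
print(bidi_classes("a\u05d0\u0627"))  # ['L', 'R', 'AL']
```

Here `L` marks a left-to-right character, while `R` and `AL` mark right-to-left characters. The Bidi Algorithm uses these classes to decide the visual order of mixed text.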

The Cambridge researchers discovered that Bidi control characters can be abused to create code that is displayed one way in code editors but interpreted differently by the compiler. Threat actors could use this method to submit malicious code to widely used open source software: the person reviewing the code might see what appears to be harmless code that in reality introduces a vulnerability.
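One way a reviewer or a CI job might look for this kind of abuse is to scan source text for Unicode bidirectional control characters. The Python sketch below is a minimal illustration, not the researchers' own tooling. The function name `find_bidi_controls` and the sample string are made up for this example, and the character list covers the embedding, override and isolate controls defined by Unicode.

```python
# Unicode bidirectional control characters (embeddings, overrides, isolates).
BIDI_CONTROLS = {
    "\u202a": "LRE",
    "\u202b": "RLE",
    "\u202c": "PDF",
    "\u202d": "LRO",
    "\u202e": "RLO",
    "\u2066": "LRI",
    "\u2067": "RLI",
    "\u2068": "FSI",
    "\u2069": "PDI",
}

def find_bidi_controls(source: str):
    """Return (line_number, column, name) for each Bidi control character found."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((line_no, col, BIDI_CONTROLS[ch]))
    return hits

# The sample uses an escape sequence so the invisible character is explicit here.
sample = 'a = 1\nb = "x\u202e" # y\n'
print(find_bidi_controls(sample))  # [(2, 7, 'RLO')]
```

A scan like this only reports where the characters occur. Whether a given occurrence is legitimate (for example, inside a string that genuinely contains right-to-left text) still needs human judgment.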

[Figure: Trojan Source attack. A comment is displayed as if it were code.]

Another variant of the attack leverages homoglyphs, characters that are visually nearly identical. An attacker could exploit this method to define a homoglyph function in an upstream package that can be called from the targeted code.
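The homoglyph variant can be illustrated with a simple heuristic: list every identifier that contains non-ASCII characters, along with the Unicode names of those characters. The Python sketch below is an assumption-laden illustration rather than a complete defense. The function name `suspicious_identifiers` and the sample code are hypothetical, and the check would also flag legitimate non-ASCII identifiers.

```python
import re
import unicodedata

# An identifier: a letter or underscore followed by word characters (Unicode-aware).
IDENTIFIER = re.compile(r"[^\W\d]\w*")

def suspicious_identifiers(source: str):
    """Map each identifier containing non-ASCII characters to those characters' Unicode names."""
    found = {}
    for match in IDENTIFIER.finditer(source):
        ident = match.group()
        odd = [unicodedata.name(ch, "UNKNOWN") for ch in ident if ord(ch) > 127]
        if odd:
            found[ident] = odd
    return found

# "say\u041dello" uses a Cyrillic letter that looks like a Latin "H".
sample = "def say\u041dello(): pass\nsayHello()\n"
print(suspicious_identifiers(sample))
```

In this sample the two function names look nearly identical on screen, but only the first is reported, together with the name `CYRILLIC CAPITAL LETTER EN`.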

“Attacks on source code are both extremely attractive and highly valuable to motivated adversaries, as maliciously inserted backdoors can be incorporated into signed code that persists in the wild for long periods of time. Moreover, if backdoors are inserted into open source software components that are included downstream by many other applications, the blast radius of such an attack can be very large,” the researchers explained in a paper describing their work.

They added, “Trojan-Source attacks introduce the possibility of inserting such vulnerabilities into source code invisibly, thus completely circumventing the current principal control against them, namely human source code review. This can make backdoors harder to detect and their insertion easier for adversaries to perform.”

C, C++, C#, JavaScript, Java, Rust, Go, and Python have been found to be impacted. The CVE identifiers CVE-2021-42574 and CVE-2021-42694 have been assigned to the vulnerabilities uncovered during this research.

The researchers also conducted tests on Windows, macOS and Linux to evaluate widely used code editors such as VS Code, Atom, Sublime Text, Notepad, vim and emacs, as well as web-based services such as GitHub and Bitbucket. Each of them is affected by at least one variation of the Trojan Source attack.

The Cambridge researchers have scanned publicly available source code in an effort to find signs that the Trojan Source attack has been exploited for malicious purposes. While they found no evidence of Trojan Source attacks in the wild, they did see similar techniques being used, although many of those uses were not necessarily malicious.

The researchers have provided recommendations for preventing abuse and said half of the compiler maintainers they have contacted are either working on patches or have promised to do so.

Rust developers have already released an update that should prevent attacks, and they have published an advisory describing the impact.

Eduard Kovacs (@EduardKovacs) is a contributing editor at SecurityWeek. He worked as a high school IT teacher for two years before starting a career in journalism as Softpedia’s security news reporter. Eduard holds a bachelor’s degree in industrial informatics and a master’s degree in computer techniques applied in electrical engineering.