
Internet Security Threats from a Multilingual Domain Name System

Threats from Internationalized Domain Names (IDNs)

The global Internet is currently undergoing one of its most significant upgrades since the creation of the Web. With the introduction of internationalized domain names (IDNs), hundreds of millions of surfers will soon be able to navigate the Internet entirely in their native languages for the first time. While these long-awaited improvements are undoubtedly good news for the billions of people who do not use Latin script in their day-to-day lives, they also raise potential security concerns that every Internet user and application developer should know about.


The domain name system (DNS), which lets people use names in place of numerical IP addresses, was created in the mid-1980s, before the Web was even a twinkle in Tim Berners-Lee's eye, when nobody could have predicted how fundamental a part of our lives it would become. The DNS was unwittingly limited in its design by the use of ASCII in its base specification, RFC 1035 (although there was vigorous discussion in support of non-ASCII characters, including in the “host name” convention discussions). While the DNS protocol allows any binary string to be used as the label of a resource record, most operational DNS systems still support only the 26 letters of the Latin alphabet, the ten numerals and the hyphen. This is fine for most users in the Americas, Australasia, western Europe and parts of Africa. However, of the almost two billion Internet users believed to be online today, it is estimated that up to half have first languages that use non-Latin scripts.

After much discussion in the technical community over many years, it was decided that the best way to solve this problem without disrupting critical traffic was to keep the DNS infrastructure an ASCII-only environment and accommodate non-Latin scripts by encoding them into ASCII in end-point applications such as the browser. An encoding known as Punycode was developed to represent non-ASCII characters as DNS-readable ASCII characters. Browsers have since evolved so that most current versions are Punycode-capable: they encode outgoing names and decode incoming ones, whatever language the user types into the address bar.

IDNs have been available as second-level domains in many countries for many years, but it is only this year that ICANN has started to approve IDN top-level domains, so that domain names can be represented in local characters both to the left and right of the dot. Countries including China, Japan, Saudi Arabia, Russia, Jordan and Egypt have already had their choices of IDN country-code domain delegated in their respective national scripts. For example, http://وزارة-الأتصالات.مصر is the fully Arabic domain name assigned to Egypt's Ministry of Communications and Information Technology. Because Arabic is written from right to left, the top-level domain (مصر. or ".egypt") comes first.

If you click the link above, you may find that your browser displays http://xn--4gbrim.xn----ymcbaaajlc6dj7bxne2c.xn--wgbh1c in the address bar. That is the Punycode translation, which the DNS uses to search the TLD’s zone file for the nameserver that holds the IP address of the website you are trying to reach. Punycode-encoded labels are immediately recognizable because they always start with the letters XN and two hyphens (xn--). For example, the domain name café.com, which has the acute accent above the letter E, would be represented in Punycode as xn--caf-dma.com. How these IDNs are presented to users depends on the application. All browsers in current versions are capable of recognizing IDNs in Web content and rendering them clickable by users.
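The conversion between the two forms can be tried out with a few lines of Python. This is a minimal sketch, assuming Python's built-in "idna" codec (which follows the older IDNA 2003 rules and may differ from what a current browser does for some names); the helper names to_ascii and to_unicode are illustrative only.

```python
# Convert a domain name between its Unicode form and the ASCII ("xn--")
# form that is sent to the DNS, using Python's built-in "idna" codec.
# Assumption: IDNA 2003 behavior is good enough for illustration.


def to_ascii(domain: str) -> str:
    """Return the ASCII-compatible form of a domain name."""
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Return the human-readable form of an ASCII-encoded domain name."""
    return domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    print(to_ascii("caf\u00e9.com"))      # café.com in its xn-- form
    print(to_unicode("xn--caf-dma.com"))  # back to the accented form
    print(to_ascii("example.com"))        # plain ASCII names pass through
```

Each label is encoded separately, so only the labels that contain non-ASCII characters receive the xn-- prefix.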

The raison d'être of the DNS is to allow simplicity and usability in Internet navigation and addressing. While the addition of IDNs will help billions of people achieve this goal, it also adds a layer of complexity that carries some security risks.

From the outset, it was anticipated that IDNs could exacerbate the phishing threat. In ASCII, it's already possible to confuse some letters and numbers: I (eye), l (ell) and 1 (one), or O (oh) and 0 (zero), are obvious examples. Paypal.com looks a lot like Paypa1.com. But with the number of characters allowable in domain names increasing from 37 to potentially thousands, the possibility of two strings being visually confusable increases considerably. While there's little overlap between, for example, Chinese and Latin scripts, it is possible to "spell" certain English words using entirely Greek or Cyrillic characters (and vice versa). These domains could look virtually identical to their English counterparts, but lead to entirely different and potentially malicious web sites.

When these visually similar domains are used in phishing or other malicious activity, it's known as an "IDN homograph attack". Possible attacks that exploit confusable characters, either within a script or between scripts, were discussed as early as 2001. One of the first proof-of-concept attacks was demonstrated in 2005, when a version of Paypal.com using the Cyrillic equivalent of the letter A was registered and used to direct users to a hacker-controlled website. More recently, this year Microsoft won a cybersquatting claim against the owner of bıng.com, a variation on bing.com that uses a Turkish ı in place of the usual Latin i.
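Lookalike characters that fool the eye are easy to tell apart in software, because each has its own Unicode code point. The sketch below uses Python's standard unicodedata module to list the non-ASCII characters in a label; the function name non_ascii_chars is hypothetical, and the two sample strings simply mirror the Paypal and Bing cases described above.

```python
import unicodedata


def non_ascii_chars(label: str) -> list:
    """List (position, code point, Unicode name) for each non-ASCII character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(label)
        if ord(ch) > 127
    ]


if __name__ == "__main__":
    # The second letter of each sample is a lookalike, not a Latin a or i.
    for label in ("paypal", "p\u0430ypal", "b\u0131ng"):
        print(repr(label), non_ascii_chars(label))
```

A check like this only reveals that something unusual is present; deciding whether it is malicious still needs the kind of policy discussed in the rest of this article.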


While IDN homograph attacks are feasible using just one script to mimic another, the potential for phishing grows when mixed-script domain names are permitted. Some top-level domain registries, such as Russia's IDN .рф (.rf, the Cyrillic counterpart of .ru), only permit registrations in a single script, in Russia’s case Cyrillic. But with other TLDs, script-mixing appears to be allowed. Many European languages (along with a few English words such as the aforementioned café) use accented Latin characters, which must be represented as IDNs even if the majority of the letters in the domain use vanilla ASCII.
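A registry or application that wants to flag mixed-script labels needs some way to tell which scripts a label uses. Python's standard library does not expose the Unicode Script property, so the sketch below leans on a rough heuristic: it treats the first word of each letter's Unicode name (LATIN, CYRILLIC, GREEK and so on) as its script. Both function names are hypothetical, and a production check would use proper script data and allow legitimate combinations, such as the several scripts used together in Japanese.

```python
import unicodedata


def scripts_in_label(label: str) -> set:
    """Approximate the set of scripts in a label.

    Heuristic (an assumption, not the Unicode Script property): the first
    word of a letter's Unicode name is taken as its script. Digits and
    hyphens are ignored because they are shared across scripts.
    """
    scripts = set()
    for ch in label:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split(" ")[0])
    return scripts


def is_mixed_script(label: str) -> bool:
    """True when the label appears to combine more than one script."""
    return len(scripts_in_label(label)) > 1


if __name__ == "__main__":
    for label in ("caf\u00e9", "p\u0430ypal", "\u0440\u0444"):
        print(repr(label), scripts_in_label(label), is_mixed_script(label))
```

Note that an accented word such as café counts as a single script here, since é is still a Latin letter, while the spoofed Paypal label is flagged as Latin mixed with Cyrillic.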

Much as software makers are responsible for enabling IDNs for their users, it's also incumbent on developers of applications such as browsers and email clients to ensure their implementations reduce the risk of phishing. Mozilla, for example, has a policy of enabling the visual display of IDNs for domains where the registry has published a policy on how it handles homographs. So, for example, café.biz and café.info both display normally, but naïve.com displays as xn--nave-6pa.com. Microsoft's Internet Explorer takes the route of displaying Punycode strings unless the user's local language settings correspond to the IDN.
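A display policy of the first kind can be sketched in a few lines. This is an illustration only, not Mozilla's actual code or list: the function display_form and the trusted_tlds set are hypothetical, with the set populated from the two TLDs named in the example above, and the conversion again assumes Python's built-in IDNA 2003 codec.

```python
def display_form(domain: str, trusted_tlds: set) -> str:
    """Choose what to show in the address bar.

    Show the Unicode form only when the TLD is in trusted_tlds (registries
    assumed to have published a homograph policy); otherwise fall back to
    the ASCII "xn--" form so that lookalikes stand out.
    """
    ascii_form = domain.encode("idna").decode("ascii")
    tld = ascii_form.rsplit(".", 1)[-1].lower()
    if tld in trusted_tlds:
        return ascii_form.encode("ascii").decode("idna")
    return ascii_form


if __name__ == "__main__":
    trusted = {"biz", "info"}  # hypothetical allowlist taken from the text
    for name in ("caf\u00e9.biz", "caf\u00e9.info", "na\u00efve.com"):
        print(display_form(name, trusted))
```

The design choice is deliberately conservative: when the application cannot vouch for the registry's policy, the user sees the unambiguous ASCII form rather than a string that might be a lookalike.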

These kinds of moves go some way toward protecting users of the legacy DNS from losing their data or money to phishers, and administrators and application developers alike should take note of them. The multilingual DNS is going to be a powerful force for increasing Internet usage worldwide over the coming years, and it should not be perceived simply as a security risk. With careful thought and the appropriate policies, confusion among Internet users can be minimized, opening up the greater good that internationalizing the Internet will bring.

(Updated 10/12/10)

Ram Mohan is the Executive Vice President and Chief Technology Officer at Afilias, a global provider of Internet infrastructure services including domain name registry and DNS solutions. Ram also serves as the Security & Stability Advisory Committee's liaison to ICANN’s Board of Directors and has helped direct and write numerous policies affecting domain name registration and DNS security.