Internet Security Threats from a Multilingual Domain Name System

Threats from Internationalized Domain Names (IDNs)

The global Internet is currently undergoing one of its most significant upgrades since the creation of the Web. With the introduction of internationalized domain names (IDNs), hundreds of millions of users will soon be able to navigate the Internet entirely in their native languages for the first time. While these long-awaited improvements are undoubtedly good news for the billions of people who do not use Latin script in their day-to-day lives, they also raise potential security concerns that every Internet user and application developer should know about.

IDN Domain Threats

The domain name system (DNS), which lets users reach hosts by name rather than by numerical IP address, was created in the mid-1980s, before the Web was even a twinkle in Tim Berners-Lee’s eye and before anybody could have predicted how fundamental a part of our lives it would become. The DNS was unwittingly limited in its design by the use of ASCII in its base specification, RFC 1035 (although there was vigorous discussion supporting non-ASCII characters, including in the “host name” convention discussions). While the DNS allows any binary string to be used as the label of a resource record, most operational DNS systems still support only the 26 letters of the Latin alphabet, the ten numerals and the hyphen. This is fine for most users in the Americas, Australasia, western Europe and parts of Africa. However, of the almost two billion Internet users believed to be online today, it is estimated that up to half have first languages that use non-Latin scripts.

After much discussion in the technical community over many years, it was decided that the best way to solve this problem without disrupting critical traffic was to keep the DNS infrastructure an ASCII-only environment and to accommodate non-Latin scripts by encoding them into ASCII in end-point applications such as the browser. An encoding known as Punycode was developed to represent non-ASCII characters as DNS-compatible ASCII strings. Most current browser versions are now Punycode-capable: they encode outgoing names and decode incoming ones, whatever language the user types into the address bar.

IDNs have been available as second-level domains in many countries for many years, but it is only this year that ICANN has started to approve IDN top-level domains, so that domain names can be represented in local characters both to the left and right of the dot. Countries including China, Japan, Saudi Arabia, Russia, Jordan and Egypt have already had their choices of IDN country-code domain delegated in their respective national scripts. For example, http://وزارة-الأتصالات.مصر is the fully Arabic domain name assigned to Egypt’s Ministry of Communications and Information Technology. Because Arabic is written from right to left, the top-level domain (مصر. or “.egypt”) comes first.

If you click the link above, you may find your browser displays this in the address bar: http://xn--4gbrim.xn----ymcbaaajlc6dj7bxne2c.xn--wgbh1c. That is the Punycode form that the DNS uses to search the TLD’s zone file for the nameserver that holds the IP address of the website you are trying to reach. Punycode-encoded labels are immediately recognizable because they always start with the letters “xn” and two hyphens (xn--). For example, the domain name café.com, which has an acute accent above the letter E, would be represented in Punycode as xn--caf-dma.com. How applications choose to present IDNs to their users depends on the application. Current versions of all browsers can recognize IDNs in Web content and render them as clickable links.
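The round trip between the two forms of café.com can be sketched with Python’s built-in "idna" codec. This is only an illustration: the codec follows the older IDNA 2003 rules, the helper names to_ascii and to_unicode are made up for this example, and browsers use their own implementations.

```python
# Convert between the Unicode form of a domain name and its ASCII
# ("xn--") form using Python's built-in "idna" codec (IDNA 2003 rules).
# The helper names below are illustrative, not part of any standard API.

def to_ascii(domain: str) -> str:
    """Encode each label of a domain name into its DNS-ready ASCII form."""
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Decode an ASCII ("xn--") domain name back into Unicode for display."""
    return domain.encode("ascii").decode("idna")


print(to_ascii("café.com"))           # xn--caf-dma.com
print(to_unicode("xn--caf-dma.com"))  # café.com
print(to_ascii("example.com"))        # example.com (plain ASCII is unchanged)
```

Only labels that contain non-ASCII characters gain the xn-- prefix; the ASCII ".com" label passes through untouched.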

The raison d’être of the DNS is to allow simplicity and usability in Internet navigation and addressing. While the addition of IDNs will help billions of people achieve this goal, it also adds a layer of complexity that carries some security risks.

From the outset, it was anticipated that IDNs could exacerbate the phishing threat. In ASCII, it’s already possible to confuse some letters and numbers: I (eye), l (ell) and 1 (one), or O (oh) and 0 (zero), are obvious examples, and a domain name that swaps one for another can look a lot like the original. But with the number of characters allowable in domain names increasing from 37 to potentially thousands, the possibility of two strings being visually confusable increases considerably. While there’s little overlap between, for example, Chinese and Latin scripts, it is possible to “spell” certain English words using entirely Greek or Cyrillic characters (and vice versa). These domains could look virtually identical to their English counterparts, but lead to entirely different and potentially malicious websites.
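To make the look-alike problem concrete, the sketch below compares an ordinary Latin word with a string built entirely from Cyrillic letters that resemble it. The word "cope" and the helper code_points are arbitrary choices for illustration, and the Punycode conversion again relies on Python’s IDNA 2003 codec.

```python
import unicodedata


def code_points(label: str) -> list[tuple[str, str]]:
    """Return (code point, Unicode name) for every character in a label."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in label]


latin_word = "cope"                         # four Latin letters
cyrillic_word = "\u0441\u043e\u0440\u0435"  # Cyrillic es, o, er, ie

# The two strings render almost identically but are not equal.
print(latin_word == cyrillic_word)  # False

for label in (latin_word, cyrillic_word):
    print(label, code_points(label))
    # The DNS only ever sees the ASCII form, which differs between the two.
    print("  DNS form:", label.encode("idna").decode("ascii"))
```

The Latin label is sent to the DNS unchanged, while the Cyrillic one becomes an xn-- string, so the two names resolve independently even though a reader may not be able to tell them apart on screen.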

When these visually similar domains are used in phishing or other malicious activity, it’s known as an “IDN homograph attack”. Possible attacks that exploit confusable characters, either within a script or between scripts, were discussed as early as 2001. One of the first proof-of-concept attacks was demonstrated in 2005, when a lookalike of an existing domain name, using the Cyrillic equivalent of the letter A, was registered and used to direct users to a hacker-controlled website. More recently, this year Microsoft won a cybersquatting claim against the owner of a lookalike domain that uses a Turkish dotless ı in place of the usual Latin i.

While IDN homograph attacks are feasible using just one script to mimic another, the potential for phishing grows when mixed-script domain names are permitted. Some top-level domain registries, such as Russia’s IDN .рф (.rf, for “Russian Federation”), only permit registrations in a single script, in Russia’s case Cyrillic. But with other TLDs, script-mixing appears to be allowed. Many European languages (along with a few English words such as the aforementioned café) use accented Latin characters, which must be represented as IDNs even if the majority of the letters in the domain are plain ASCII.
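A registry or application that wants to reject mixed-script labels needs some way to tell which scripts a label contains. The following is a rough sketch only: Python’s standard library does not expose the Unicode Script property, so it uses the first word of each character’s Unicode name as an approximation, and the function names scripts_in and is_mixed_script are hypothetical.

```python
import unicodedata


def scripts_in(label: str) -> set[str]:
    """Approximate the set of scripts used by the letters in a label.

    Assumption: the first word of a character's Unicode name (LATIN,
    CYRILLIC, GREEK, ...) stands in for its script. Digits and hyphens
    are ignored because they are shared across scripts.
    """
    scripts = set()
    for ch in label:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_mixed_script(label: str) -> bool:
    """Return True when a label draws its letters from more than one script."""
    return len(scripts_in(label)) > 1


print(is_mixed_script("café"))          # False: accented Latin is still Latin
print(is_mixed_script("ex\u0430mple"))  # True: Cyrillic "а" among Latin letters
```

An accented Latin name such as café passes the check, while a Latin label with a single Cyrillic letter slipped in is flagged. A check like this does nothing against a label written wholly in one look-alike script.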

Much as software makers are responsible for enabling IDNs for their users, it’s also incumbent on developers of applications such as browsers and email clients to ensure their implementations reduce the risk of phishing. Mozilla, for example, has a policy of enabling the visual display of IDNs only for domains where the registry has published a policy on how it handles homographs. So, for example, café.biz and café.info both display normally, but an IDN under a TLD without such a published policy displays as its Punycode string. Microsoft’s Internet Explorer takes the route of displaying Punycode strings unless the user’s local language settings correspond to the IDN.
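A registry-based display rule of this kind can be sketched in a few lines. This is not Mozilla’s actual code: the TRUSTED_TLDS set and the display_form function are assumptions made for illustration, with the two TLDs taken from the café examples above.

```python
# Hypothetical allowlist of TLDs whose registries are assumed to have
# published a homograph policy; a real browser maintains its own list.
TRUSTED_TLDS = {"biz", "info"}


def display_form(domain: str, trusted_tlds: set[str] = TRUSTED_TLDS) -> str:
    """Return the string an address bar might show for a domain name.

    The Unicode form is shown only when the TLD is on the allowlist;
    otherwise the ASCII ("xn--") form is shown.
    """
    ascii_form = domain.encode("idna").decode("ascii")
    tld = ascii_form.rsplit(".", 1)[-1].lower()
    if tld in trusted_tlds:
        return ascii_form.encode("ascii").decode("idna")
    return ascii_form


print(display_form("café.biz"))      # café.biz
print(display_form("café.example"))  # xn--caf-dma.example
```

Falling back to the xn-- form is deliberately unfriendly: an unexpected string of ASCII in the address bar is a visible cue that the name is not what it appears to be.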

These kinds of moves go some way toward protecting users of the legacy DNS from losing their data or money to phishers, and should be noted by administrators and application developers alike. The multilingual DNS is going to be a powerful force for increasing Internet usage worldwide over the coming years, and it should not be perceived simply as a security risk. With careful thought and appropriate policies, confusion among Internet users can be minimized, opening up the greater good that an internationalized Internet can bring.

(Updated 10/12/10)
