Cybercrime

Internet Security Threats from a Multilingual Domain Name System

Threats from Internationalized Domain Names (IDNs)

October 6, 2010

The global Internet is currently undergoing one of its most significant upgrades since the creation of the Web. With the introduction of internationalized domain names (IDNs), hundreds of millions of surfers will soon be able to navigate the Internet entirely in their native languages for the first time. While these long-awaited improvements are undoubtedly good news for the billions of people who do not use Latin script in their day-to-day lives, they also raise potential security concerns that every Internet user and application developer should know about.

IDN Domain Threats

The domain name system (which replaced numerical IP addresses with names) was created in the mid-1980s, before the Web was even a twinkle in Tim Berners-Lee’s eye and nobody could have predicted how fundamental a part of our lives it would become. The DNS was unwittingly limited in its design by the use of ASCII in its base specification, RFC 1035 (although there was vigorous discussion supporting non-ASCII characters, including in the “host name” convention discussions); while the DNS allows any binary string to be used as the label of a resource record, most operational DNS systems still only support the 26 letters of the Latin alphabet, the ten numerals and the hyphen. This is fine for most users in the Americas, Australasia, western Europe and parts of Africa. However, of the almost two billion Internet users believed to be online today, it is estimated that up to half of them have first languages that use non-Latin scripts.

After much discussion in the technical community over many years, it was decided that the best way to solve this problem without disrupting critical traffic was to keep the DNS infrastructure an ASCII-only environment and accommodate non-Latin scripts by encoding them into ASCII in end-point applications such as the browser. An encoding known as Punycode was developed to represent non-ASCII characters as DNS-readable ASCII strings. Browsers have since evolved so that most current versions are Punycode-capable: they encode outgoing names and decode incoming ones, whatever language the user types into the address bar.

IDNs have been available as second-level domains in many countries for many years, but it is only this year that ICANN has started to approve IDN top-level domains, so that domain names can be represented in local characters both to the left and right of the dot. Countries including China, Japan, Saudi Arabia, Russia, Jordan and Egypt have already had their choices of IDN country-code domain delegated in their respective national scripts. For example, http://وزارة-الأتصالات.مصر is the fully Arabic domain name assigned to Egypt’s Ministry of Communications and Information Technology. Because Arabic is written from right to left, the top-level domain (مصر. or “.egypt”) comes first.

If you click the link above, you may find your browser displays this in the address bar: http://xn--4gbrim.xn----ymcbaaajlc6dj7bxne2c.xn--wgbh1c. That is the Punycode translation that the DNS uses to search the TLD’s zone file for the nameserver that holds the IP address of the website you are trying to reach. Punycode strings are immediately recognizable because they always start with the letters XN and two hyphens (xn--). For example, the domain name café.com, which has the acute accent above the letter E, would be represented in Punycode as xn--caf-dma.com. How these IDN domain names are presented to users depends on the application. All browsers in current versions are capable of recognizing IDN domain names in Web content and rendering them clickable by users.
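The café.com conversion described above can be reproduced with a few lines of Python. This is only a sketch: it relies on Python's built-in "idna" codec, which follows the older IDNA 2003 rules, and the helper names (to_ascii, to_unicode, is_punycode_label) are illustrative rather than part of any standard.

```python
def to_ascii(hostname: str) -> str:
    """Encode a Unicode hostname into the ASCII form that the DNS sees."""
    return hostname.encode("idna").decode("ascii")


def to_unicode(hostname: str) -> str:
    """Decode an ASCII (xn--) hostname back into its Unicode form."""
    return hostname.encode("ascii").decode("idna")


def is_punycode_label(label: str) -> bool:
    """Report whether a single label carries the xn-- prefix."""
    return label.lower().startswith("xn--")


print(to_ascii("café.com"))           # xn--caf-dma.com
print(to_unicode("xn--caf-dma.com"))  # café.com
print([is_punycode_label(l) for l in "xn--caf-dma.com".split(".")])  # [True, False]
```

Only the label containing the accented letter is rewritten; the plain ASCII label "com" passes through unchanged.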

The raison d’être of the DNS is to allow simplicity and usability in Internet navigation and addressing. While the addition of IDNs will help billions of people achieve this goal, it also adds a layer of complexity that carries some security risks.

From the outset, it was anticipated that IDNs could exacerbate the phishing threat. In ASCII, it’s already possible to confuse some letters and numbers: I (eye), l (ell) and 1 (one), or O (oh) and 0 (zero), are obvious examples. Paypal.com looks a lot like Paypa1.com. But with the number of characters allowable in domain names increasing from 37 to potentially thousands, the possibility of two strings being visually confusing increases considerably. While there’s little overlap between, say, Chinese and Latin scripts, it is possible to “spell” certain English words using entirely Greek or Cyrillic characters (and vice versa). These domains could look virtually identical to their English counterparts, but lead to entirely different and potentially malicious web sites.

When these visually similar domains are used in phishing or other malicious activity, it’s known as an “IDN homograph attack”. While examples of possible attacks which exploit the confusion in characters either within a script or between scripts were discussed as early as 2001, one of the first proof-of-concept attacks was demonstrated in 2005, when a version of Paypal.com using the Cyrillic equivalent of the letter A was registered and used to direct users to a hacker-controlled website. More recently, this year Microsoft won a cybersquatting claim against the owner of bıng.com, a variation on bing.com that uses a Turkish ı in place of the usual Latin i.
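To see why such look-alikes fool the eye but not the computer, it helps to inspect the underlying code points. The sketch below uses Python's standard unicodedata module; the helper name non_ascii_chars is made up for illustration, and the spoofed strings are built from the two cases mentioned above (Cyrillic а in place of Latin a, and the Turkish dotless ı).

```python
import unicodedata


def non_ascii_chars(name: str):
    """List every non-ASCII character in a name with its code point and Unicode name."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in name
        if ord(ch) > 127
    ]


genuine = "paypal.com"
spoof = "p\u0430ypal.com"  # second letter is CYRILLIC SMALL LETTER A

print(genuine == spoof)                    # False
print(non_ascii_chars(spoof))              # [('а', 'U+0430', 'CYRILLIC SMALL LETTER A')]
print(non_ascii_chars("b\u0131ng.com"))    # [('ı', 'U+0131', 'LATIN SMALL LETTER DOTLESS I')]
```

The two PayPal strings render almost identically in many fonts, yet they compare as different and would resolve to different domains.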

While IDN homograph attacks are feasible using just one script to mimic another, the potential for phishing grows greater when mixed-script domain names are permitted. Some top-level domain registries, such as Russia’s IDN .рф (the Cyrillic counterpart to .ru), only permit registrations in a single script, in Russia’s case Cyrillic. But with other TLDs, script-mixing appears to be allowed. Many European languages (along with a few English words such as the aforementioned café) use accented Latin characters, which must be represented as IDNs even if the majority of the letters in the domain use vanilla ASCII.
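A registry or application that wants to reject mixed-script labels needs some way to tell which scripts a label contains. The following is a rough sketch only: Python's standard library does not expose the Unicode script property, so it approximates the script by taking the first word of each character's Unicode name (for example LATIN or CYRILLIC). The function names are hypothetical, and a production check would use proper script data.

```python
import unicodedata


def scripts_in_label(label: str) -> set:
    """Approximate the set of scripts used by the letters in a label."""
    scripts = set()
    for ch in label:
        if ch.isalpha():  # digits and hyphens belong to no particular script
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_mixed_script(label: str) -> bool:
    """True when the letters of a label come from more than one script."""
    return len(scripts_in_label(label)) > 1


print(is_mixed_script("café"))          # False: accented letters are still Latin
print(is_mixed_script("p\u0430ypal"))   # True: Latin plus one Cyrillic letter
print(scripts_in_label("россия"))       # {'CYRILLIC'}
```

Note that café passes this check even though it must be registered as an IDN, which matches the distinction drawn above between accented Latin names and genuine script mixing.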

Much as software makers are responsible for enabling IDNs for their users, it’s also incumbent on developers of applications such as browsers and email clients to ensure their implementations reduce the risk of phishing. Mozilla, for example, has a policy of enabling the visual display of IDNs for domains where the registry has published a policy on how it handles homographs. So, for example, café.biz and café.info both display normally, but naïve.com displays as xn--nave-6pa.com. Microsoft’s Internet Explorer takes the route of displaying Punycode strings unless the user’s local language settings correspond to the IDN.
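A registry-based display policy of the kind described for Mozilla can be sketched in a few lines. This is not Mozilla's actual code: the allow-list below is hypothetical and contains only the two TLDs used in the example above, the function name display_form is invented for illustration, and the conversion uses Python's built-in "idna" codec (IDNA 2003).

```python
# Hypothetical allow-list of TLDs whose registries are assumed to have
# published a homograph policy; a real browser maintains its own list.
TRUSTED_TLDS = {"biz", "info"}


def display_form(hostname: str) -> str:
    """Return the string an address bar would show under this policy."""
    ascii_name = hostname.encode("idna").decode("ascii")
    tld = ascii_name.rsplit(".", 1)[-1]
    if tld in TRUSTED_TLDS:
        # Trusted registry: show the name in its native characters.
        return ascii_name.encode("ascii").decode("idna")
    # Otherwise fall back to the raw xn-- form so look-alikes stand out.
    return ascii_name


print(display_form("café.biz"))    # café.biz
print(display_form("café.info"))   # café.info
print(display_form("naïve.com"))   # xn--nave-6pa.com
```

Falling back to the xn-- form is deliberately conservative: an unfamiliar string in the address bar is less convenient, but it cannot be mistaken for a well-known brand.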

These kinds of moves go some way to protect users of the legacy DNS from the possibility of losing their data or money to phishers, and should be noted by administrators and application developers alike. The multilingual DNS is going to be a powerful force for increasing Internet usage worldwide over the coming years and it should not be perceived simply as a security risk. With careful thought and the appropriate policies, confusion among Internet users can be minimized, opening up the greater good that internationalizing the Internet promises.

(Updated 10/12/10)

Written By Ram Mohan
