Although security and networking professionals aren’t officially fortune tellers, being able to predict issues, and then alleviating them before they happen, is a large part of the job. One particular issue is the ongoing confusion that users have in understanding the Internet’s Domain Name System (DNS), which translates easy-to-remember names into numeric Internet Protocol (IP) addresses. For example, the site at IP address “188.8.131.52” is better known and more easily remembered as SecurityWeek.com.
But now that the Internet Corporation for Assigned Names and Numbers (ICANN) is preparing to release hundreds of new generic Top-Level Domains (gTLDs), starting most likely in late 2012, some concerns are being raised that users will be confused about domains beyond .com, .org, .info and other favorites. Between growing familiarity with the new TLD idea and the ongoing use of search engines, I expect the confusion will resolve itself.
On the technical side, however, one of the worries is deciphering what the new gTLD program could mean for the security and integrity of the Internet’s naming and address allocation systems. As a member of ICANN’s Security and Stability Advisory Committee (SSAC), I contribute to the conversations that shape how the Internet functions today and, more importantly, how it will likely function in the future.
Recently, the SSAC wrote a report outlining its concerns with the potential adoption of multilingual top-level domains that consist of just a single character; in technical jargon, these are called Single Character Internationalized Domain Names (IDNs). Imagine, instead of “.pepsi” or “.golf”, just a single character, representing a whole word, but written in a script other than the Latin alphabet, such as Arabic, Chinese, or Tamil.
Single Character Names: Help or Harm?
Single Character names are not simply a “nice to have” feature, but a necessary requirement for some multilingual communities. For example, in Chinese, many complete words are simply represented by a single Chinese character. Community members say, and reasonably so, that if a string like “.word” can be allowed into the top-level domain root, then there is no earthly reason why the equivalent one-character string which means the same in Chinese cannot be accepted.
Similarly, in Hindi, with more than 400 million speakers worldwide, the word for “Yes” is a single character. Practitioners argue that there is no reason that this word would be confused with some other language, and that just as it would be okay to have a “.YES” top-level domain, a similar one-character Hindi equivalent should also be allowed as a top-level domain. A case can be made that not allowing such words to be made available online creates inequity for the world’s non-English speaking communities, who are increasingly online.
However, managing the core of the DNS and the root requires severe conservatism. Once a top-level domain is added to the root, it remains there essentially forever and is expected to work in a reasonable and unambiguous manner. And ambiguity with how computer systems and people deal with single character multilingual names is at the core of the argument for why such names should not yet be universally deployed.
To understand the complexity, let’s start with the term “single character.” In some cases, a (potential) top-level domain that appears as a single character may actually be represented by multiple ASCII characters, because internationalized names are stored in the DNS in an ASCII-compatible encoding. For example, both the Chinese and the Hindi names referred to above represent a single concept and resemble a single character in their native language. In computer terms, however, these characters may be composed of several glyphs (or “strokes”) that combine into a single visual character. So while it’s visually a single “letter,” the DNS doesn’t “read” it that way.
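To make the gap between what a person sees and what software handles concrete, here is a minimal Python sketch. The sample string is an assumption chosen for illustration (the Hindi word for “yes,” spelled as a base letter plus two combining signs), and describe_label is a hypothetical helper name. Python’s built-in "idna" codec follows the older IDNA 2003 rules, so treat the output as illustrative rather than as what a registry would accept.

```python
import unicodedata


def describe_label(label: str) -> dict:
    """Summarize how one visually short label looks to software."""
    # Python's built-in "idna" codec implements the older IDNA 2003 rules;
    # it is used here only to illustrate the ASCII-compatible form.
    ascii_form = label.encode("idna").decode("ascii")
    return {
        "code_points": [
            f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}" for ch in label
        ],
        "ascii_form": ascii_form,
        "ascii_length": len(ascii_form),
    }


# Illustrative sample (an assumption, not taken from the SSAC report):
# a base letter followed by two combining signs that render as one unit.
sample_label = "\u0939\u093e\u0901"

info = describe_label(sample_label)
print(len(sample_label))       # number of code points behind one visual unit
for entry in info["code_points"]:
    print(entry)               # one line per underlying code point
print(info["ascii_form"])      # ASCII-compatible form, prefixed with "xn--"
```

What looks like one “letter” on screen is several code points to the computer, and a longer run of plain ASCII characters once it is encoded for the DNS.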
Generally, the more characters there are in a top-level domain, the less likely it is that a user will find it confusing. For example, while not everyone may understand that domains ending in “.com” belong to commercial entities while those ending in “.org” tend to belong to not-for-profit organizations, it’s difficult to mistake the characters themselves. On the other hand, domains ending in “.com” and those ending in a shorter look-alike such as “.co” can often be easily mistaken.
It follows that in a single character multilingual name, there is no surrounding context to help users identify the language or the string intended. For example, the letter “a” looks virtually identical in English, Greek or Russian; to a computer, these are three entirely different characters! This means that “.bank” in English could quite easily be impersonated by substituting the Greek or the Russian “a”. In fact, this is a ploy often used by phishers and scammers today at the second level of domain names.
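The look-alike problem can be sketched in a few lines of Python. The scripts_in helper is hypothetical, and guessing a script from the first word of a character’s Unicode name is a crude assumption made for illustration; real systems rely on the Unicode Script property instead.

```python
import unicodedata


def scripts_in(label: str) -> list:
    """Crude script guess: the first word of each character's Unicode name.

    This naming heuristic is an assumption for illustration only.
    """
    return sorted({unicodedata.name(ch).split()[0] for ch in label})


# Three characters that look alike but are distinct to a computer.
latin_a, greek_alpha, cyrillic_a = "a", "\u03b1", "\u0430"
print([f"U+{ord(ch):04X}" for ch in (latin_a, greek_alpha, cyrillic_a)])
# ['U+0061', 'U+03B1', 'U+0430']

genuine = "bank"
spoofed = "b\u0430nk"          # second letter is CYRILLIC SMALL LETTER A

print(genuine == spoofed)      # False
print(scripts_in(genuine))     # ['LATIN']
print(scripts_in(spoofed))     # ['CYRILLIC', 'LATIN']
print(scripts_in(cyrillic_a))  # ['CYRILLIC']
```

A mixed-script check can flag the spoofed four-letter string because its neighbors are Latin. The single-character label offers no such neighbors: it is perfectly consistent within its own script, so this kind of check has nothing to compare it against.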
So, while you don’t have to be a fortune teller to see a coming tide of multilingual domain names at the top level of the DNS, such a tide has the potential to confuse, maim and mutilate the way users expect top-level domains to work today. Many organizations are making strong efforts to understand this topic further, and to research methods to allow single character internationalized top-level domains without causing serious user confusion. It’s possible that an exception-based approach might be a reasonable option, so that fully worked-out cases are trialed before venturing into the uncharted territory of single character internationalized top-level domains.