Internationalized Domain Names, heard of them? Double byte web addresses. You know the ones – the 日本.jps and the 価格.coms – you must have seen them lurking somewhere? Yes, these are called IDNs, or Internationalized Domain Names.
Just how valuable are these “snatch up while you can” bargains that the registrars have been trying to flog to us for the last 4 or 5 years? How will these prestigious pieces of “Internet Real Estate” fare in the next decade? Are they a potential goldmine as Asians increasingly get net savvy, or another intricate, money-sucking internet scam? Stippy.com decided to take a deep look into the technology and history of domain names, and find out for ourselves about IDNs.
I was recently greeted by an email offer from a friend to help find a home for this “package set” of three “Hills Zoku” IDNs:
They are read “hillszoku” (.jp, .com, and .net), referring to the Roppongi Hills area in Tokyo. The seller estimates the value of this package to be around USD 50K (and is offering a commission of up to 15%! So, if you don’t bother reading our article, are rich, and are outlandish enough to buy these, let us know!).
But seriously, do you, or anyone you know, ever use these on a regular basis (other than for the novelty check to see if they actually work)? Double byte domain names will not succeed. They will remain a novelty for native speakers of double byte languages and us gaijin alike. Although this article may get a bit “techy” and seem a bit long-winded, I am going to explain why this is so, in terms that hopefully anyone can understand. The first part of this in-depth look at IDNs is a history lesson that may bore those of you who already know the technical details of how the Internet works at its deepest and most basic level. But history is important when discussing this topic, as there are facts and figures that may sway your opinion of double byte character usage in Internet domain names (this is not just a matter of subjective feelings). So, skip the rest of this article if you are not interested in the background – go and find yourself a stippy friend! On the other hand, if you read this article (and part 2) and still think I’m missing the entire point of IDNs, then I’d love to hear your side of the argument in the comment section below!
(Update: part two of this article now published HERE)
The Internet, as most now know, is an enormous array of autonomous computer networks (or, as Senator Ted Stevens, Chairman of the United States Senate Committee on Commerce, Science and Transportation, sees it, a “series of tubes”). These computers and other sorts of devices, which can be running almost any operating system and may be served off hardware made up to two decades ago, are joined in a seemingly never-ending and dynamic tangle of information.
Humans get along with a somewhat flexible form of communication (different languages, dialects, facial expressions, tone of voice, body language), and small grammatical mistakes rarely stop us from conveying meaning. Devices on the Internet have no such luxury: there are a few “rules” which all of them must obey in order to communicate, no matter what age, creed, race or religion they belong to (keeping with the human analogy). The sheer variety of tens of millions of different machines, all talking different languages, all running on different architectures, at different speeds, for a nearly unlimited set of applications, necessitates a very basic but extremely stable protocol – or rulebook – so that they can talk with one another reliably.
Much of the development of these “rules” for computers to talk with each other was done in English-speaking countries (the USA and UK), and hence all of the underlying messaging, addressing and command structures were, and still are, in English. Most of the technical details of these original protocols are totally hidden from the Internet end user, except one: the domain name, or domain address. That is, the stippy.com part of http://www.stippy.com. The domain address is still the easiest way for anyone in the world to connect to any other computer that is also connected to the Internet.
The predecessor of the Internet, the “ARPANet”, was developed in the late 1960s and, while still a far cry from the “series of tubes” we have now, used a communication system very similar in nature to that still used by modern-day computers. Back then, each machine connected to the network was called a “node”. Each node of the original ARPANet was a computer called an IMP (Interface Message Processor), and it held a big text file that mapped computer numbers (IMP numbers) to “host names” (simply put, easier-to-remember “nicknames” for us stupid humans who can’t easily remember strings of numbers). To this day, the Internet does basically the same thing with domain names.
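To make the “big text file” idea concrete, here is a minimal sketch of a flat host table lookup. The file layout, the `parse_host_table` helper and the entries are all made up for illustration; they are not the real ARPANet table format.

```python
# A hypothetical flat host table: one "number  name" pair per line.
# The layout and entries are illustrative assumptions, not a historical format.
HOST_TABLE = """\
# number  name
1   ucla
2   sri
3   ucsb
4   utah
"""


def parse_host_table(text):
    """Return a dict mapping lower-cased host names to node numbers."""
    table = {}
    for line in text.splitlines():
        # Drop comments and surrounding whitespace, skip blank lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        number, name = line.split()
        table[name.lower()] = int(number)
    return table


hosts = parse_host_table(HOST_TABLE)
print(hosts["sri"])
```

Every machine needs a complete and current copy of the whole table for this to work, which is fine for four entries and painful for thousands.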
Soon, hundreds (and later on, thousands) of computers began connecting to ARPANet. Manually storing all of the host names and mapping them to their “node” numbers became cumbersome and was fraught with human error, even though the table was maintained on one central reference machine. A move away from the “text file” (flat host name table) approach to a more hierarchical method of storing the same mappings of computer names to numbers was necessary, and very soon became inevitable.
In 1981, a proposal for a new system of Internet Name Domains, as we know them today, was drafted and published as a Request for Comments (RFC 799). By 1983, it was decided that, based on RFC 799, the “host table name system” described above would be replaced with a new system. And so a new era of computing was born. The new method was capable of storing and sharing millions of domain names, mapped to their corresponding numbers (IP addresses), hierarchically and efficiently between millions of host servers. It is still thriving today, and is called the Domain Name System, or DNS.
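The hierarchical idea can be sketched as a toy tree that is walked label by label, from the right-hand end of the name to the left. This is only an illustration of the structure, not how a real DNS server is implemented; the `ROOT` tree, the `resolve` function and the address in it are assumptions made up for the example.

```python
# Toy hierarchy: each level only knows about the level directly below it.
# The names and the address are placeholders for illustration only.
ROOT = {
    "com": {
        "example": {
            "www": "192.0.2.10",
        },
    },
}


def resolve(name, root=ROOT):
    """Walk the labels of a domain name from right to left through the tree."""
    node = root
    for label in reversed(name.lower().rstrip(".").split(".")):
        if not isinstance(node, dict) or label not in node:
            return None  # nobody at this level knows the label
        node = node[label]
    # Only a leaf (a string) counts as an address in this toy model.
    return node if isinstance(node, str) else None


print(resolve("www.example.com"))
print(resolve("www.example.org"))
```

The point of the design is that nobody has to hold the full list any more: whoever looks after “com” only needs to know who looks after “example”, and so on down the tree.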
With the Internet what it is now, it is very easy to forget just why it was initially devised, and what purpose it served in its fledgling years. The ARPANet was primarily an email system, and in 1973, 75% of all packets transferred were emails, all between native English speakers. There was no requirement at the time that the network be able to transfer messages in any other language, let alone for its address command structure to work with non-English domain names. Hence, the whole architecture of Internet and email addresses was (and still is) restricted to an extremely limited, but vastly efficient, character set in which only the letters of the English alphabet (case-insensitive), the decimal digits, and the hyphen are allowed. That is just 37 characters: 26 letters, 10 digits and the hyphen.
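For the curious, the 37-character rule is easy to express in a few lines of Python. This is a rough sketch rather than a full validator: the function name `is_ldh_hostname` is made up here, and the extra checks (labels of 1 to 63 characters that neither start nor end with a hyphen) come from the usual host name rules rather than from anything stated above.

```python
import re
import string

# The letters (case-insensitive, so counted once), the digits, and the hyphen.
ALLOWED = string.ascii_lowercase + string.digits + "-"
print(len(ALLOWED))  # 26 + 10 + 1

# One label: 1 to 63 allowed characters, no hyphen at either end.
# Explicit A-Z/a-z ranges keep non-ASCII look-alikes from matching.
LABEL_RE = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_ldh_hostname(name):
    """Return True if every dot-separated label uses only the allowed characters."""
    labels = name.rstrip(".").split(".")
    return all(LABEL_RE.fullmatch(label) is not None for label in labels)


print(is_ldh_hostname("www.stippy.com"))
print(is_ldh_hostname("日本語.com"))
```

The first name passes and the second does not, which is exactly the wall that kanji domain names run into.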
But, there is more than one language in the world (unfortunately!), and when the Internet began to be deployed throughout the globe in the early 1990s, some users and networking organizations in non-English speaking countries were sour that they could not use their native language script in Internet Domain Names. The Chinese and Japanese especially wanted to use Chinese characters (known of course as Kanji in Japan) in domain names. At the time, many of the heavyweight advocates of the proposal were from Japan, wishing to use their beloved kanji for Internet addressing. But the Asian community of the 1990s had not been involved in the initial architecture. Wishes were expressed without an understanding of the basic issues – they were voting to build a nuclear reactor, without bothering to check if it was on a fault line. The problem: their character sets are double byte. One character requires two bytes of space to store on a hard disk or to transmit, as opposed to Latin characters, which require only one.
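You can check the byte counts yourself. The sketch below (the `byte_lengths` helper is just a name chosen for this example) encodes the same three kanji in two legacy Japanese encodings and in UTF-8. “Double byte” describes the legacy encodings such as Shift_JIS and EUC-JP; in UTF-8 the same characters take three bytes each, so the overhead compared with plain ASCII is, if anything, larger.

```python
def byte_lengths(text, encodings=("shift_jis", "euc_jp", "utf-8")):
    """Return the number of bytes needed to store text in each encoding."""
    return {encoding: len(text.encode(encoding)) for encoding in encodings}


kanji = "日本語"   # three characters
latin = "abc"      # three characters

print(len(kanji), byte_lengths(kanji))
print(len(latin), byte_lengths(latin))
```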
Double byte characters certainly were not in the minds of the Internet “framers” when they were coming up with the messaging and command protocols that form the foundations of the “world wide web”. In fact, even now, double byte characters are renowned for 文字化け (mojibake, garbling) even when sent as message content, and are shunned by developers and network administrators alike as useless and irritating overhead where basic Internet protocols are concerned.
The fact is that the established rules and protocols of addressing on the Internet are stable, and they work, and they are made from rock solid Roman characters. The sheer scope of “the web”, and the fact that it is an integral part of government, business, and private affairs today (at least in every developed country), dictate that changes to its basic standards and protocols are virtually impossible and are to be avoided. At the very least, changes to our Internet foundations should not be considered for something so menial and “nice-to-have” as double-byte-capable Internet domains, right? Wouldn’t internationalizing these domain names be like allowing Kanji phone numbers?
Well, it turns out that the dreams of the adversaries of “yokomoji” (literally “sideways characters”, a short-sighted way to refer to any language written in the Roman alphabet) came true, and what are now known as Internationalized Domain Names (IDNs) became a reality... well, in a very superficial way. Thankfully, none of the secure foundations of Internet protocol were shaken by the implementation of IDNs. They are still not really domain names in the mind of any DNS server (or most other Internet servers, for that matter), and in fact are totally reliant on user applications (on your PC!) to convert them, using a complicated set of rules, into the 37 Roman characters and numbers that we know and love. Japan Registry Services (JPRS), which took over from the Japan Network Information Center (JPNIC), is responsible for the “smooth administration of the Internet” in Japan, and is ultimately the owner of what is called the “standardization” or “normalization” of IDNs in Japan. The JPRS, being a private entity, is primarily commercially minded, and nowhere on its home page does it really tackle the struggle that IDNs are facing. Its predecessor, JPNIC (a government-sponsored foundation), on the other hand, states the hurdles it faced (when it was in charge) in a very pragmatic and honest manner, and takes a responsible and analytical look at solutions (or workarounds, at best) to the technical issues. Here are two points from their site, where they openly state their wish that IDNs (multilingual domain names) must not change the underlying architecture of the Internet in any way:
- Multilingual domain names must not influence current DNS use and management
- Multilingual domain names should continue to be DNS that can resolve names in any type of system, regardless of the location
They go on to mention that ultimately, IDNs need to be converted back somehow into Roman characters in order to have any chance of becoming widely utilised by the average Internet user. (You can read about many more of the issues that surround Japanese IDNs in more detail on their homepage.)
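If you want to see that conversion back into Roman characters for yourself, Python’s standard library ships an `idna` codec that does the application-side translation. The sketch below is an illustration only: the `to_ascii` and `to_unicode` helper names are made up here, and the built-in codec follows the older IDNA 2003 rules, so other software may treat some edge cases differently.

```python
def to_ascii(domain):
    """Convert a Unicode domain name to the ASCII form that DNS actually sees."""
    return domain.encode("idna").decode("ascii")


def to_unicode(ascii_domain):
    """Convert an ASCII ("xn--" style) domain name back to its Unicode form."""
    return ascii_domain.encode("ascii").decode("idna")


name = "日本語.com"
wire_form = to_ascii(name)

print(wire_form)              # an ASCII-only name with an "xn--" label
print(to_unicode(wire_form))  # back to the original kanji
print(to_ascii("stippy.com")) # plain ASCII names pass through unchanged
```

Note that the DNS server never sees a single kanji. All of the work happens on your machine before the lookup goes out, which is the “superficial” implementation described above.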
Also, Jim Breen of Monash University in Australia, one of the most respected authorities when it comes to Japanese language dictionaries and encoding Japanese and Chinese characters for use on computer systems, takes a deep look at the subject and the problems that IDNs face in one of his excellent papers. He explains why Japanese domain names would have a very hard time establishing themselves as mainstream.
Internationalised Domain Names introduce one more level of complexity (like a second language barrier!) to our Internet. Even apart from the fact that most of the Internet-using world cannot decipher them, the underpinnings of the web are in English, and IDNs will never succeed on a large scale.
Anyway, that is enough for today’s domain name history lesson – come back in a few days for part two, where we shall explore from a user’s perspective just why the 日本語.com IDNs of the Internet are not going to play a significant role in your life. While you are waiting, leave comments on this topic below, or read this very good article (in Japanese only) on the subject, and find out why even Japanese people are complaining about IDNs!