THE Internet Corporation for Assigned Names and Numbers’ (ICANN’s) meetings in Kuala Lumpur last week were the first ever to incorporate discussions on Internationalised Domain Names. Friendly contention was apparent between parties with different ideas on how to approach the issue.
Since its inception about 30 years ago, the Internet has primarily been based on the Latin (or Roman) alphabet, Arabic numerals and punctuation marks, all encoded according to the 7-bit American Standard Code for Information Interchange (Ascii), which serves well in the English-literate world or in countries like Malaysia where the national language is written in the Latin script.
The term “Internationalised Domain Names” (IDNs) refers to domain names partially or totally written in non-Latin script like Urdu, Arabic, Tamil or Chinese using unique character codes rather than Ascii.
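Behind the scenes, such names are mapped onto Ascii so that the existing infrastructure can carry them: the IETF’s IDNA standard (published in 2003) converts each non-Latin label into an Ascii-compatible form prefixed with “xn--” using the Punycode algorithm. A minimal sketch using Python’s standard-library idna codec follows; the German domain used is purely illustrative and does not appear in the article.

```python
# Sketch: converting an internationalised domain name to the
# Ascii-compatible form the DNS actually carries (IDNA/Punycode).
# "münchen.de" is an illustrative example only.
idn = "münchen.de"

# Python's built-in "idna" codec encodes each label to Ascii,
# prefixing any converted label with "xn--".
ascii_form = idn.encode("idna")
print(ascii_form)                    # b'xn--mnchen-3ya.de'

# The mapping is reversible, so applications can display
# the original script to the user.
print(ascii_form.decode("idna"))     # münchen.de
```

Note that only the non-Latin label is rewritten; the pure-Ascii “de” label passes through unchanged.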
The contention at ICANN revolved around technical concerns about the Internet’s core infrastructure, especially whether its root servers can handle these new domain names and scripts without compromising the stability of the Internet’s infrastructure and the seamless exchange of information across it.
Other concerns involved not having enough code tables or codes to accurately represent all the letters, characters and accents in different scripts and languages; problems like five different Chinese characters sharing similar sounds and meanings, which makes registering domain names in Chinese confusing; and the confusion caused by the Tamil script, which uses a near-identical symbol for the number 1 and the character “ka.”
This problem exists even in Ascii, where the numeral “5” and the upper case letter “O” can be written together as “5O,” which readers will see as the number “50” – yet substituting an upper case “O” for a zero in a web or e-mail address will result in an error.
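These confusable characters are in fact distinct code points that merely look alike, and inspecting their Unicode names makes the distinction explicit. A small illustration in Python, using the Tamil pair mentioned above plus Latin-versus-Cyrillic “a” as a further example of the same effect (the Cyrillic case is not from the article):

```python
import unicodedata

# Visually confusable characters are nonetheless distinct code points.
# U+0B95 Tamil "ka" vs U+0BE7 Tamil digit one (the pair above),
# then Latin "a" vs Cyrillic "a", and letter "O" vs digit "0".
for ch in ["\u0B95", "\u0BE7", "a", "\u0430", "O", "0"]:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```

Each line prints a different code point and official character name, even where the printed glyphs are hard to tell apart on screen.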
While the 16-bit Unicode standard from the industry consortium Unicode Inc (www.unicode.org) enables the encoding of up to 65,535 different script symbols, critics complain that this is still not enough and that the standard’s administration is too rigid to adjust to the requirements of different script and language groups.
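The 16-bit figure refers to Unicode’s Basic Multilingual Plane, which spans code points U+0000 through U+FFFF, with each script allocated its own block of that range. A quick Python check, using the Khmer block as an example since Khmer is the script at issue in this story:

```python
import unicodedata

# The 16-bit Basic Multilingual Plane spans U+0000..U+FFFF,
# i.e. 65,536 code points (65,535 excluding U+0000).
print(0xFFFF)                        # 65535

# Each script occupies its own block; Khmer begins at U+1780.
print(unicodedata.name("\u1780"))    # KHMER LETTER KA
```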
Norbert Klein, an advisor to the Open Forum of Cambodia (www.forum.org.kh), complained that without alerting the Cambodian authorities to seek their participation, the Unicode Consortium went ahead and developed its own Khmer script code table with some characters which shouldn’t have been there. In addition, the consortium left out a number of characters and symbols, thus rendering the script unusable.
The Open Forum is a Cambodian non-governmental organisation committed to the advancement of technical understanding, information distribution, development and policy issues in Cambodia.
“I was part of the negotiation team, along with Cambodian officials, which brought this matter up with Unicode and the ISO (International Organisation for Standardisation) but Unicode said whatever was published remains, even if it’s wrong. They said the best they could do was to add footnotes to the code table telling people not to use it. Fortunately, Unicode was prepared to accept over 20 letters it had left out,” said Klein.
The Cambodian authorities got round the problem in 2002 by localising Microsoft software to support Khmer script.
“However, buying a Windows-based system with the Microsoft Office suite is simply beyond the reach of most Cambodians with it costing the equivalent of several years of a Cambodian teacher’s US$35 (RM133) monthly salary,” said Klein.
His NGO initiated a three-year project to develop open source, Unicode-based software which they expect will meet 80% of the needs of all users.
“We hope no other language group will suffer the same fate the Khmer script did and we hope there will be international sensitivity to assist language standardisation,” said Klein.
“If we don’t multilingualise the Internet, non-English speakers or about 80% of the world’s population will continue to be excluded from using the so-called ‘global Internet,’” Multilingual Internet Names Consortium (MINC, www.minc.org) chairman and chief executive officer Khaled Fattal told the event’s IDN workshop.
“The existing Internet is a series of language-based Internets with the English language or the Ascii code being dominant, and this is what we call the ‘global Internet’ today.”
Two options for creating a truly global Internet are to either teach English to the 4.5 billion non-English speakers worldwide, or to “multilingualise” the Internet by fully incorporating the languages of non-English speakers into the Internet infrastructure. This can be done by getting local experts and users to participate in creating their own language tables for use on the Internet.
“Our ultimate vision for the Internet is for people to write their message in one language and have it automatically translated into the recipient’s language for him to read,” said Khaled.
MINC was founded in Singapore in 1998 based on a bottom-up approach of encouraging communities to get involved in developing their own language group scripts, which will be placed in ICANN’s root servers around the world.
“In 2000, we helped develop language sets in Chinese, Tamil, Urdu, Arabic and other languages in 20 countries,” said i-DNS.net International Inc chairman S. Subbiah, a Singaporean who co-founded MINC in 1998 with National University of Singapore professor Tan Tin Wee.
MINC’s philosophy is based on the “Multilingual dot Multilingual” (or “ML.ML”) concept where the whole web address is written in the same script throughout, whether from left to right or right to left, instead of mixing non-Latin and Latin scripts.
Mixing scripts in a single address, by contrast, poses practical problems: the Latin portion may not be understood by the user, and dual Latin/non-Latin keyboards would be required to type it.
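Under the IDNA scheme, even a fully non-Latin ML.ML address can be carried by today’s Ascii-based DNS, since every label, including the top-level domain, is converted independently. A sketch in Python; the all-Cyrillic address below (“example.test” in Russian) is a hypothetical illustration, not one of MINC’s registrations:

```python
# Hypothetical all-Cyrillic ML.ML address: the second-level label
# and the top-level domain are both written in the same script.
ml_ml = "пример.испытание"

# Each label is converted independently to an "xn--" Ascii form,
# so the dotted structure survives in the Ascii-only DNS.
ascii_form = ml_ml.encode("idna")
print(ascii_form)

# The conversion round-trips back to the original script.
assert ascii_form.decode("idna") == ml_ml
```

The open question the workshop debated was not this label-level conversion, which was already standardised, but whether such fully non-Latin names belong in the root servers themselves.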
However, ICANN chairman Dr Vint Cerf and Internet Engineering Task Force liaison to the Board Dr John C. Klensin are concerned about the technical implications, especially of non-standard characters being used in the root server; and the possibility of right-to-left written characters confusing the Internet infrastructure at its core.
“The Internet’s underlying protocols depend on Ascii, and now there are lots of applications using Unicode that can be presented in XML (eXtensible Markup Language) and HTML (Hypertext Markup Language) which can support multiple language groups and scripts,” Cerf told In.Tech.
“If the Internet were a wheel, its current development has reached the level of an ox cart. If people want it to support the things they say they want it to do, the Internet will have to be a rocket ship, so the community should work together to provide it with a rocket engine in terms of the underlying technology within its infrastructure,” he added.
MINC’s Khaled believes Cerf’s and Klensin’s technical concerns can be addressed by extensive testing of the scripts for interoperability to ensure they pose no problems before they’re placed in the root server.
Both Klensin and Cerf are also very concerned that MINC’s ML.ML approach to multilingual IDNs will result in the Internet fragmenting into islands of user groups communicating in their own language, thus defeating the global spirit of the Internet.
“The easy answers for internationalisation are really good if you’ve got an isolated, homogeneous population which knows by talking with each other that they’re all speaking the same language, using the same scripts and the same codings. That’s a very simple problem,” Klensin told the IDN workshop.
“The ability to make that work does not imply a solution to the internationalisation problem, because the easy way of making that work is to let those people communicate with each other, while they don’t communicate with anyone else and nobody else communicates with them.
“All the global solutions involve policy tradeoffs in which those two sets of issues are balanced against each other in an intelligent way and while I don’t have the answers, we may start working on them this afternoon,” he added.
Despite that, Cerf summed up the day by saying: “As the community works towards the introduction of IDNs, there will be much technical and policy work to be done to make this valuable extension of the Domain Name System useful.”