foreign languages



On Fri, 6 Dec 1996, Simon Higgs wrote:

> > be created/reserved. The Internet is global, but national/regional
> > interests do conflict at times and there are many languages: being truly
> > global requires taking this aspect into consideration.
> 
> It's more a question of supportable character sets in DNS. How about
> Russian, Chinese or Japanese TLDs?

I had an idea for dealing with this that I've been mulling over for a
while. The following very rough document was my first attempt to
systematize it. You'll note the intention to encompass Unicode; however,
the rough draft below still falls short of Unicode's number of code
points (its encoding reaches only 1610 table positions), although it
might handle everything up to, but not including, Chinese.



Network Working Group                                     Michael Dillon
Request for Comments: ####                           Memra Software Inc.
                                                        28 November 1996


                       Multilingual Domain Names

Status of this Memo

   This memo specifies an Internet standard for multilingual domain
   names using any character set which can be represented in Unicode.
   Distribution of this memo is unlimited.


Overview and Rationale

   The current system of domain names restricts names to the digits 0
   through 9, the 26 letters of the English alphabet, and the dash (-).
   The letters are not case sensitive, so Memra.COM is equivalent to
   memra.com. This system poses problems for people using languages
   whose character sets cannot be mapped directly to the English
   alphabet. For many languages that use a Latin-based alphabet, an
   approximate mapping can often be achieved by substituting the
   unaccented versions of accented characters, but this often produces
   confusing representations of words that are normally distinguished
   only by their accents.
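   To make the restriction concrete, here is a minimal Python sketch
   that checks one segment against the current letters, digits and
   dash, and compares two names case-insensitively. The function names
   are hypothetical, and the check deliberately ignores other host name
   rules such as length limits or where a dash may appear.

```python
import string

# Characters the current system permits in a segment: the digits 0-9,
# the 26 English letters (in either case) and the dash.
ALLOWED = set(string.ascii_letters + string.digits + "-")


def uses_current_charset(segment: str) -> bool:
    """True if the segment is non-empty and uses only letters, digits or dash."""
    return len(segment) > 0 and all(ch in ALLOWED for ch in segment)


def same_name(a: str, b: str) -> bool:
    """Case-insensitive comparison; adequate because only ASCII letters occur."""
    return a.lower() == b.lower()


print(uses_current_charset("memra"))        # True
print(uses_current_charset("où"))           # False: 'ù' is not allowed
print(same_name("Memra.COM", "memra.com"))  # True
```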

   This proposal therefore describes a way in which any language whose
   characters are defined in Unicode can be used in domain names without
   any negative impact on the existing system.

Distinguishing New from Old

   To carry the information needed to distinguish other character sets,
   we must conform to the existing standards for domain naming while
   introducing an escape sequence that tells newer software that the
   domain name is actually in encoded form. It is proposed that the
   dash serve this purpose when it occurs as the first character of a
   domain name segment.

   A fully qualified domain name such as www.memra.com consists of more
   than one segment separated by dots. Currently the dash is used only
   rarely and never occurs at the beginning of a segment. This proposal
   would reserve a dash at the beginning of a segment to signify that
   the remaining characters in the segment are an encoded form of
   Unicode. The characters following the dash are interpreted as
   base 36 digits representing the code positions of the Unicode
   characters in a table derived from Unicode.
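   A resolver or application would first have to spot the escape. The
   sketch below, with hypothetical function names, splits a name into
   segments, flags those that begin with a dash, and strips the dash to
   expose the base 36 digits. It only detects the marker; decoding the
   digits is a separate step.

```python
def split_segments(fqdn: str) -> list[str]:
    """Split a fully qualified domain name into its dot-separated segments."""
    # A trailing dot (the root) does not add a segment.
    return fqdn.rstrip(".").split(".")


def is_encoded(segment: str) -> bool:
    """Under this proposal a leading dash marks the segment as encoded Unicode."""
    return segment.startswith("-")


def encoded_payload(segment: str) -> str:
    """Return the base 36 digits that follow the escape dash."""
    if not is_encoded(segment):
        raise ValueError(f"{segment!r} is not an encoded segment")
    return segment[1:]


for name in ("www.memra.com", "-memra.-com"):
    print(name, [is_encoded(s) for s in split_segments(name)])
# www.memra.com [False, False, False]
# -memra.-com [True, True]
```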



Dillon                                                          [Page 1]





RFC ####                Multilingual Domain Names       28 November 1996


   The base 36 digits are read from left to right in groups of one, two
   or three digits; the first digit of each group determines its length:

       A-Z - single digit code  (1 through 26)
       1-8 - double digit code  (27 through 314)
       9   - three digit code  (315 through 1610)
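   The draft does not spell out how a two- or three-digit group maps to
   a value inside its range. Writing v(d) for the value of a base 36
   digit d (0 = 0, A-Z = 1-26, 1-9 = 27-35), one reading that fits the
   stated ranges exactly, offered as an inference rather than the
   author's definition, is shown below. Eight leading digits times 36
   second digits give 288 codes (27-314), and 36 x 36 = 1296 codes
   follow the 9 (315-1610).

```latex
\begin{aligned}
d_1 \in \{A,\dots,Z\}: &\quad c = v(d_1) \in [1, 26] \\
d_1 \in \{1,\dots,8\},\ d_2: &\quad c = 27 + 36\,\bigl(v(d_1) - 27\bigr) + v(d_2) \in [27, 314] \\
d_1 = 9,\ d_2,\ d_3: &\quad c = 315 + 36\,v(d_2) + v(d_3) \in [315, 1610]
\end{aligned}
```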

   The base 36 digits are as follows:

       0   - 0
       A-Z - 1 through 26
       1-9 - 27 through 35
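   A decoder for the digit string after the dash might look like the
   following Python sketch. The digit values and group lengths come from
   this section; the offset arithmetic inside two- and three-digit
   groups is an assumption inferred from the ranges 27-314 and
   315-1610, and decode_payload is a hypothetical name.

```python
# Digit values from the draft: 0 -> 0, A-Z -> 1..26, 1-9 -> 27..35.
DIGITS = "0ABCDEFGHIJKLMNOPQRSTUVWXYZ123456789"
VALUE = {ch: i for i, ch in enumerate(DIGITS)}


def decode_payload(payload: str) -> list[int]:
    """Turn the base 36 digits after the escape dash into table positions.

    Group lengths follow the draft (A-Z: one digit, 1-8: two, 9: three).
    Offsets inside longer groups are inferred (assumed) from the ranges.
    """
    digits = payload.upper()  # DNS is case-insensitive
    if any(ch not in VALUE for ch in digits):
        raise ValueError(f"{payload!r} contains a non-base-36 character")
    codes = []
    i = 0
    while i < len(digits):
        lead = digits[i]
        if "A" <= lead <= "Z":                  # one digit: 1..26
            codes.append(VALUE[lead])
            i += 1
        elif "1" <= lead <= "8":                # two digits: 27..314
            if i + 1 >= len(digits):
                raise ValueError("truncated two-digit group")
            codes.append(27 + 36 * (VALUE[lead] - 27) + VALUE[digits[i + 1]])
            i += 2
        elif lead == "9":                       # three digits: 315..1610
            if i + 2 >= len(digits):
                raise ValueError("truncated three-digit group")
            codes.append(315 + 36 * VALUE[digits[i + 1]] + VALUE[digits[i + 2]])
            i += 3
        else:                                   # '0' never starts a group
            raise ValueError("a group cannot start with 0")
    return codes


print(decode_payload("memra"))  # [13, 5, 13, 18, 1]
print(decode_payload("o1v"))    # [15, 49]
```

   Under this reading "10" is the first two-digit code (27) and "89"
   the last (314), so every code position in a range has exactly one
   spelling.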

   The table of characters referenced by these numbers is derived from
   Unicode by removing lower case characters and other cases where two
   glyphs unambiguously represent the same symbol. The first twenty-six
   positions in this table contain the letters A-Z so that an encoded
   name containing Latin letters can be more easily recognized by
   people. Under this encoding scheme the domain name -memra.-com would
   be equivalent to memra.com; however, unless a .-com domain is
   officially created by IANA, this name could not be used on the
   global Internet. Top level domains beginning with a dash are more
   likely to be created to represent Cyrillic or Japanese names.

Examples

   Here is an example of a domain name using the new system.

   The French word for "where" is written as the letter O followed by U
   with a grave accent. In the table below these are positions 0015 and
   0049. In the base 36 encoding, O is the single-digit representation
   of 0015 and 1V is the two-digit representation of 0049, so the
   French translation of where.fr could be represented as -o1v.fr.
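   Both worked examples in this memo, -memra and the French word for
   "where", can be checked with an encoder sketch in Python. TABLE is a
   hypothetical excerpt of the draft table holding only A-Z (positions
   1-26) and Ugrave (0049), and the group arithmetic is the same
   assumption inferred from the stated ranges 1-26, 27-314 and
   315-1610.

```python
import unicodedata

DIGITS = "0ABCDEFGHIJKLMNOPQRSTUVWXYZ123456789"

# Hypothetical excerpt of the draft table: A-Z at positions 1-26 plus
# Ugrave at 0049. Keys are upper case because the table omits lower case.
TABLE = {chr(ord("A") + i): i + 1 for i in range(26)}
TABLE["\u00d9"] = 49  # U with grave accent


def encode_code(code: int) -> str:
    """Encode one table position as a one-, two- or three-digit group."""
    if 1 <= code <= 26:
        return DIGITS[code]
    if 27 <= code <= 314:
        hi, lo = divmod(code - 27, 36)
        return DIGITS[27 + hi] + DIGITS[lo]   # lead digit 1..8
    if 315 <= code <= 1610:
        hi, lo = divmod(code - 315, 36)
        return "9" + DIGITS[hi] + DIGITS[lo]
    raise ValueError(f"code position {code} is outside 1..1610")


def encode_word(word: str) -> str:
    """Map a word to a dash-escaped segment, emitted in lower case."""
    # NFC keeps an accented letter as one precomposed character.
    chars = unicodedata.normalize("NFC", word).upper()
    return "-" + "".join(encode_code(TABLE[ch]) for ch in chars).lower()


print(encode_word("memra") + ".-com")  # -memra.-com
print(encode_word("où") + ".fr")       # -o1v.fr
```

   Because the Latin letters occupy positions 1-26, an all-letter
   segment encodes to itself behind the dash, which is why -memra
   reads the same as memra.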

Table of Characters

   This table has not yet been worked out beyond the first 26
   positions. The partial table below is included only to illustrate
   sample domain names.

   0001 A
   0002 B
   0003 C
   0004 D
   0005 E
   0006 F
   0007 G
   0008 H
   0009 I
   0010 J
   0011 K
   0012 L
   0013 M
   0014 N
   0015 O
   0016 P
   0017 Q
   0018 R
   0019 S
   0020 T
   0021 U
   0022 V
   0023 W
   0024 X
   0025 Y
   0026 Z
   0027 0
   0028 1
   0029 2
   0030 3
   0031 4
   0032 5
   0033 6
   0034 7
   0035 8
   0036 9
   0037 Eacute
   0038 Egrave
   0039 Ecirc
   0040 Aacute
   0041 Agrave
   0042 Auml
   0043 Icirc
   0044 Iuml
   0045 Ouml
   0046 Ocirc
   0047 Uuml
   0048 Uacute
   0049 Ugrave

Security Considerations

   This RFC raises no security issues.

Author's Address

   Michael Dillon
   Memra Software Inc.
   C-4 Powerhouse, RR #2
   Armstrong, BC  V0E 1B0
   CANADA

   Phone: +1-250-546-8022
     Fax: +1-250-546-3049
   EMail: michael@memra.com

Michael Dillon                   -               Internet & ISP Consulting
Memra Software Inc.              -                  Fax: +1-604-546-3049
http://www.memra.com             -               E-mail: michael@memra.com