Non Latin-language websites

Posted on 2:58 PM by 1001 Webs

1001 Webs provides website translation and localization. This covers both the visible content and the HTML and XML code (title tags, meta tags, alt attributes, etc.), as well as the adaptation of all scripting components (databases, Active Server Pages, JavaScript, PHP, Perl, etc.).



And now 1001 Webs is going truly global, adding Hindi, Chinese, Japanese, Russian and Arabic versions of our website.

We came across this excellent article in the Technology section of The Guardian Unlimited:
http://www.guardian.co.uk/technology/2006/jul/27/guardianweeklytechnologysection5

Below are some excerpts:

Read this for starters:
"Despite everything you may have heard, the global resource we all know as the internet is not global at all. Since you are reading this article in English you probably won't have noticed, but if your first language was Chinese, Arabic, Hindi or Tamil, you would know very different."
Ever wondered about ASCII codes and how they are applied to non-Latin languages?
"the term ASCII itself. It stands for American Standard Code for Information Interchange and it is the code devised to enable computers to represent and process all the characters in the English alphabet (a through to z, plus 0 to 9 and the various symbols you get on your keyboard such as % and &).
It was first developed in 1967 and written into the internet's foundations by American scientists. It is now so hardwired into the net that the only way to include other characters such as accents on letters, or Chinese or Arabic script, is to use complex combinations of letters that don't exist in English words in order to represent them.
Linguists have created long tables to represent all the possible combinations and permutations of different languages. In the case of internet domain names, the address is preceded by "xn--" and then an agreed code. For example "www.rémax.com" is represented as "www.xn--rmax-bpa.com". Using this method, it suddenly becomes possible to have internet domain names containing foreign characters, and hence foreign language domain names."
But:
"From the western perspective this approach was sufficient for the rest of the world to use the internet. But the problem is that each of these domains still has to use the existing domain system with ".com" or ".net" - suffixes that are virtually incomprehensible to non Latin-derived language users."
and the conclusion is:
"... with non-Latin-language networks becoming increasingly advanced, China making it clear it is prepared to break away from the internet, MINC touting a solution that could bypass its processes altogether and, perhaps most crucially, Microsoft deciding to include IDN10 technology in the new version of Internet Explorer, out later this year, Icann has been left with no choice but to speed up the technical side of internationalised domain names in a bid to keep the net together."
We strongly recommend that all of our international partners, especially those managing the Hindi, Chinese, Japanese, Russian and Arabic versions of 1001 Webs, read the full article in the Technology section of The Guardian Unlimited:
http://www.guardian.co.uk/technology/2006/jul/27/guardianweeklytechnologysection5
It gives a clearer idea of the difficulties of working with those languages.
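
The "xn--" form quoted in the excerpts can be reproduced in a few lines. What follows is only a minimal sketch, assuming Python 3 and its built-in "idna" codec (which follows the older IDNA 2003 rules). The helper names to_ascii and to_unicode are ours, chosen for illustration.

```python
# Sketch: convert a hostname between its Unicode form and its ASCII ("xn--") form.
# Assumption: Python's built-in "idna" codec (IDNA 2003) is good enough for a demo.

def to_ascii(hostname: str) -> str:
    """Return the ASCII-compatible form of a Unicode hostname."""
    return hostname.encode("idna").decode("ascii")


def to_unicode(hostname: str) -> str:
    """Return the Unicode form of an ASCII-compatible hostname."""
    return hostname.encode("ascii").decode("idna")


if __name__ == "__main__":
    print(to_ascii("www.rémax.com"))            # www.xn--rmax-bpa.com
    print(to_unicode("www.xn--rmax-bpa.com"))   # www.rémax.com
```

Labels that are already plain ASCII, such as "www" and "com", pass through unchanged, which is why only the middle label gains the "xn--" prefix.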

Some Links of Interest:

ICANN
ICANN is responsible for the global coordination of the Internet's system of unique identifiers. These include domain names (like .org and .museum, and country codes like .uk, .fr, .pt, .de, .es, .jp, .cn, etc.), as well as the addresses used in a variety of Internet protocols. Computers use these identifiers to reach each other over the Internet. Careful management of these resources is vital to the Internet's operation.

American Standard Code for Information Interchange (ASCII)
ASCII codes represent text in computers, communications equipment, and other devices that work with text. Most modern character encodings — which support many more characters than did the original — have a historical basis in ASCII.

China gives itself its own top-level domains
China has decided to bypass ICANN altogether and set up its own set of TLDs and domain name servers. In addition to the .cn TLD, China will have three new Chinese-character TLDs equating to "dot China," "dot com," and "dot net."

See also


ASCII codes table - Format of standard characters


ASCII  Hex  Symbol
0      0    NUL
1      1    SOH
2      2    STX
3      3    ETX
4      4    EOT
5      5    ENQ
6      6    ACK
7      7    BEL
8      8    BS
9      9    TAB
10     A    LF
11     B    VT
12     C    FF
13     D    CR
14     E    SO
15     F    SI

16     10   DLE
17     11   DC1
18     12   DC2
19     13   DC3
20     14   DC4
21     15   NAK
22     16   SYN
23     17   ETB
24     18   CAN
25     19   EM
26     1A   SUB
27     1B   ESC
28     1C   FS
29     1D   GS
30     1E   RS
31     1F   US

32     20   (space)
33     21   !
34     22   "
35     23   #
36     24   $
37     25   %
38     26   &
39     27   '
40     28   (
41     29   )
42     2A   *
43     2B   +
44     2C   ,
45     2D   -
46     2E   .
47     2F   /

48     30   0
49     31   1
50     32   2
51     33   3
52     34   4
53     35   5
54     36   6
55     37   7
56     38   8
57     39   9
58     3A   :
59     3B   ;
60     3C   <
61     3D   =
62     3E   >
63     3F   ?

64     40   @
65     41   A
66     42   B
67     43   C
68     44   D
69     45   E
70     46   F
71     47   G
72     48   H
73     49   I
74     4A   J
75     4B   K
76     4C   L
77     4D   M
78     4E   N
79     4F   O

80     50   P
81     51   Q
82     52   R
83     53   S
84     54   T
85     55   U
86     56   V
87     57   W
88     58   X
89     59   Y
90     5A   Z
91     5B   [
92     5C   \
93     5D   ]
94     5E   ^
95     5F   _

96     60   `
97     61   a
98     62   b
99     63   c
100    64   d
101    65   e
102    66   f
103    67   g
104    68   h
105    69   i
106    6A   j
107    6B   k
108    6C   l
109    6D   m
110    6E   n
111    6F   o

112    70   p
113    71   q
114    72   r
115    73   s
116    74   t
117    75   u
118    76   v
119    77   w
120    78   x
121    79   y
122    7A   z
123    7B   {
124    7C   |
125    7D   }
126    7E   ~
127    7F   DEL
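
The three columns of the table are directly related: the Hex column is the decimal code written in base 16, and for printable characters the Symbol is simply the character with that code. As a rough illustration (a sketch in Python 3, with the helper name ascii_row made up for this example), a row can be regenerated like this:

```python
# Sketch: rebuild one row (ASCII code, hex, symbol) of the table above.
# CONTROL_NAMES uses the same abbreviations as the table (e.g. TAB for code 9).
CONTROL_NAMES = [
    "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
    "BS", "TAB", "LF", "VT", "FF", "CR", "SO", "SI",
    "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
    "CAN", "EM", "SUB", "ESC", "FS", "GS", "RS", "US",
]


def ascii_row(code: int) -> tuple:
    """Return (decimal, hex, symbol) as strings for a 7-bit ASCII code."""
    if not 0 <= code <= 127:
        raise ValueError("7-bit ASCII only covers codes 0 to 127")
    if code < 32:
        symbol = CONTROL_NAMES[code]   # non-printing control character
    elif code == 32:
        symbol = "(space)"
    elif code == 127:
        symbol = "DEL"                 # the last control character
    else:
        symbol = chr(code)             # printable character
    return str(code), format(code, "X"), symbol


if __name__ == "__main__":
    for code in (10, 65, 122):
        print(*ascii_row(code))
    # 10 A LF
    # 65 41 A
    # 122 7A z
```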





HTML Codes - Characters and symbols
Standard ASCII set, HTML Entity names, ISO 10646, ISO 8879, ISO 8859-1 Latin alphabet No. 1
Browser support: All browsers

Dec  Hex  Symbol   HTML Number  HTML Name  Description
32   20   (space)  &#32;                   space
33   21   !        &#33;                   exclamation point
34   22   "        &#34;        &quot;     double quotes
35   23   #        &#35;                   number sign
36   24   $        &#36;                   dollar sign
37   25   %        &#37;                   percent sign
38   26   &        &#38;        &amp;      ampersand
39   27   '        &#39;                   single quote
40   28   (        &#40;                   opening parenthesis
41   29   )        &#41;                   closing parenthesis
42   2A   *        &#42;                   asterisk
43   2B   +        &#43;                   plus sign
44   2C   ,        &#44;                   comma
45   2D   -        &#45;                   minus sign - hyphen
46   2E   .        &#46;                   period
47   2F   /        &#47;                   slash
48   30   0        &#48;                   zero
49   31   1        &#49;                   one
50   32   2        &#50;                   two
51   33   3        &#51;                   three
52   34   4        &#52;                   four
53   35   5        &#53;                   five
54   36   6        &#54;                   six
55   37   7        &#55;                   seven
56   38   8        &#56;                   eight
57   39   9        &#57;                   nine
58   3A   :        &#58;                   colon
59   3B   ;        &#59;                   semicolon
60   3C   <        &#60;        &lt;       less than sign
61   3D   =        &#61;                   equal sign
62   3E   >        &#62;        &gt;       greater than sign
63   3F   ?        &#63;                   question mark
64   40   @        &#64;                   at symbol
65   41   A        &#65;
66   42   B        &#66;
67   43   C        &#67;
68   44   D        &#68;
69   45   E        &#69;
70   46   F        &#70;
71   47   G        &#71;
72   48   H        &#72;
73   49   I        &#73;
74   4A   J        &#74;
75   4B   K        &#75;
76   4C   L        &#76;
77   4D   M        &#77;
78   4E   N        &#78;
79   4F   O        &#79;
80   50   P        &#80;
81   51   Q        &#81;
82   52   R        &#82;
83   53   S        &#83;
84   54   T        &#84;
85   55   U        &#85;
86   56   V        &#86;
87   57   W        &#87;
88   58   X        &#88;
89   59   Y        &#89;
90   5A   Z        &#90;
91   5B   [        &#91;                   opening bracket
92   5C   \        &#92;                   backslash
93   5D   ]        &#93;                   closing bracket
94   5E   ^        &#94;                   caret - circumflex
95   5F   _        &#95;                   underscore
96   60   `        &#96;                   grave accent
97   61   a        &#97;
98   62   b        &#98;
99   63   c        &#99;
100  64   d        &#100;
101  65   e        &#101;
102  66   f        &#102;
103  67   g        &#103;
104  68   h        &#104;
105  69   i        &#105;
106  6A   j        &#106;
107  6B   k        &#107;
108  6C   l        &#108;
109  6D   m        &#109;
110  6E   n        &#110;
111  6F   o        &#111;
112  70   p        &#112;
113  71   q        &#113;
114  72   r        &#114;
115  73   s        &#115;
116  74   t        &#116;
117  75   u        &#117;
118  76   v        &#118;
119  77   w        &#119;
120  78   x        &#120;
121  79   y        &#121;
122  7A   z        &#122;
123  7B   {        &#123;                   opening brace
124  7C   |        &#124;                   vertical bar
125  7D   }        &#125;                   closing brace
126  7E   ~        &#126;                   equivalency sign - tilde
127  7F                                    (not defined in HTML 4 standard)

128 to 159 (hex 80 to 9F): not defined in HTML 4 standard

Dec  Hex  Symbol   HTML Number  HTML Name  Description
160  A0            &#160;       &nbsp;     non-breaking space
161  A1   ¡        &#161;       &iexcl;    inverted exclamation mark
162  A2   ¢        &#162;       &cent;     cent sign
163  A3   £        &#163;       &pound;    pound sign
164  A4   ¤        &#164;       &curren;   currency sign
165  A5   ¥        &#165;       &yen;      yen sign
166  A6   ¦        &#166;       &brvbar;   broken vertical bar
167  A7   §        &#167;       &sect;     section sign
168  A8   ¨        &#168;       &uml;      spacing diaeresis - umlaut
169  A9   ©        &#169;       &copy;     copyright sign
170  AA   ª        &#170;       &ordf;     feminine ordinal indicator
171  AB   «        &#171;       &laquo;    left double angle quotes
172  AC   ¬        &#172;       &not;      not sign
173  AD            &#173;       &shy;      soft hyphen
174  AE   ®        &#174;       &reg;      registered trade mark sign
175  AF   ¯        &#175;       &macr;     spacing macron - overline
176  B0   °        &#176;       &deg;      degree sign
177  B1   ±        &#177;       &plusmn;   plus-or-minus sign
178  B2   ²        &#178;       &sup2;     superscript two - squared
179  B3   ³        &#179;       &sup3;     superscript three - cubed
180  B4   ´        &#180;       &acute;    acute accent - spacing acute
181  B5   µ        &#181;       &micro;    micro sign
182  B6   ¶        &#182;       &para;     pilcrow sign - paragraph sign
183  B7   ·        &#183;       &middot;   middle dot - Georgian comma
184  B8   ¸        &#184;       &cedil;    spacing cedilla
185  B9   ¹        &#185;       &sup1;     superscript one
186  BA   º        &#186;       &ordm;     masculine ordinal indicator
187  BB   »        &#187;       &raquo;    right double angle quotes
188  BC   ¼        &#188;       &frac14;   fraction one quarter
189  BD   ½        &#189;       &frac12;   fraction one half
190  BE   ¾        &#190;       &frac34;   fraction three quarters
191  BF   ¿        &#191;       &iquest;   inverted question mark
192  C0   À        &#192;       &Agrave;   latin capital letter A with grave
193  C1   Á        &#193;       &Aacute;   latin capital letter A with acute
194  C2   Â        &#194;       &Acirc;    latin capital letter A with circumflex
195  C3   Ã        &#195;       &Atilde;   latin capital letter A with tilde
196  C4   Ä        &#196;       &Auml;     latin capital letter A with diaeresis
197  C5   Å        &#197;       &Aring;    latin capital letter A with ring above
198  C6   Æ        &#198;       &AElig;    latin capital letter AE
199  C7   Ç        &#199;       &Ccedil;   latin capital letter C with cedilla
200  C8   È        &#200;       &Egrave;   latin capital letter E with grave
201  C9   É        &#201;       &Eacute;   latin capital letter E with acute
202  CA   Ê        &#202;       &Ecirc;    latin capital letter E with circumflex
203  CB   Ë        &#203;       &Euml;     latin capital letter E with diaeresis
204  CC   Ì        &#204;       &Igrave;   latin capital letter I with grave
205  CD   Í        &#205;       &Iacute;   latin capital letter I with acute
206  CE   Î        &#206;       &Icirc;    latin capital letter I with circumflex
207  CF   Ï        &#207;       &Iuml;     latin capital letter I with diaeresis
208  D0   Ð        &#208;       &ETH;      latin capital letter ETH
209  D1   Ñ        &#209;       &Ntilde;   latin capital letter N with tilde
210  D2   Ò        &#210;       &Ograve;   latin capital letter O with grave
211  D3   Ó        &#211;       &Oacute;   latin capital letter O with acute
212  D4   Ô        &#212;       &Ocirc;    latin capital letter O with circumflex
213  D5   Õ        &#213;       &Otilde;   latin capital letter O with tilde
214  D6   Ö        &#214;       &Ouml;     latin capital letter O with diaeresis
215  D7   ×        &#215;       &times;    multiplication sign
216  D8   Ø        &#216;       &Oslash;   latin capital letter O with slash
217  D9   Ù        &#217;       &Ugrave;   latin capital letter U with grave
218  DA   Ú        &#218;       &Uacute;   latin capital letter U with acute
219  DB   Û        &#219;       &Ucirc;    latin capital letter U with circumflex
220  DC   Ü        &#220;       &Uuml;     latin capital letter U with diaeresis
221  DD   Ý        &#221;       &Yacute;   latin capital letter Y with acute
222  DE   Þ        &#222;       &THORN;    latin capital letter THORN
223  DF   ß        &#223;       &szlig;    latin small letter sharp s - ess-zed
224  E0   à        &#224;       &agrave;   latin small letter a with grave
225  E1   á        &#225;       &aacute;   latin small letter a with acute
226  E2   â        &#226;       &acirc;    latin small letter a with circumflex
227  E3   ã        &#227;       &atilde;   latin small letter a with tilde
228  E4   ä        &#228;       &auml;     latin small letter a with diaeresis
229  E5   å        &#229;       &aring;    latin small letter a with ring above
230  E6   æ        &#230;       &aelig;    latin small letter ae
231  E7   ç        &#231;       &ccedil;   latin small letter c with cedilla
232  E8   è        &#232;       &egrave;   latin small letter e with grave
233  E9   é        &#233;       &eacute;   latin small letter e with acute
234  EA   ê        &#234;       &ecirc;    latin small letter e with circumflex
235  EB   ë        &#235;       &euml;     latin small letter e with diaeresis
236  EC   ì        &#236;       &igrave;   latin small letter i with grave
237  ED   í        &#237;       &iacute;   latin small letter i with acute
238  EE   î        &#238;       &icirc;    latin small letter i with circumflex
239  EF   ï        &#239;       &iuml;     latin small letter i with diaeresis
240  F0   ð        &#240;       &eth;      latin small letter eth
241  F1   ñ        &#241;       &ntilde;   latin small letter n with tilde
242  F2   ò        &#242;       &ograve;   latin small letter o with grave
243  F3   ó        &#243;       &oacute;   latin small letter o with acute
244  F4   ô        &#244;       &ocirc;    latin small letter o with circumflex
245  F5   õ        &#245;       &otilde;   latin small letter o with tilde
246  F6   ö        &#246;       &ouml;     latin small letter o with diaeresis
247  F7   ÷        &#247;       &divide;   division sign
248  F8   ø        &#248;       &oslash;   latin small letter o with slash
249  F9   ù        &#249;       &ugrave;   latin small letter u with grave
250  FA   ú        &#250;       &uacute;   latin small letter u with acute
251  FB   û        &#251;       &ucirc;    latin small letter u with circumflex
252  FC   ü        &#252;       &uuml;     latin small letter u with diaeresis
253  FD   ý        &#253;       &yacute;   latin small letter y with acute
254  FE   þ        &#254;       &thorn;    latin small letter thorn
255  FF   ÿ        &#255;       &yuml;     latin small letter y with diaeresis
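
For translators working on page code, the practical use of these codes is writing and reading character references. The following is a small sketch, assuming Python 3 and its standard html module. The helper name numeric_reference is ours and not part of any library.

```python
# Sketch: produce and decode HTML character references with the standard library.
import html


def numeric_reference(char: str) -> str:
    """Return the decimal HTML reference (the 'HTML Number') for one character."""
    return f"&#{ord(char)};"


if __name__ == "__main__":
    print(numeric_reference("é"))                  # &#233;
    print(numeric_reference("©"))                  # &#169;
    # Decimal, hexadecimal and named references all decode to the same character.
    print(html.unescape("&#169; &#xA9; &copy;"))   # © © ©
    # Characters with a special meaning in markup must be escaped in visible text.
    print(html.escape('<a href="x">'))             # &lt;a href=&quot;x&quot;&gt;
```

Note that html.escape only touches the few characters that are special in markup. Accented letters can usually be left as they are when the page is served in an encoding that contains them.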


HTML 4.01, ISO 10646, ISO 8879, Latin extended A and B
Browser support: Internet Explorer > 4, Netscape > 4

Dec   Hex   Symbol  HTML Number  HTML Name  Description
338   152   Œ       &#338;       &OElig;    latin capital letter OE
339   153   œ       &#339;       &oelig;    latin small letter oe
352   160   Š       &#352;       &Scaron;   latin capital letter S with caron
353   161   š       &#353;       &scaron;   latin small letter s with caron
376   178   Ÿ       &#376;       &Yuml;     latin capital letter Y with diaeresis
402   192   ƒ       &#402;       &fnof;     latin small f with hook - function

Dec   Hex   Symbol  HTML Number  HTML Name  Description
8211  2013  –       &#8211;      &ndash;    en dash
8212  2014  —       &#8212;      &mdash;    em dash
8216  2018  ‘       &#8216;      &lsquo;    left single quotation mark
8217  2019  ’       &#8217;      &rsquo;    right single quotation mark
8218  201A  ‚       &#8218;      &sbquo;    single low-9 quotation mark
8220  201C  “       &#8220;      &ldquo;    left double quotation mark
8221  201D  ”       &#8221;      &rdquo;    right double quotation mark
8222  201E  „       &#8222;      &bdquo;    double low-9 quotation mark
8224  2020  †       &#8224;      &dagger;   dagger
8225  2021  ‡       &#8225;      &Dagger;   double dagger
8226  2022  •       &#8226;      &bull;     bullet
8230  2026  …       &#8230;      &hellip;   horizontal ellipsis
8240  2030  ‰       &#8240;      &permil;   per thousand sign
8364  20AC  €       &#8364;      &euro;     euro sign
8482  2122  ™       &#8482;      &trade;    trade mark sign
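
Rows like the ones above do not have to be looked up by hand. As a hedged sketch (Python 3, using the HTML 4 entity names shipped in the standard html.entities module; the helper name entity_row is hypothetical), one row can be derived from the decimal code alone:

```python
# Sketch: derive (dec, hex, symbol, HTML number, HTML name) from a code point.
from html.entities import codepoint2name  # HTML 4 entity names keyed by code point


def entity_row(codepoint: int) -> tuple:
    """Return one table row for the given Unicode code point."""
    name = codepoint2name.get(codepoint)
    return (
        codepoint,
        format(codepoint, "X"),              # same value written in hexadecimal
        chr(codepoint),                      # the character itself
        f"&#{codepoint};",                   # numeric reference
        f"&{name};" if name else "",         # named reference, if HTML 4 defines one
    )


if __name__ == "__main__":
    for cp in (338, 8364, 8482):
        print(*entity_row(cp))
    # 338 152 Œ &#338; &OElig;
    # 8364 20AC € &#8364; &euro;
    # 8482 2122 ™ &#8482; &trade;
```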



More Info:
http://ascii.cl


Links to ASCII and HTML Standards documents:

HTML 4.01 - Specification

HTML 4.01 - Character set

HTML 4.01 - Character entities

HTML 4.01 - References

7-Bit American National Standard Code for Information Interchange (7-Bit ASCII) - ANSI Document X3.4-1986 (R1997)
Specification for standard set of 128 characters, ANSI Approval Date: 12/23/1997

Unicode Code Charts

Mathematical Markup Language 1.01 Specification

RFC822: Standard for ARPA Internet Text Messages
David H. Crocker, August 13, 1982

ASCII format for Network Interchange
Vint Cerf, October 16, 1969

A tutorial on character code issues
Jukka Korpela, updated.


Links to Standards Organizations

World Wide Web Consortium (W3C)

American National Standards Institute (ANSI)

International Organization for Standardization (ISO)

Unicode NSSN - Global Standards Organization

National Institute of Standards and Technology

World Standards Services Network
