[Air-L] Information wants to be ASCII or Unicode? Tibetan-written information cannot be ASCII anyway.
Han-Teng Liao (OII)
han-teng.liao at oii.ox.ac.uk
Thu Jul 16 17:16:47 PDT 2009
First of all, I have to reframe the question. Are we talking about a
problem of ASCII or a problem of Unicode? At one extreme, one could
argue that every domain name, hyperlink and URL should stay in the
English alphabet (which enters the ICANN multilingual issue that I aim
to avoid in this discussion); at the other extreme, one could argue
that there would be no problem if everyone were using Unicode now
(which implies a coercive force imposing it without the usual
technology diffusion).
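As an aside on the first extreme: even today, a non-ASCII page title
can only travel inside a classic URL after being UTF-8 encoded and
percent-escaped into pure ASCII. A minimal Python sketch (the title
here is just an arbitrary example):

```python
from urllib.parse import quote, unquote

title = "中文"  # "Chinese (language)", two non-ASCII characters

# quote() encodes the string as UTF-8 bytes and percent-escapes
# each byte, yielding a pure-ASCII string safe for a URL path.
escaped = quote(title)
print(escaped)            # %E4%B8%AD%E6%96%87
print(unquote(escaped))   # 中文 -- recovered on the other end
```

So "ASCII-only URLs" never meant non-Latin scripts were absent; they
were merely hidden behind an extra layer of escaping.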
I cannot speak for all the open source contributors out there. I have
not even tried to find the regional and linguistic demographics of the
open source community. Though I am a big fan of the "good will" in
Reagle's thesis, I cannot overlook the potential for competition and
creative conflict among the branches of open source projects.
Then the question would be: who should make this effort? I will argue
that the burden falls overwhelmingly on the people who have to use
Unicode. In practice, it easily becomes a favor to be asked by those
who need Unicode, and extra work to be done by IT support. Then
Unicode the solution becomes a problem. I am not saying there are no
problems in Unicode implementation. The reason I raise the problem
here on the AOIR mailing list, not on the Unicode mailing list, is not
to reaffirm the perception that adopting Unicode can be difficult, but
rather to raise the relevant research issues around it.
Imagine that the Wikipedia project had not managed to implement
Unicode when it was hard. Imagine that Chinese Wikipedia had not
managed to negotiate simplified and traditional Chinese entry titles
and URLs. Wikipedia would never have been the same. It is not a favor
that we (who need Unicode support) ask for. We (internet researchers)
need empirical research into why and how Unicode support is
implemented in various projects. It is not merely an issue of
providing better support for programmers.
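To make the simplified/traditional negotiation concrete: the two
scripts occupy distinct Unicode code points, so the "same" title does
not match byte-for-byte, and Chinese Wikipedia had to build an
explicit conversion layer on top of Unicode. A small illustration (the
word chosen here is just one example pair):

```python
simplified = "网络"   # "network", simplified Chinese
traditional = "網絡"  # the same word, traditional Chinese

# Unicode represents both forms faithfully, but they are different
# code points -- a naive title lookup treats them as unrelated.
assert simplified != traditional
print([hex(ord(c)) for c in simplified])   # ['0x7f51', '0x7edc']
print([hex(ord(c)) for c in traditional])  # ['0x7db2', '0x7d61']
```

Adopting Unicode solved the representation problem, but the editorial
negotiation between the two writing communities still had to happen on
top of it.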
Again, I am not arguing that the transition from non-Unicode to
Unicode is easy and could be done overnight, and hence I have no
intention of implying that it all comes down to programmers'
unwillingness or laziness to finish the mundane jobs. It is the
opposite. If we lay out why, how much and how Wikipedia, YouTube,
Google and others invest in Unicode deployment (exploiting the open
nature of the Internet), we can better understand the richer
dimensions of techno-linguistic policies. It is not my intention to
play a blame game (the West versus the East, or programmers versus
users). It is the opposite. Why does Baidu support only simplified
Chinese versions of its services, excluding the Tibetans, Hong Kongers
and even the Taiwanese whom Beijing tries to represent, while Google
and YouTube do a much better job of creating a space where East Asians
can fight with each other on the same page? I hope this case shows
that my intention is to make this an interesting issue for
multi-disciplinary research rather than to blame any particular group
of people.
I hope we are debating "Information wants to be ASCII or Unicode"
versus "Information wants to be digital", not "Information moving from
ASCII to Unicode is difficult". Then the issues would be clearer: Who
decides which digital standards are selected and deployed? What is the
negotiation process? And why? Operating systems, global websites,
regional websites, e-government services, citation databases, etc. are
all domains we should ask about.
Mike Stanger wrote:
>> Just as a bit of evidence of how difficult it can be to grok
>> character issues: Unicode is not "an encoding" itself, but a
>> repertoire of characters, their names, and (abstract) code points
>> (i.e., UCS), plus a set of encodings (i.e., UTF-8, UTF-16), extra
>> properties, and algorithms. And I'm sure a Unicode geek could pick
>> some holes in what I've said!
> True enough :-) Part of the problem in discussing Unicode (and other
> things) is that one can speak to it at a 'standards' level or an 'in
> practice' level at whatever level of practice the person encounters
> Unicode. By encoding I wasn't intending to imply that it was like
> dealing with a codepage equivalent, but that there are assumptions
> that are part of using Unicode that may not be visible to the people
> using it.
> I'm thinking that the stated intent by a programmer, say in an open
> source project, that the project is using unicode for the purposes of
> being 'politically friendly' and interoperable would have the effect
> of not only making the statement, but encouraging people to help guide
> the programmer(s) in actually achieving that goal -- those who have a
> deeper understanding of the issues informing those who are looking for
> the practical goal of interoperability.
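The repertoire-versus-encodings distinction quoted above can be made
concrete in a few lines: the abstract code point U+0F40 (TIBETAN
LETTER KA, fitting for the subject line) is a single UCS character,
but its byte sequence depends on the encoding form chosen, and ASCII
cannot carry it at all. A quick Python sketch:

```python
ka = "\u0f40"  # TIBETAN LETTER KA: one abstract UCS code point

# The same character, two different concrete byte sequences:
print(ka.encode("utf-8"))      # 3 bytes: 0xE0 0xBD 0x80
print(ka.encode("utf-16-be"))  # 2 bytes: 0x0F 0x40

# ASCII has no representation for it at all:
try:
    ka.encode("ascii")
except UnicodeEncodeError:
    print("Tibetan-written information cannot be ASCII")
```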
Oxford Internet Institute