[Air-L] Information wants to be ASCII or Unicode? Tibetan-written information cannot be ASCII anyway.

Han-Teng Liao (OII) han-teng.liao at oii.ox.ac.uk
Thu Jul 16 17:16:47 PDT 2009


First of all, I have to reframe the question in a different way: are we 
talking about a problem of ASCII or a problem of Unicode?  At one 
extreme, we could argue that every domain name, hyperlink and URL 
should stay in the English alphabet (which enters the ICANN 
multilingualism issue that I aim to avoid in this discussion); at the 
other extreme, we could argue that there would be no problem if 
everyone were using Unicode now (which implies a coercive force to 
impose it without the usual technology diffusion).
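
To ground the subject line in something concrete, here is a minimal 
sketch in Python (my illustration, not code from this thread).  Tibetan 
letters sit at U+0F00-U+0FFF, well beyond ASCII's 128 code points, so 
the same string that encodes cleanly as UTF-8, and travels in a URL as 
percent-encoded UTF-8 bytes, simply cannot be encoded as ASCII:

from urllib.parse import quote

bod_skad = "བོད་སྐད"  # "Tibetan language": U+0F56 U+0F7C U+0F51 U+0F0B U+0F66 U+0F90 U+0F51

print(bod_skad.encode("utf-8"))    # fine: UTF-8 can carry any Unicode text
try:
    bod_skad.encode("ascii")       # ASCII stops at U+007F
except UnicodeEncodeError as err:
    print("cannot be ASCII:", err)

print(quote(bod_skad))             # a URL carries the UTF-8 bytes as %E0%BD%96...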

I cannot speak for all the open source contributors out there.  I have 
not even tried to find the regional and linguistic demographics of the 
open source community.  Though I am a big fan of the "good will" in 
Reagle's thesis, I cannot overlook the potential for competition and 
creative conflict among the many branches of open source projects.

Then the question becomes: who should make this effort?  I would argue 
that the burden falls overwhelmingly on the people who have to use 
Unicode.  In practice, it easily becomes a favor to be asked by those 
who need Unicode, and extra work to be done by IT support.  Then 
Unicode the solution becomes a problem.  I am not saying there are no 
problems in Unicode implementation.  The reason I raise the problem 
here on the AoIR mailing list, rather than on the Unicode mailing list, 
is not to reaffirm the perception that adopting Unicode can be 
difficult, but rather to raise the relevant research issues around it.

Imagine the Wikipedia project had not managed to implement Unicode back 
when it was hard.  Imagine Chinese Wikipedia had not managed to 
negotiate between simplified and traditional Chinese entry titles and 
URLs.  Wikipedia would never have been the same.  It is not a favor 
that we (who need Unicode support) are asking.  We (internet 
researchers) need empirical research into why and how Unicode support 
is implemented in various projects.  It is not merely an issue of 
providing better support for programmers.
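
To make the simplified/traditional negotiation concrete: Chinese 
Wikipedia handles it with MediaWiki's LanguageConverter, which rewrites 
titles and article text between script variants.  Below is a 
deliberately toy Python sketch of the idea only; the three-entry 
mapping table and the function name are my own hypothetical 
illustration, not the real converter, which must also handle regional 
vocabulary that no per-character table can capture:

# Toy per-character mapping (hypothetical sample, not the real table)
CHAR_MAP = {
    "网": "網",  # wang, "net"
    "络": "絡",  # luo, "web"
    "际": "際",  # ji, "boundary"
}

def to_traditional(title):
    # Convert character by character; anything outside the table
    # passes through unchanged.
    return "".join(CHAR_MAP.get(ch, ch) for ch in title)

print(to_traditional("网络"))  # prints 網絡
# The catch: readers in Taiwan usually write 網路, not 網絡, so the
# conversion is a social negotiation, not a mechanical lookup.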

Again, I am not arguing that the transition from non-Unicode to Unicode 
is easy or could be done overnight, and hence I have no intention of 
implying that it is all programmers' unwillingness and laziness to 
finish the mundane jobs.  It is the opposite.  If we lay out why, how 
much and how Wikipedia, YouTube, Google etc. invest in Unicode 
deployment (exploiting the open nature of the Internet), we can better 
understand the richer dimensions of techno-linguistic policies.  It is 
not my intention to play a blame game (West versus East, or programmers 
versus users).  It is the opposite.  Why does Baidu support only 
simplified Chinese versions of its services, excluding Tibetans, Hong 
Kongers and even the Taiwanese whom Beijing tries to represent, while 
Google and YouTube do a much better job of creating a space where East 
Asians can fight with each other on the same page?  I hope this case 
shows that my intention is to make this an interesting issue for 
multi-disciplinary research rather than to blame any particular group 
of people.

I hope we are debating "Information wants to be ASCII or Unicode" 
versus "Information wants to be digital", not "Information moving from 
ASCII to Unicode is difficult".  Then the issue would be clearer: who 
decides which digital standards are selected and deployed?  What is the 
negotiation process?  And why?  Operating systems, global websites, 
regional websites, e-government services, citation databases etc. are 
all domains in which we should ask.


Mike Stanger wrote:
>
>> Just as a bit of evidence of how difficult it can be to grok 
>> character issues: Unicode is not "an encoding" itself, but a 
>> repertoire of characters, their names, and (abstract) code points 
>> (i.e., UCS), plus a set of encodings (i.e., UTF-8, UTF-16), extra 
>> properties, and algorithms. And I'm sure a Unicode geek could pick 
>> some holes in what I've said!
>
> True enough :-)  Part of the problem in discussing Unicode (and other 
> things) is that one can speak to it at a 'standards' level or at an 
> 'in practice' level, at whatever point of practice the person 
> encounters Unicode.  By encoding I wasn't intending to imply that it 
> was like dealing with a codepage equivalent, but that there are 
> assumptions built into using Unicode that may not be visible to the 
> people using it.
>
> I'm thinking that a programmer's stated intent, say in an open source 
> project, that the project is using Unicode for the purposes of being 
> 'politically friendly' and interoperable would have the effect not 
> only of making the statement, but of encouraging people to help guide 
> the programmer(s) in actually achieving that goal -- those who have a 
> deeper understanding of the issues informing those who are looking 
> for the practical goal of interoperability.
>
> Mike


-- 
Han-Teng Liao
PhD Candidate
Oxford Internet Institute
http://www.oii.ox.ac.uk/people/students.cfm?id=123



