[Air-l] Re: internet linguistic variety citations desired

Bram Dov Abramson bda at bazu.org
Wed May 7 18:20:37 PDT 2003


elijah wright (elw at stderr.org), 07-5-2003:

>that the amount of English used on the Internet doesn't
>(or will rapidly cease to) matter for the following
>reasons (or anything similar):
>
>(1) The Internet is global and distributed, so people will
>use their own languages on it, (2) The growth of
>non-English users exceeds in rate the growth of
>English-speaking users, (3) The technical affordances of
>the internet (Unicode, translation, etc.) have solved the
>language problems.
>
>These seem to be popular conceptions, for which there may
>be very slim published evidence.

Maybe you'd consider taking a more nuanced approach to "the Internet" -- a connectivity platform, after all -- by looking at how the posited single public space plays out differently across the different popular Internet applications.

The Web, for example, as not identical to e-mail.  Instant messaging, as not identical to online gaming.  And so forth.  Surely the formation of languaged (to mutilate English ;-)) social networks in each of these media has its own, as you say, affordances.

>we've chosen not to rely on Global Reach's statistics or
>reports because they have a vested interest in providing
>statistics that encourage people to... surprise... *BUY
>MORE STATISTICS* from them.
(...)
>global reach sells a lot of things.  ;)  one of their
>other big markets is translation services - they would
>love to help translate your corporate site into twenty
>languages, if you can afford it.

Whether it's selling more statistics or selling translation and localisation services: yes, it's certainly a good idea to try and understand the biases of one's sources, and to think about the direction in which those biases might lead said sources to err.

But I am skeptical of this idea of avoiding or discounting sources where biases can be identified, presumably in favour of better sources untainted by bias.  Most knowledge is produced by people with opinions, biases, etc.  The key is to evaluate in light of this, not search for that which is free of it; such a search might take a long while indeed.  

As for the methodology behind this particular source, there is nothing sophisticated here.  This is path-of-least-resistance stuff: Internet user numbers are drawn from a variety of sources that used various methodologies on various dates, I suppose via NUA; all of the inconsistencies in this are then multiplied by what-language-they-speak assumptions which map official language(s) (save in a couple of cases, where available national demographics are substituted) onto Internet user population, introducing .  And so forth.
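
A minimal sketch of the kind of calculation described above, assuming per-country user counts and a per-country language split. Every name and number here is hypothetical, invented only for illustration; none of it is Global Reach's actual data or code.

```python
from collections import defaultdict

# Hypothetical per-country Internet user counts (invented, not real statistics).
# In practice these would come from different sources, methods, and dates.
internet_users = {"Country A": 10_000_000, "Country B": 4_000_000}

# Assumed language split per country. Country A maps its single official
# language onto all users; Country B has national demographics substituted.
language_shares = {
    "Country A": {"English": 1.0},
    "Country B": {"French": 0.5, "Dutch": 0.5},
}


def users_by_language(users, shares):
    """Sum user counts per language by applying each country's assumed split."""
    totals = defaultdict(float)
    for country, count in users.items():
        for language, share in shares[country].items():
            totals[language] += count * share
    return dict(totals)


estimate = users_by_language(internet_users, language_shares)
```

Any error in a country's user count carries straight through into each language total, and multilingual users are counted under one language only, which is the trade-off at issue here.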

In other words a nice little project for someone -- on a part-time basis, I suppose -- which collates obtainable data into something meaningful and provides full documentation as to source data, assumptions, and methodology.  And which is neither capital-t Truth, nor pretends to be.

The sort of positivism which faults data sets for being imperfect or unmimetic representations of the world has always surprised me.  Necessarily, statistics (and maps, and pictures, and photographs, and ...) are trade-offs.  Finding out that they are trade-offs is not really the point imho; identifying what has been traded off, and using them appropriately given those limitations, is.  Hence:

>how many of these internet users are multilingual with
>english and some other language readily accessible to
>them?  how many are multilingual with at least minimal
>competence in four or five of the top 100 languages
>spoken/written worldwide?  how are these people being
>counted?  these are the fun questions...

Sure.  As long as these are understood, not as oversights which the compiler couldn't cotton on to -- the opposite seems more likely to me -- but as a degree of complexity outstripping the resources which the compiler was able to expend.  In which case the fun of asking these questions is not in the triumphal aha! found holes! but, rather, in thinking about how these change the questions we ask, and the answers we come up with.

(For example, why this is the only free data set available on the Web, indicating a distinct lack of interest from certain corners in reproducing or improving on it.  On which, I think I remember Angus Reid here in Canada doing something similar, but more of it in-house; their Web site likely has contact details and a press release or two.)

Interesting ideas, in all cases; good luck with it!

cheers
Bram



