[Air-l] counting google hits

Thomas Koenig T.Koenig at lboro.ac.uk
Wed Mar 2 17:03:09 PST 2005


Quoting elijah wright <elw at stderr.org>:

>
> > 1. Go to www.google.com
> > 2. type in Carribean
> > 3. Look at the light blue web bar on top of the first list of hits. It
> > will show you the approximate number of hits:
> > Example: "Results 1 - 10 of about 38,100,000 for caribbean"
>
> folks realize that using the "number of hits returned on google" is a
> hilariously bad way to prove a point -- right?

Wrong. What's wrong with using the vast internet resources as a quasi-corpus
for natural languages (if you avoid certain pitfalls, which I alluded to in
my last message)?

Corpora such as WordNet (http://wordnet.princeton.edu/) or Wortschatz
(http://wortschatz.uni-leipzig.de/) are also far from being perfect (i.e.
totally unbiased).

> this is like reading a student paper that says: "Merriam Webster's
> dictionary says that X is defined as Y.  Therefore, Z.", accompanied by
> no further argumentation.  Possibly true, but pretty hole-y logic.

I am afraid this is how your argumentation sounds to me. Why should it be
wrong to use the number of google hits under all circumstances?

If I want to show that Canada is better known than Vanuatu
(http://googlefight.com/index.php?lang=en_GB&word1=canada&word2=vanuatu),
why would the comparison of google hits be inadmissible? (There are a
number of reasons why the "Vanuatu" hits are inflated, but that is of no
concern here.)
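
Such a comparison is easy to mechanise. What follows is only a minimal
sketch in Python, assuming the result-count line has exactly the format
quoted above ("Results 1 - 10 of about N for QUERY"). The names parse_hits
and hit_ratio are hypothetical, chosen for illustration, and the wording of
that line may of course change.

```python
import re

# Assumed format of the result-count line, taken from the example quoted
# earlier in this thread:
#   "Results 1 - 10 of about 38,100,000 for caribbean"
RESULT_LINE = re.compile(r"of about ([\d,]+) for (.+)$")


def parse_hits(line):
    """Return (query, approximate hit count) from a result-count line."""
    match = RESULT_LINE.search(line.strip())
    if match is None:
        raise ValueError(f"unrecognised result line: {line!r}")
    # Strip the thousands separators before converting to an integer.
    count = int(match.group(1).replace(",", ""))
    return match.group(2), count


def hit_ratio(line_a, line_b):
    """Ratio of the first query's hit count to the second's (hypothetical helper)."""
    _, hits_a = parse_hits(line_a)
    _, hits_b = parse_hits(line_b)
    return hits_a / hits_b


query, hits = parse_hits("Results 1 - 10 of about 38,100,000 for caribbean")
print(query, hits)  # caribbean 38100000
```

The counts are approximate ("of about"), so only a large ratio between two
terms should be read as meaningful, and the pitfalls mentioned earlier
still apply.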

Thomas
--
thomas koenig, ph.d.
department of social sciences, loughborough university, u.k.
http://www.lboro.ac.uk/research/mmethods/staff/thomas/index.html


