[Air-l] counting google hits

Greg Elmer gelmer at ryerson.ca
Thu Mar 3 06:25:14 PST 2005


Thomas,

Of course all samples have biases, but isn't it incumbent upon us to better understand them -- particularly in the case of Google, precisely because it is fast becoming "all that we have", that is, a default choice for information retrieval, not only for everyday users but also for students and Internet researchers?

Greg Elmer




----- Original Message -----
From: Thomas Koenig <T.Koenig at lboro.ac.uk>
Date: Wednesday, March 2, 2005 8:54 pm
Subject: Re: [Air-l] counting google hits

> Quoting elijah wright <elw at stderr.org>:
> 
> [What's wrong with using Google stats?]
> > because people assume that all texts that are available are represented,
> > which according to the google people they are *not*.
> 
> Fair enough, but what is your alternative corpus? Most traditional
> corpora have a bias away from everyday language and towards journalistic
> and/or literary writings. Sometimes these biases may not matter, at other
> times they might even be desirable, but at times Google is the better
> choice, even if imperfect.
> 
> > in other words, the sample that you are pulling numbers from is neither
> > complete nor perfect - so your results won't be either.
> 
> Who gets unbiased random samples? No-one, not even NORC, who are pretty
> good at it. Does that invalidate *all* statistical results? Of course not.
> Don't get me wrong, I am all for careful random sampling, but if I cannot
> get it, I might, under some circumstances, resort to biased samples rather
> than get no sample at all.
> 
> > do you understand what google does well enough (details of the algorithm,
> > et cetera) to know what the weaknesses are?  oh, you say they haven't
> > published enough information for you to know?  that's what i thought.  :|
> 
> I do not know how Google indexes (I have a faint idea, though), but for
> many practical purposes it simply does not matter, as long as I do not
> suspect that websites are excluded in ways that are *systematically
> related* to the topic I am researching.
> 
> Would I rather have a random sample of all human-generated websites,
> preferably with the vital stats of their authors attached? You bet. I just
> won't get it. So I am taking the next best thing, aka Google.
> 
> > > I am afraid, this is how your argumentation sounds to me. Why should it
> > > be wrong to use the number of google hits under all circumstances?
> >
> > i think your tone is pretty crass.
> 
> Funny, that's what I thought of yours, that's why I chose to use *your*
> words. You probably know that it's sometimes difficult to discern the tone
> when you have no cues other than some ASCII strings.
> 
> > > If I want to show that Canada is better known than Vanuatu
> > > (http://googlefight.com/index.php?lang=en_GB&word1=canada&word2=vanuatu),
> > > why would the comparison of google hits be inadmissible? (There are a
> > > number of reasons why the "Vanuatu" hits are inflated, but that is of
> > > no concern here).
> >
> > popularity of a term is one of the few instances in which comparative
> > occurrence vis a vis the google corpus *might* be useful.  it would depend
> > on your question, and whether the data available from the particular
> > google server you're connected to is appropriate to answering it.
> 
> Of course, it always depends on what you want to do, but that's a far cry
> from your wholesale rejection of using Google hits for any kind of
> research:
> 
> "folks realize that using the "number of hits returned on google" is a
> hilarious bad way to prove a point -- right?"
> 
> Thomas
> 
> --
> thomas koenig, ph.d.
> department of social sciences, loughborough university, u.k.
> http://www.lboro.ac.uk/research/mmethods/staff/thomas/index.html




Greg Elmer, PhD
Bell Globemedia Research Chair
Rogers Communications Centre/School of Radio-TV Arts 
Ryerson University
350 Victoria Street, Toronto, Ontario
Canada      M5B 2K3

416-979-5282
_______________________________________________
Co-Editor, 
Space and Culture: An International Journal of Social Spaces
http://www.carleton.ca/space/


