[Air-l] counting google hits
Greg Elmer
gelmer at ryerson.ca
Thu Mar 3 06:25:14 PST 2005
Thomas,
Of course all samples have biases, but isn't it incumbent upon us to better understand them -- particularly in the case of Google, precisely because it is fast becoming "all that we have", that is, a default choice for information retrieval, not only for everyday users but also for students and Internet researchers?
Greg Elmer
----- Original Message -----
From: Thomas Koenig <T.Koenig at lboro.ac.uk>
Date: Wednesday, March 2, 2005 8:54 pm
Subject: Re: [Air-l] counting google hits
> Quoting elijah wright <elw at stderr.org>:
>
> [What's wrong with using Google stats?]
> > because people assume that all texts that are available are
> > represented, which according to the google people they are *not*.
>
> Fair enough, but what is your alternative corpus? Most traditional
> corpora have a bias away from everyday language toward journalistic
> and/or literary writings. Sometimes these biases may not matter, at
> other times they might even be desirable, but at times google is the
> better choice, even if imperfect.
>
> > in other words, the sample that you are pulling numbers from is
> > neither complete nor perfect - so your results won't be either.
>
> Who gets unbiased random samples? No-one, not even NORC, who are
> pretty good at it. Does that invalidate *all* statistical results? Of
> course not. Don't get me wrong, I am all for careful random sampling,
> but if I cannot get it, I might, under some circumstances, resort to
> biased samples rather than not get any sample at all.
>
> > do you understand what google does well enough (details of the
> > algorithm, et cetera) to know what the weaknesses are? oh, you
> > say they haven't published enough information for you to know?
> > that's what i thought. :|
>
> I do not know how google indexes (I have a faint idea, though), but
> for many practical purposes it simply does not matter, as long as I do
> not suspect a bias in the exclusion of websites that is *systematically
> related* to the topic I am researching.
>
> Would I rather have a random sample of all human-generated websites,
> preferably with the vital stats of their authors attached? You bet.
> I just won't get it. So I am taking the next best thing, aka Google.
>
> > > I am afraid, this is how your argumentation sounds to me. Why
> > > should it be wrong to use the number of google hits under all
> > > circumstances?
> >
> > i think your tone is pretty crass.
>
> Funny, that's what I thought of yours, that's why I chose to use
> *your* words. You probably know that it's sometimes difficult to
> discern the tone when you have no cues other than some ASCII strings.
>
> > > If I want to show that Canada is better known than Vanuatu
> > > (http://googlefight.com/index.php?lang=en_GB&word1=canada&word2=vanuatu),
> > > why would the comparison of google hits be inadmissible? (There are a
> > > number of reasons why the "Vanuatu" hits are inflated, but that is of
> > > no concern here).
> >
> > popularity of a term is one of the few instances in which
> > comparative occurrence vis a vis the google corpus *might* be
> > useful. it would depend on your question, and whether the data
> > available from the particular google server you're connected to is
> > appropriate to answering it.
>
> Of course, it always depends on what you want to do, but that's a far
> stretch from your wholesale rejection of using Google hits for any
> kind of research:
>
> "folks realize that using the "number of hits returned on google" is a
> hilarious bad way to prove a point -- right?"
>
> Thomas
>
> --
> thomas koenig, ph.d.
> department of social sciences, loughborough university, u.k.
> http://www.lboro.ac.uk/research/mmethods/staff/thomas/index.html
Greg Elmer, PhD
Bell Globemedia Research Chair
Rogers Communications Centre/School of Radio-TV Arts
Ryerson University
350 Victoria Street, Toronto, Ontario
Canada M5B 2K3
416-979-5282
_______________________________________________
Co-Editor,
Space and Culture: An International Journal of Social Spaces
http://www.carleton.ca/space/