[Air-L] Wikipedia Sampling

Alex Halavais alex at halavais.net
Wed Sep 23 10:23:10 PDT 2015


Hi, Josh,

It depends, of course, on what you are sampling *for*. A "constructed
week" is generally based on viewing patterns, so I suppose you could
use traffic data to oversample the most popular pages, or focus on the
front page.
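If you go the traffic route, oversampling popular pages amounts to weighted sampling without replacement. Here is a minimal sketch, assuming you have already aggregated view counts into a dict of title -> views (`page_views` is a hypothetical input, not any particular dataset's format). It uses the Efraimidis-Spirakis key method, so each successive pick is made among the remaining pages with probability proportional to their views.

```python
import math
import random

def sample_by_traffic(page_views, k, seed=None):
    """Draw up to k distinct titles, favoring heavily viewed pages.

    page_views: hypothetical dict of article title -> view count for the
    period of interest, built from whatever traffic data you can obtain.
    Weighted sampling without replacement via Efraimidis-Spirakis keys.
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible
    keyed = []
    for title, views in page_views.items():
        if views <= 0:
            continue  # pages with no recorded views can never be drawn
        # log(u) / w is an order-preserving form of u ** (1 / w);
        # 1 - random() lies in (0, 1], so the log is always defined.
        key = math.log(1.0 - rng.random()) / views
        keyed.append((key, title))
    # Keep the k largest keys (those closest to zero).
    keyed.sort(reverse=True)
    return [title for _, title in keyed[:k]]
```

Note that pages with zero recorded views are simply excluded, which is itself a sampling decision worth reporting.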

The most obvious approach here is simply to sample at random. If you
do, you will find a very large number of articles (some of them
autogenerated or imported) that have never been touched.
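A plain random sample is easy to make reproducible. The sketch below assumes you already have a list of every article title in your sampling frame (for example, extracted from a dump you downloaded); `titles` and `revision_counts` are hypothetical inputs. Rather than filtering out the untouched articles, which would mean selecting on edit histories, it just reports what share of the sample has only a single revision.

```python
import random

def simple_random_sample(titles, k, seed=None):
    """Uniform random sample of k distinct article titles.

    titles: an assumed, pre-built list of every article in the frame.
    """
    rng = random.Random(seed)  # fixed seed makes the sample reproducible
    return rng.sample(list(titles), k)

def share_untouched(sample, revision_counts):
    """Fraction of sampled articles with exactly one revision,
    i.e. never edited after creation.

    revision_counts: hypothetical dict of title -> number of revisions;
    every sampled title must be present.
    """
    if not sample:
        return 0.0
    untouched = sum(1 for t in sample if revision_counts[t] == 1)
    return untouched / len(sample)
```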

If you haven't already, you might consider posting this question to the Wikimedia research list as well:

https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

In sum, though, any sampling method that draws on edit histories in
order to study edit histories is probably a problem; it ends up being
a bit of the tail wagging the dog. I guess you could use this dataset:

https://aws.amazon.com/datasets/wikipedia-page-traffic-statistics/

to sample based on visitors, but that collection is dated. I'm sure
getting more recent traffic data from somewhere is possible, but it
seems like a lot of work just to create a "constructed week."
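For reference, the composite ("constructed") week Josh mentions is usually built by drawing one Monday, one Tuesday, and so on at random from a longer period. A minimal sketch, assuming you would afterwards pull traffic (or edit) data only for the seven dates it returns; the date range and seed are illustrative parameters, not something any traffic dataset prescribes.

```python
import datetime as dt
import random

def constructed_week(start, end, seed=None):
    """Pick one random Monday, one random Tuesday, ..., one random Sunday
    from the inclusive date range [start, end]."""
    rng = random.Random(seed)
    by_weekday = {wd: [] for wd in range(7)}  # 0 = Monday ... 6 = Sunday
    day = start
    while day <= end:
        by_weekday[day.weekday()].append(day)
        day += dt.timedelta(days=1)
    if any(not days for days in by_weekday.values()):
        raise ValueError("the range must contain every day of the week")
    # Returned in weekday order: Monday first, Sunday last.
    return [rng.choice(by_weekday[wd]) for wd in range(7)]
```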

Best,

Alex


On Wed, Sep 23, 2015 at 8:33 AM, Joshua Braun <jabraun at journ.umass.edu> wrote:
> Hi All,
>
> Just a brief question for the list: I'm considering doing a study that looks at the edit histories of a sample of Wikipedia articles, and I'm wondering if there are accepted strategies for assembling a "representative" sample of Wikipedia articles akin to the way that, say, television researchers put together a composite week for content analyses.
>
> Obviously any sampling strategy will come with limitations, upsides, and downsides. I'm mostly curious as to whether there are accepted sampling methods that have emerged in the literature dealing with Wikipedia.
>
> Thanks!
>
> All the Best,
> Josh
> --
> Josh Braun, Ph.D.
> Assistant Professor of Journalism Studies
> Journalism Department
> University of Massachusetts Amherst
>
> @josh_braun
> Skype: wideaperture
> http://wideaperture.net/
>
> "Maybe the only gift is a chance to inquire, to know nothing for certain.  An inheritance of wonder and nothing more."
> William Least Heat-Moon
>
> Sent from Emacs
> _______________________________________________
> The Air-L at listserv.aoir.org mailing list
> is provided by the Association of Internet Researchers http://aoir.org
> Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
>
> Join the Association of Internet Researchers:
> http://www.aoir.org/



-- 

// Alexander Halavais, Sociologist, Semiologist, and Saboteur Extraordinaire
// Associate Professor of Social Technologies, Arizona State University
// http://alex.halavais.net/bio     @halavais
