[Air-L] Wikipedia article edit history extraction tools?

Peter Timusk ptimusk at sympatico.ca
Tue Aug 14 16:02:44 PDT 2012


I am not sure how you will get the demographic variables you obviously need.
I use a handle to do my edits on Wikipedia. That's all you see in the edit
history. Of course some, like me, may have a male or a female first name in
their handle. In my legal studies BA we learned that we had to
cite the first names of scholars because this allowed us to see the gender.
Wikipedia does not know my gender. Unlike some paid web site that may have my
credit card data and access to my gender, which it could in turn share with
a researcher, I don't think Wikipedia has much real data about me they can

-----Original Message-----
From: air-l-bounces at listserv.aoir.org
[mailto:air-l-bounces at listserv.aoir.org] On Behalf Of Monika Sengul-Jones
Sent: August-14-12 6:39 PM
To: air-l at listserv.aoir.org
Subject: [Air-L] Wikipedia article edit history extraction tools?

Hello Air-L list:

This summer I'm doing research on Wikipedia entries in the field of Science
and Technology Studies. A central question I'm asking is the extent to which
this field, as it is now on Wikipedia, includes/features/references
contributions made by women, feminist theorists, and feminist theory.

To answer this, I'm gathering data on existing pages using a variety of
mixed methods. I would like to ask for recommendations on tools for
extracting the history of editing on a page. I want to see how many times a
given article has been edited, by whom, and what types of edits and content
contributions are made over time. So far, I've found the "history" tool on
the Wikipedia page limited. I cannot see how many edits have been made on a
particular article, and understanding what kinds of edits are made (e.g.
grammatical, content) requires going into each historical page view. I'd
love to find a way to download the history of an article and extract the
data into a spreadsheet -- perhaps this is a tall order.
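
A minimal sketch of one way to get an article's edit history into a
spreadsheet, offered as an illustration and not as something any tool named
in this thread does. It assumes the public MediaWiki API on en.wikipedia.org
(action=query with prop=revisions) and uses only the Python standard library.
The function names, the column selection, and the User-Agent string are all
made up for this example. Note that the "user" column holds only a handle or
an IP address, and that the edit summary and page size are at best rough
hints about what kind of edit was made.

```python
import csv
import io
import json
import urllib.parse
import urllib.request

# Assumed endpoint: the public MediaWiki API for English Wikipedia.
API_URL = "https://en.wikipedia.org/w/api.php"
# Columns chosen for this sketch; each matches an rvprop value requested below.
FIELDS = ["revid", "timestamp", "user", "size", "comment"]


def build_query(title, limit=500, cont=None):
    """Build the request URL for one batch of revisions of `title`."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|user|size|comment",
        "rvlimit": str(limit),
    }
    if cont:
        # The API returns a "continue" object; its keys are sent back as-is.
        params.update(cont)
    return API_URL + "?" + urllib.parse.urlencode(params)


def rows_from_response(data):
    """Flatten one API response into a list of dicts, one per revision."""
    rows = []
    for page in data.get("query", {}).get("pages", {}).values():
        for rev in page.get("revisions", []):
            # Hidden or missing fields become empty cells.
            rows.append({field: rev.get(field, "") for field in FIELDS})
    return rows


def fetch_history(title):
    """Follow continuation until every revision of `title` is collected."""
    rows, cont = [], None
    while True:
        request = urllib.request.Request(
            build_query(title, cont=cont),
            # Hypothetical identifier; replace with your own contact details.
            headers={"User-Agent": "edit-history-sketch/0.1 (research script)"},
        )
        with urllib.request.urlopen(request) as response:
            data = json.load(response)
        rows.extend(rows_from_response(data))
        cont = data.get("continue")
        if not cont:
            return rows


def to_csv(rows):
    """Render the rows as CSV text that a spreadsheet program can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Calling to_csv(fetch_history("Science studies")) and writing the result to a
file should give one row per edit; the number of rows is then the number of
edits, and sorting or counting by the "user" column shows who edited and how
often.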

So far, I've found tools for extracting data on Wikipedia from the Digital
Methods Initiative website (which was first introduced to me by this
listserv! :)). Specifically, the program History Flow is useful to an extent
for visualizing types of content contributions and edits over time. But
there is no way to translate these visualizations into a spreadsheet format
-- as far as I can tell -- so I've been doing that manually, somehow piecing
together the history of edits on an article. Meanwhile, I was recommended a
tool called WikiChecker (
http://en.wikichecker.com/article/?a=science_studies) but the summary format
is limited and, at times, contradictory to data I get elsewhere.

If anyone has any other tools or methods to suggest for ways to collect data
on content contributions and edits on Wikipedia I would be most grateful.

I'd also be happy to be in conversation with anyone interested in the
concept of the project. I'm working on it as a part of the FemTechNet
Initiative, spearheaded by Anne Balsamo and Alexandra Juhasz. I'm not sure
if information on the initiative has circulated here, so I'll paste in a
copy of the "call" which took place last spring. *

Thank you,

Monika Sengul-Jones
Graduate Student
Communication & Science Studies
University of California, San Diego
msengul at ucsd.edu
The Air-L at listserv.aoir.org mailing list is provided by the Association of
Internet Researchers http://aoir.org Subscribe, change options or
unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
