[Air-L] Wikipedia article edit history extraction tools?

Paolo Massa paolo at gnuband.org
Thu Aug 16 03:26:04 PDT 2012


Hi Monika and list,
I helped create Wikitrip, a web tool that displays an animated
visualization, over time, of the geo-location and gender of the
Wikipedians who edited a specific page.
You can search for any page (in any language edition of Wikipedia!) and
get statistics on how many edits to that page were made by self-declared
male and female Wikipedians over time (bottom right of the web interface).
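If you want to reproduce that over-time tally yourself from raw data, here is a minimal Python sketch. It is not Wikitrip's actual code. The input format (a list of (timestamp, gender) pairs) and the function name are hypothetical, and the sample edits are made up.

```python
from collections import Counter, defaultdict

def edits_per_year_by_gender(edits):
    """Tally edits per year for each self-declared gender.

    `edits` is a hypothetical iterable of (timestamp, gender) pairs whose
    timestamps start with a four-digit year, e.g. "2008-05-01T12:00:00Z".
    """
    per_year = defaultdict(Counter)
    for timestamp, gender in edits:
        year = int(timestamp[:4])  # relies on the year-first timestamp format
        per_year[year][gender] += 1
    # Sort by year so the result reads as a timeline.
    return {year: dict(per_year[year]) for year in sorted(per_year)}


# Made-up sample edits, not real Wikipedia data.
sample = [
    ("2001-11-02T08:00:00Z", "male"),
    ("2001-12-24T19:30:00Z", "male"),
    ("2008-05-01T12:00:00Z", "female"),
    ("2008-06-10T09:15:00Z", "male"),
]
print(edits_per_year_by_gender(sample))
# {2001: {'male': 2}, 2008: {'female': 1, 'male': 1}}
```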

A few random examples:
http://sonetlab.fbk.eu/wikitrip/#|en|Feminism
The page Feminism received most of its gendered edits from males...
http://sonetlab.fbk.eu/wikitrip/#|en|Wikipedia_talk:WikiProject_Feminism
...but the talk page of WikiProject Feminism received them mainly from
females.
http://sonetlab.fbk.eu/wikitrip/#|en|Sexual_intercourse
Sexual_intercourse was edited mainly by males in the beginning (2001),
but around 2008 female Wikipedians jumped in.
http://sonetlab.fbk.eu/wikitrip/#|en|Talk:Sexual_intercourse
The same holds for the talk page of Sexual_intercourse.

Wikitrip's code is open source, so anybody can look at it, improve it and
reuse it.
Moreover, we have also released a useful API, so if you want the raw
data, you can get it!
The three available APIs are described on the Wikitrip help page (click
the "read more..." link):
- api.php: gets various stats about a page (including its editors and
  how many edits each of them made)
- api_gender.php: gets the timestamp and gender of every edit to a
  specific page made by a registered user who declared their gender
- api_geojson.php: gets the location in the world of anonymous edits
  to a specific page

Here are two example calls to the first two APIs (also described on the
Wikitrip help page):
http://toolserver.org/~sonet/api.php?article=London&lang=en&editors&max_editors=5
http://toolserver.org/~sonet/api_gender.php?article=London&lang=en
The output format is JSON, but we can easily change it to CSV or
anything else if there is demand for it.
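For anyone who wants the raw data in a spreadsheet, here is a minimal Python sketch that rebuilds the API URLs above, downloads a JSON response, and writes a list of records out as CSV. The base URL and parameter names come from the example calls. The helper names are hypothetical, the JSON layout of the responses is not described here (so the CSV field names you pass in are placeholders), and it assumes the toolserver.org endpoints are still reachable.

```python
import csv
import io
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Base URL taken from the example calls; the host may no longer respond.
BASE_URL = "http://toolserver.org/~sonet/"

def build_api_url(script, article, lang="en", flags=(), **params):
    """Build a Wikitrip API URL, e.g. api.php?article=London&lang=en&editors."""
    parts = [urlencode({"article": article, "lang": lang})]
    # Value-less switches such as "editors" are appended bare, as in the examples.
    parts.extend(quote(flag) for flag in flags)
    if params:
        parts.append(urlencode(params))
    return BASE_URL + script + "?" + "&".join(parts)

def fetch_json(url):
    """Download and decode a JSON response (requires network access)."""
    with urlopen(url) as response:
        return json.load(response)

def records_to_csv(records, fieldnames):
    """Turn a list of dicts into CSV text a spreadsheet can open.

    Keys not listed in `fieldnames` are silently dropped.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

For instance, `build_api_url("api.php", "London", flags=("editors",), max_editors=5)` rebuilds the first example URL. To get spreadsheet-ready text, pass the decoded records from `fetch_json` to `records_to_csv` together with whatever field names the real response actually contains.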

Notes:
1) As you might know, declaring your gender on Wikipedia is not
mandatory and few users do it (around 10% the last time I checked, if I
remember correctly), so the stats are heavily biased by this. Still, a
Wikitrip exploration can be the beginning of a research project, though
surely not the end of it ;)
2) We show the number of edits rather than the number of editors
because edit counts are larger and so the stats are more "dramatic".
However, this adds another level of "noise": a single female editor
might, for example, make 200 edits to a page, so the page is not really
receiving a lot of attention from females but from one female. As I
wrote earlier, though, you can use the API to get the raw data (for
example, all the gendered edits) and pursue different, less dramatic
and more scientific lines of research ;)
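To make the edits-versus-editors difference concrete, here is a small Python sketch that counts both for each gender. It assumes, hypothetically, that you can obtain a username alongside each gendered edit, with one (username, gender) pair per edit. The function name and the toy page are illustrative and not real data.

```python
from collections import Counter, defaultdict

def edits_and_editors_by_gender(edits):
    """Count edits and distinct editors per declared gender.

    `edits` is a hypothetical list of (username, gender) pairs, one per edit.
    """
    edit_counts = Counter()
    editors = defaultdict(set)  # gender -> set of distinct usernames
    for user, gender in edits:
        edit_counts[gender] += 1
        editors[gender].add(user)
    editor_counts = {gender: len(users) for gender, users in editors.items()}
    return dict(edit_counts), editor_counts


# Toy page: one female editor makes 200 edits, three male editors make 10 each.
toy_edits = [("Alice", "female")] * 200 + [
    (f"Editor{i}", "male") for i in range(3) for _ in range(10)
]
edit_counts, editor_counts = edits_and_editors_by_gender(toy_edits)
print(edit_counts)    # {'female': 200, 'male': 30}
print(editor_counts)  # {'female': 1, 'male': 3}
```

By edits the toy page looks female-dominated, but by editors it is the opposite. This is the kind of "noise" described in note 2.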

I'm very interested in conducting research on Wikipedia and gender, so
Monika, I'll contact you offline about possible collaborations.
In fact, I'll present Wikitrip (and Manypedia.com) in a few days at
Wikisym 2012 in Linz, Austria; if you are there, I would love to talk
with you face to face too.

Ciao! ;)


-- 
Paolo Massa
Email: paolo AT gnuband DOT org
Blog: http://gnuband.org

On Wed, Aug 15, 2012 at 12:38 AM, Monika Sengul-Jones
<jones.monika at gmail.com> wrote:
> Hello Air-L list:
>
> This summer I'm doing research on Wikipedia entries in the field of Science
> and Technology Studies. A central question I'm asking is the extent to
> which this field, as it is now on Wikipedia, includes/features/references
> contributions made by women, feminist theorists, and feminist theory.
>
> To answer this, I'm gathering data on existing pages using a variety of
> mixed methods. I would like to ask for recommendations on tools for
> extracting the history of editing on a page. I want to see how many times a
> given article has been edited, by whom, and what types of edits and content
> contributions are made over time. So far, I've found the "history" tool on
> the Wikipedia page limited. I cannot see how many edits have been made on a
> particular article and understanding what kinds of edits are made (e.g.
> grammatical, content) requires going into each historical page view. I'd
> love to find a way to download the history of an article and extract the
> data into a spreadsheet -- perhaps this is a tall order.
>
> So far, I've found tools for extracting data on Wikipedia from the Digital
> Methods Initiative website (which was first introduced to me by this list
> serve! :)). Specifically, the program History Flow is useful to an extent
> for visualizing types of content contributions and edits over time. But
> there is no way to translate these visualizations into a spreadsheet format
> -- as far as I can tell -- so I've been doing that manually, somehow
> piecing together the history of edits on an article. Meanwhile, I was
> recommended a tool called WikiChecker (
> http://en.wikichecker.com/article/?a=science_studies) but the summary
> format is limited and, at times, contradictory to data I get elsewhere.
>
> If anyone has any other tools or methods to suggest for ways to collect
> data on content contributions and edits on Wikipedia I would be most
> grateful.
>
> I'd also be happy to be in conversation with anyone interested in the
> concept of the project. I'm working on it as a part of the FemTechNet
> Initiative, spearheaded by Anne Balsamo and Alexandra Juhasz. I'm not sure
> if information on the initiative has circulated here, so I'll paste in a
> copy of the "call" which took place last spring.
> http://aljean.files.wordpress.com/2012/05/femtechnet-long-form-invite-may-2012.pdf
>
> Thank you,
> Monika
>
> --
> Monika Sengul-Jones
> Graduate Student
> Communication & Science Studies
> University of California, San Diego
> msengul at ucsd.edu
> _______________________________________________
> The Air-L at listserv.aoir.org mailing list
> is provided by the Association of Internet Researchers http://aoir.org
> Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
>
> Join the Association of Internet Researchers:
> http://www.aoir.org/


