[Air-L] Using the Archive.org for data capture?

Anat Ben-David anatbd at gmail.com
Mon Apr 20 14:08:56 PDT 2015


Dear Dan,

Below please find a list of studies that used data captured from archive.org:

John, N. A. (2012). Sharing and Web 2.0: The emergence of a keyword. *New
Media & Society*. http://doi.org/10.1177/1461444812450684
Murphy, J., Hashim, N. H., & O’Connor, P. (2007). Take me back: validating
the Wayback Machine. *Journal of Computer-Mediated Communication*, *13*(1),
60–75.
Weltevrede, E., & Helmond, A. (2012). Where do bloggers blog? Platform
transitions within the historical Dutch blogosphere. *First Monday*, *17*(2-6).
Retrieved from
http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/viewArticle/3775

On methodological challenges:

Brügger, N. (2010). *Web history*. Peter Lang.
Brügger, N. (2012). Historical Network Analysis of the Web. *Social
Science Computer Review*. Retrieved from
http://ssc.sagepub.com/content/early/2012/09/06/0894439312454267.abstract
Rogers, R. (2013). The Website as Archived Object. In *Digital Methods*
(pp. 61–82). Cambridge, MA: MIT Press.
Ben-David, A., & Huurdeman, H. C. (2014). Web Archive Search as
Research: Methodological and Theoretical Implications. *Alexandria*, *25*(1).

Best wishes,
anat

On Mon, Apr 20, 2015 at 11:54 PM, Matthew Weber <matthew.weber at rutgers.edu>
wrote:

> Dan,
>
> Rogers’ digital methods work is a broad starting point, although I’m not
> sure that he’s specifically addressed issues with the Internet Archive.
>
> I’ve been working on research derived from the Internet Archive for almost
> a decade now, mostly at a large scale, although some projects are smaller
> in nature. One starting point might be this paper:
> http://dl.acm.org/citation.cfm?id=2579213
> I also have some other published work using derived datasets.
>
> With regard to your question about validity, it depends in part on what
> you’re looking to explore. If you’re using smaller datasets, validity won’t
> be too much of an issue, but once you scale beyond a few dozen domains (and
> again, depending on your analysis and RQs) there are validity issues that
> must be addressed. We’ve started to outline these in a few related papers
> that are under review, but they mostly pertain to sampling error and data
> completeness.
>
> Feel free to ping me offline - I can point you to GitHub code and other
> work, depending on your goals - and definitely check out the work of
> others. Kalev Leetaru is active on here and works in this space, as does
> Niels Brügger at Aarhus. There is a growing community of researchers doing
> Internet Archive-related research.
>
> Regards,
> Matt
>
>
>
>
>
> > On Apr 20, 2015, at 4:46 PM, Matthew T Mccarthy <mccart74 at uwm.edu> wrote:
> >
> > Apologies for the curt message. I hit send before finishing.
> > In addition to the citation for his book, here is a link to the Digital
> > Methods Initiative wiki page:
> >
> > https://wiki.digitalmethods.net
> >
> > Best,
> > Matt
> >
> >
> > Matthew T. McCarthy
> > Ph.D. Student/Graduate Instructor
> > Department of Sociology
> > University of Wisconsin-Milwaukee
> > P.O. Box 413
> > Milwaukee, WI   53201
> >
> > ________________________________________
> > From: Air-L <air-l-bounces at listserv.aoir.org> on behalf of Matthew T Mccarthy <mccart74 at uwm.edu>
> > Sent: Monday, April 20, 2015 3:43 PM
> > To: Dan Fielding; Air-L at listserv.aoir.org
> > Subject: Re: [Air-L] Using the Archive.org for data capture?
> >
> > Dan,
> >
> > Richard Rogers of the Digital Methods Initiative has dealt with this.
> >
> >
> > Rogers, R. (2013). Digital Methods. Cambridge, MA: MIT Press.
> >
> >
> > Matthew T. McCarthy
> > Ph.D. Student/Graduate Instructor
> > Department of Sociology
> > University of Wisconsin-Milwaukee
> > P.O. Box 413
> > Milwaukee, WI   53201
> >
> > ________________________________________
> > From: Air-L <air-l-bounces at listserv.aoir.org> on behalf of Dan Fielding <sociologyfornerds at gmail.com>
> > Sent: Monday, April 20, 2015 3:28 PM
> > To: Air-L at listserv.aoir.org
> > Subject: [Air-L] Using the Archive.org for data capture?
> >
> > Hello wonderful list,
> >
> > I am currently establishing a research protocol that will rely on the
> > Wayback Machine (archive.org) to gather caches of pages from 1-2 years
> > ago. Is there research on the Wayback Machine as an effective mode of
> > data capture? Are there any questions about its validity? Have you read
> > published work using the Wayback Machine? What concerns have other
> > scholars raised about using it?
> >
> > Thanks for your time! Have a great day,
> >
> > Dan Fielding
> > _______________________________________________
> > The Air-L at listserv.aoir.org mailing list
> > is provided by the Association of Internet Researchers http://aoir.org
> > Subscribe, change options or unsubscribe at:
> > http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
> >
> > Join the Association of Internet Researchers:
> > http://www.aoir.org/



-- 
Anat Ben-David, PhD


Department of Sociology, Political Science, and Communication
The Open University
1 University Road, P.O.Box 808, Ra'anana 43537, Israel

Tel: +972-9-778-1147
Twitter: @anatbd

