[Air-L] Using the Archive.org for data capture?

Matthew Weber matthew.weber at rutgers.edu
Mon Apr 20 13:54:38 PDT 2015


Dan,

Rogers’ digital methods work is a broad starting point, although I’m not sure that he’s specifically addressed issues with the Internet Archive. 

I’ve been working on research derived from the Internet Archive for almost a decade now, mostly at a large scale, although some projects are smaller in nature. One starting point might be this paper: http://dl.acm.org/citation.cfm?id=2579213. I also have other published work that uses derived datasets.

With regard to your question about validity, it depends in part on what you’re looking to explore. If you’re using smaller datasets, validity won’t be too much of an issue, but once you scale beyond a few dozen domains (and again, depending on your analysis and RQs) there are validity issues that must be addressed. We’ve started to outline these in a few related papers that are under review; they mostly pertain to sampling error and data completeness.

Feel free to ping me offline - I can point you to GitHub code and other work, depending on your goals - and definitely check out the work of others. Kalev Leetaru is active on here and works in this space, as does Niels Brügger at Aarhus. There is a growing community of researchers doing Internet Archive-related research.

Regards,
Matt





> On Apr 20, 2015, at 4:46 PM, Matthew T Mccarthy <mccart74 at uwm.edu> wrote:
> 
> Apologies for the curt message. I hit send before finishing. 
> In addition to the citation for his book, here is a link to the Digital Methods Initiative wiki page
> 
> https://wiki.digitalmethods.net
> 
> Best, 
> Matt
> 
> 
> Matthew T. McCarthy
> Ph.D. Student/Graduate Instructor
> Department of Sociology
> University of Wisconsin-Milwaukee
> P.O. Box 413
> Milwaukee, WI   53201
> 
> ________________________________________
> From: Air-L <air-l-bounces at listserv.aoir.org> on behalf of Matthew T Mccarthy <mccart74 at uwm.edu>
> Sent: Monday, April 20, 2015 3:43 PM
> To: Dan Fielding; Air-L at listserv.aoir.org
> Subject: Re: [Air-L] Using the Archive.org for data capture?
> 
> Dan,
> 
> Richard Rogers of the Digital Methods Initiative has dealt with this.
> 
> 
> Rogers, R. (2013). Digital methods. MIT Press.
> 
> 
> 
> ________________________________________
> From: Air-L <air-l-bounces at listserv.aoir.org> on behalf of Dan Fielding <sociologyfornerds at gmail.com>
> Sent: Monday, April 20, 2015 3:28 PM
> To: Air-L at listserv.aoir.org
> Subject: [Air-L] Using the Archive.org for data capture?
> 
> Hello wonderful list,
> 
> I am currently establishing a research protocol that will rely on the
> wayback machine (archive.org) to gather caches of pages from 1-2 years ago.
> Is there research on the wayback machine as an effective mode of data
> capture? Are there any questions about its validity? Have you read
> published work using the wayback machine? What concerns have other scholars
> raised about using it?
> 
> Thanks for your time! Have a great day,
> 
> Dan Fielding
> _______________________________________________
> The Air-L at listserv.aoir.org mailing list
> is provided by the Association of Internet Researchers http://aoir.org
> Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
> 
> Join the Association of Internet Researchers:
> http://www.aoir.org/


