[Air-L] Using the Archive.org for data capture?

kalev leetaru kalev.leetaru5 at gmail.com
Tue Apr 21 03:10:41 PDT 2015


Dan, see my paper with Tim Perkins and Chris Rewerts from last year - it
was the first to look at the Archive's web archive at scale for content
analysis, providing a template for working with the full 1.7-billion PDF
archive:

http://dlib.org/dlib/september14/leetaru/09leetaru.html

I know they have a great interest in working with scholars to explore the
web archive.

Stay tuned: a piece on this very topic will be coming out in the next few
weeks, describing the Archive's Virtual Reading Room model that I've been
shaping with Roger MacDonald at the Archive, which is what enables research
on their collections like the TV Archive.
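
On the mechanics of gathering captures from 1-2 years ago, here is a
minimal sketch in Python. It assumes the Wayback Machine's public
availability endpoint (https://archive.org/wayback/available), which
reports the capture closest to a requested timestamp. The helper names
and the sample response are illustrative only and are not taken from the
paper above; the sketch makes no network call, so fetching the query URL
is left to whatever HTTP client you prefer.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint: the Wayback Machine availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(page_url, timestamp):
    """Build a query URL asking for the capture closest to `timestamp`.

    `timestamp` is a string of 4 to 14 digits (YYYYMMDDhhmmss, truncated
    as needed), e.g. "20130421".
    """
    params = urlencode({"url": page_url, "timestamp": timestamp})
    return AVAILABILITY_ENDPOINT + "?" + params


def closest_capture(response_text):
    """Return (capture_timestamp, snapshot_url), or None if no capture."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["timestamp"], closest["url"]


# Illustrative response body in the shape the endpoint returns.
SAMPLE_RESPONSE = (
    '{"archived_snapshots": {"closest": {"available": true, '
    '"url": "http://web.archive.org/web/20130919044612/http://example.com/", '
    '"timestamp": "20130919044612", "status": "200"}}}'
)

if __name__ == "__main__":
    print(availability_query("example.com", "20130421"))
    print(closest_capture(SAMPLE_RESPONSE))
```

Note that the capture returned can be months away from the date you asked
for, so for a research protocol it is worth recording the actual capture
timestamp alongside each page rather than the requested one.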

~K

On Mon, Apr 20, 2015 at 4:28 PM, Dan Fielding <sociologyfornerds at gmail.com>
wrote:

> Hello wonderful list,
>
> I am currently establishing a research protocol that will rely on the
> wayback machine (archive.org) to gather caches of pages from 1-2 years
> ago.
> Is there research on the wayback machine as an effective mode of data
> capture? Are there any questions about its validity? Have you read
> published work using the wayback machine? What concerns have other scholars
> raised about using it?
>
> Thanks for your time! Have a great day,
>
> Dan Fielding
> _______________________________________________
> The Air-L at listserv.aoir.org mailing list
> is provided by the Association of Internet Researchers http://aoir.org
> Subscribe, change options or unsubscribe at:
> http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
>
> Join the Association of Internet Researchers:
> http://www.aoir.org/
>

