[Air-L] Using the Archive.org for data capture?

kalev leetaru kalev.leetaru5 at gmail.com
Tue Apr 21 11:37:11 PDT 2015


I should also add that my opening keynote for the 2012 General Assembly of
the IIPC lays out a number of the challenges of working with large web
archives, ranging from nondeterministic recrawl intervals to incomplete
captures (though the Internet Archive has been spectacular in addressing
many of these in the last three years):

http://blogs.loc.gov/digitalpreservation/2012/05/a-vision-of-the-role-and-future-of-web-archives-the-web-archive-in-todays-world/

http://blogs.loc.gov/digitalpreservation/2012/05/a-vision-of-the-role-and-future-of-web-archives-research-use/

http://blogs.loc.gov/digitalpreservation/2012/05/a-vision-of-the-role-and-future-of-web-archives-conclusions-and-the-role-of-archives/

~K



On Tue, Apr 21, 2015 at 6:10 AM, kalev leetaru <kalev.leetaru5 at gmail.com>
wrote:

> Dan, see my paper with Tim Perkins and Chris Rewerts from last year - it
> was the first to look at the Archive's web archive at scale for content
> analysis, providing a template for working with the full 1.7-billion PDF
> archive:
>
> http://dlib.org/dlib/september14/leetaru/09leetaru.html
>
> I know they have a great interest in working with scholars in exploring
> the web archive.
>
> Stay tuned, there will be a piece coming out in the next few weeks
> actually on this very topic, describing the Archive's Virtual Reading Room
> model that I've been shaping with Roger MacDonald at the Archive, which is
> what enables research on their collections like the TV Archive.
>
> ~K
>
> On Mon, Apr 20, 2015 at 4:28 PM, Dan Fielding <sociologyfornerds at gmail.com>
> wrote:
>
>> Hello wonderful list,
>>
>> I am currently establishing a research protocol that will rely on the
>> wayback machine (archive.org) to gather caches of pages from 1-2 years
>> ago. Is there research on the wayback machine as an effective mode of
>> data capture? Are there any questions about its validity? Have you read
>> published work using the wayback machine? What concerns have other
>> scholars raised about using it?
>>
>> Thanks for your time! Have a great day,
>>
>> Dan Fielding
>> _______________________________________________
>> The Air-L at listserv.aoir.org mailing list
>> is provided by the Association of Internet Researchers http://aoir.org
>> Subscribe, change options or unsubscribe at:
>> http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
>>
>> Join the Association of Internet Researchers:
>> http://www.aoir.org/
>>
>
>

