[Air-L] WARC File viewer

Robert Ackland robert.ackland at anu.edu.au
Wed Feb 17 16:22:43 PST 2010


These are some WARC-related tools I'm aware of, though I doubt there is
a viewer among them.

http://code.google.com/p/warc-tools/
http://code.google.com/p/search-tools/

While we're on the topic of WARCs, does anyone know of an open source 
utility for programmatically extracting data from them?  For example, 
for 're-crawling' web pages stored in WARC format so as to extract 
hyperlinks and text content (such as meta tags).
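
To make concrete what I mean by 're-crawling', here is a rough sketch in
Python using only the standard library.  It is not an existing tool: the
names iter_warc_records, LinkAndMetaParser and extract_pages are made up
for illustration, and it assumes an uncompressed WARC 1.0 file whose
response records hold plain, UTF-8 encoded HTML (no chunked transfer or
content encoding).  The sample record at the bottom is invented test data.

```python
import io
from html.parser import HTMLParser


def iter_warc_records(stream):
    """Yield (headers, block_bytes) for each record in an uncompressed WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.startswith(b"WARC/"):
            continue  # skip the blank lines that separate records
        headers = {}
        for raw in iter(stream.readline, b""):
            raw = raw.strip()
            if not raw:
                break  # a blank line ends the WARC header block
            name, _, value = raw.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        # Content-Length gives the exact size of the record block in bytes.
        length = int(headers.get("content-length", 0))
        yield headers, stream.read(length)


class LinkAndMetaParser(HTMLParser):
    """Collect <a href> targets and <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            self.meta[attrs["name"].lower()] = attrs["content"]


def extract_pages(stream):
    """Yield (target_uri, links, meta) for every 'response' record."""
    for headers, block in iter_warc_records(stream):
        if headers.get("warc-type") != "response":
            continue
        # The block is a raw HTTP response: status line, headers, blank line, body.
        _, _, body = block.partition(b"\r\n\r\n")
        parser = LinkAndMetaParser()
        parser.feed(body.decode("utf-8", "replace"))  # assumption: UTF-8 HTML
        parser.close()
        yield headers.get("warc-target-uri"), parser.links, parser.meta


# Invented sample record, built in memory so the sketch runs on its own.
http = (b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n"
        b'<html><head><meta name="description" content="demo page"></head>'
        b'<body><a href="http://example.org/a">A</a></body></html>')
record = (b"WARC/1.0\r\nWARC-Type: response\r\n"
          b"WARC-Target-URI: http://example.org/\r\n"
          b"Content-Length: " + str(len(http)).encode() + b"\r\n\r\n"
          + http + b"\r\n\r\n")

pages = list(extract_pages(io.BytesIO(record)))
print(pages)
```

For a real file one would pass open(path, "rb") instead of the BytesIO
object, or gzip.open(path, "rb") for a .warc.gz file.  What I'm after is
an existing utility that does this properly and handles the edge cases
the sketch ignores.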

Rob

-------------------------------------
Dr Robert Ackland
The Australian National University

e-mail:   robert.ackland at anu.edu.au
homepage: http://adsri.anu.edu.au/people/robert.php
project:  http://voson.anu.edu.au

Information about the Master of Social Research
(Social Science of the Internet specialisation):
http://adsri.anu.edu.au/study/msr.php
-------------------------------------


Baden Hughes wrote:
> WARC is a standard web archiving file format
> (http://www.digitalpreservation.gov/formats/fdd/fdd000236.shtml); it's
> an open standard.
> 
> Usually you would use a web archiving tool like the Wayback Machine or
> the underlying open source software (the Heritrix web crawler to collect
> web content, the NutchWAX indexing engine to provide search services,
> and Wayback to provide the user interface), or a service from
> Archive-It (a subscription-based custom web archiving service,
> www.archive-it.org) to view these files.
> 
> I don't know of a specific viewer for WARCs.
> 
> Baden
> 
> 
> On Thu, Feb 18, 2010 at 10:06 AM, Steffen Schilke
> <steffen.schilke at gmail.com> wrote:
>> Dear *,
>>
>> Could you kindly recommend a viewer for WARC files (web page archiving)?
>>
>> Kind regards
>>
>>
>> .
>> _______________________________________________
>> The Air-L at listserv.aoir.org mailing list
>> is provided by the Association of Internet Researchers http://aoir.org
>> Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
>>
>> Join the Association of Internet Researchers:
>> http://www.aoir.org/
>>


