[Air-L] WARC File viewer
robert.ackland at anu.edu.au
Wed Feb 17 16:22:43 PST 2010
These are some WARC-related tools I'm aware of. I doubt there is a
viewer there, though.
While we're on the topic of WARCs, does anyone know of an open source
utility for programmatically extracting data from WARCs? e.g. for
're-crawling' web pages stored in WARC format so as to extract
hyperlinks and text content (e.g. meta tags)?
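To make the question concrete, the kind of extraction meant here could look roughly like the following. This is only a minimal sketch using the Python standard library, not an existing utility: the names iter_warc_records, LinkMetaParser and extract_page_data are hypothetical, and it assumes an uncompressed WARC file whose 'response' records hold ordinary HTTP responses with HTML that decodes as UTF-8.

```python
from html.parser import HTMLParser


def iter_warc_records(stream):
    """Yield (headers, body) for each record in an uncompressed WARC stream.

    `stream` must be a binary file-like object. Header names are lowercased.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.strip():
            continue  # blank lines separating one record from the next
        if not line.startswith(b"WARC/"):
            raise ValueError("expected a WARC version line, got %r" % line)
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # an empty line ends the record header
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        length = int(headers.get("content-length", 0))
        yield headers, stream.read(length)


class LinkMetaParser(HTMLParser):
    """Collect <a href> targets and <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta[attrs["name"]] = attrs["content"]


def extract_page_data(stream):
    """Yield (target_uri, links, meta) for each 'response' record."""
    for headers, body in iter_warc_records(stream):
        if headers.get("warc-type") != "response":
            continue  # skip warcinfo, request, metadata records, etc.
        # Assumption: the record body is an HTTP response, so the HTML
        # payload starts after the first empty line.
        _, _, payload = body.partition(b"\r\n\r\n")
        parser = LinkMetaParser()
        parser.feed(payload.decode("utf-8", "replace"))
        yield headers.get("warc-target-uri"), parser.links, parser.meta
```

It would be used by opening the archive in binary mode and looping over extract_page_data(f). It does no content-type checking and no character-set detection, so it is an illustration of the task rather than something to rely on.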
Dr Robert Ackland
The Australian National University
e-mail: robert.ackland at anu.edu.au
Information about the Master of Social Research
(Social Science of the Internet specialisation):
Baden Hughes wrote:
> WARCs are a standard web archiving file format
> (http://www.digitalpreservation.gov/formats/fdd/fdd000236.shtml); it's
> an open standard.
> Usually you would view these files with a web archiving tool like the
> Wayback Machine or the underlying open source software (the Heritrix
> web crawler to collect web content, the NutchWAX indexing engine to
> provide search services, and Wayback to provide the user interface),
> or with a service from Archive-It (a subscription-based custom web
> archiving service, www.archive-it.org).
> I don't know of a specific viewer for WARCs.
> On Thu, Feb 18, 2010 at 10:06 AM, Steffen Schilke
> <steffen.schilke at gmail.com> wrote:
>> Dear *,
>> could you kindly recommend a viewer for WARC files (web page archiving)?
>> Kind regards
>> The Air-L at listserv.aoir.org mailing list
>> is provided by the Association of Internet Researchers http://aoir.org
>> Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
>> Join the Association of Internet Researchers: