[Air-L] WARC File viewer

Steffen Schilke steffen.schilke at gmail.com
Thu Feb 18 05:27:09 PST 2010


Thank you, but I would still want (and need) a full stand-alone viewer
(maybe in Java) so that I can view and export content without having to
install a full Wayback Machine on a server.

kind regards

live from my android mobile

On Feb 18, 2010 2:06 PM, "Johan Oomen" <JOomen at beeldengeluid.nl> wrote:

Good afternoon,

Either use:

The Java API for reading (W)ARC files (import the org.archive.io.warc
package and use WARCReader). With this API you should be able to read
anything stored in WARC files.

Or you can use the arcreader present in the heritrix install
(heritrix-x.x.x/bin/arcreader)
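
To give an idea of what such a reader does, here is a minimal sketch in
plain Java that lists the header fields of each record in a WARC file. It
does not use the org.archive.io.warc classes mentioned above; the class
WarcHeaderLister and its methods are hypothetical names made up for this
illustration. It assumes well-formed records, a Content-Length field spelled
exactly that way, and (for .gz files) that the JDK's GZIPInputStream reads
through the concatenated gzip members. Treat it as a starting point, not as
a replacement for the Heritrix API.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not part of Heritrix: collects the named header
// fields of every record found in a WARC stream.
class WarcHeaderLister {

    // Reads bytes up to the next LF and returns them as a string with CR
    // bytes dropped; returns null when the stream is already at its end.
    static String readLine(InputStream in) throws IOException {
        int b = in.read();
        if (b == -1) {
            return null;
        }
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        while (b != -1 && b != '\n') {
            if (b != '\r') {
                buf.write(b);
            }
            b = in.read();
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    // Returns one field map per record, in file order.
    static List<Map<String, String>> readHeaders(InputStream in) throws IOException {
        List<Map<String, String>> records = new ArrayList<>();
        String line;
        while ((line = readLine(in)) != null) {
            if (!line.startsWith("WARC/")) {
                continue; // blank separator lines between records
            }
            Map<String, String> fields = new LinkedHashMap<>();
            while ((line = readLine(in)) != null && !line.isEmpty()) {
                int colon = line.indexOf(':');
                if (colon > 0) {
                    fields.put(line.substring(0, colon).trim(),
                               line.substring(colon + 1).trim());
                }
            }
            records.add(fields);
            // Skip the content block so its bytes are never parsed as headers.
            // Simplification: the field name is matched case-sensitively.
            long remaining = Long.parseLong(fields.getOrDefault("Content-Length", "0"));
            while (remaining > 0) {
                long skipped = in.skip(remaining);
                if (skipped <= 0) {
                    if (in.read() == -1) {
                        break; // truncated file
                    }
                    skipped = 1;
                }
                remaining -= skipped;
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java WarcHeaderLister <file.warc[.gz]>");
            return;
        }
        String path = args[0];
        try (InputStream raw = new BufferedInputStream(new FileInputStream(path));
             InputStream in = path.endsWith(".gz")
                     ? new BufferedInputStream(new GZIPInputStream(raw)) : raw) {
            for (Map<String, String> rec : readHeaders(in)) {
                System.out.println(rec.get("WARC-Type") + "  " + rec.get("WARC-Target-URI"));
            }
        }
    }
}
```

Run against a crawl file, this prints one line per record with its type and
target URI. Exporting the content would mean reading the Content-Length
bytes instead of skipping them.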

I also use boilerpipe to trim pages of unnecessary HTML content.

Best wishes,
Jaap Blom and Johan Oomen

Netherlands Institute for Sound and Vision

(working on the FP7 Living Web Archives project, LiWA)

On 18 Feb 2010 at 01:22, "Robert Ackland" <robert.ackland at anu.edu.au>
wrote:

> These are some WARC-related tools I'm aware of.  I doubt there is a viewer
> there, though.
>
> http...