[Air-L] WARC File viewer
Steffen Schilke
steffen.schilke at gmail.com
Thu Feb 18 05:27:09 PST 2010
Thank you, but I still think I need a full standalone viewer (maybe in
Java) so that I can view and export content without having to install a
full Wayback Machine on a server.
kind regards
live from my android mobile
On Feb 18, 2010 2:06 PM, "Johan Oomen" <JOomen at beeldengeluid.nl> wrote:
Good afternoon,
Either use the Java API for reading (W)ARC files (import the
org.archive.io.warc package and use WARCReader). With this API you should
be able to read anything stored in WARC files.
Or use the arcreader included in the Heritrix install
(heritrix-x.x.x/bin/arcreader).
I also use boilerpipe to trim unnecessary HTML content from pages.
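As a rough illustration of the standalone route, here is a minimal sketch
that lists the records in a WARC file using only the Java standard library.
It does not use the org.archive.io.warc API mentioned above. It assumes an
uncompressed .warc file in the usual record layout: a "WARC/x.y" version
line, named header fields, a blank line, then Content-Length bytes of
content. The names WarcSketch, WarcRecord and readAll are hypothetical and
are not part of Heritrix.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Heritrix: a sketch of a standalone reader.
final class WarcSketch {

    /** One record: its named header fields and its raw content block. */
    static final class WarcRecord {
        // Header field names are matched case-insensitively.
        final Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        byte[] content = new byte[0];
    }

    /** Reads bytes up to LF and drops a trailing CR; returns null at end of stream. */
    static String readLine(InputStream in) throws IOException {
        int b = in.read();
        if (b == -1) {
            return null;
        }
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        while (b != -1 && b != '\n') {
            buf.write(b);
            b = in.read();
        }
        String line = new String(buf.toByteArray(), StandardCharsets.UTF_8);
        return line.endsWith("\r") ? line.substring(0, line.length() - 1) : line;
    }

    /** Parses every record in an uncompressed WARC stream (assumption: no gzip). */
    static List<WarcRecord> readAll(InputStream in) throws IOException {
        List<WarcRecord> records = new ArrayList<>();
        String line;
        while ((line = readLine(in)) != null) {
            if (line.isEmpty()) {
                continue; // blank lines that separate records
            }
            if (!line.startsWith("WARC/")) {
                throw new IOException("Expected a WARC version line, got: " + line);
            }
            WarcRecord rec = new WarcRecord();
            // Header fields run until the first empty line; folded lines are ignored here.
            while ((line = readLine(in)) != null && !line.isEmpty()) {
                int colon = line.indexOf(':');
                if (colon > 0) {
                    rec.headers.put(line.substring(0, colon).trim(),
                                    line.substring(colon + 1).trim());
                }
            }
            // Simplification: the content length is assumed to fit in an int.
            String lengthField = rec.headers.get("Content-Length");
            int length = lengthField == null ? 0 : Integer.parseInt(lengthField);
            byte[] content = new byte[length];
            int off = 0;
            while (off < length) {
                int n = in.read(content, off, length - off);
                if (n == -1) {
                    throw new EOFException("Truncated record");
                }
                off += n;
            }
            rec.content = content;
            records.add(rec);
        }
        return records;
    }

    /** Prints the type, target URI and size of each record in the given file. */
    public static void main(String[] args) throws IOException {
        try (InputStream in = new BufferedInputStream(Files.newInputStream(Paths.get(args[0])))) {
            for (WarcRecord rec : readAll(in)) {
                System.out.println(rec.headers.get("WARC-Type") + "\t"
                        + rec.headers.get("WARC-Target-URI") + "\t"
                        + rec.content.length + " bytes");
            }
        }
    }
}
```

Run with the path to a .warc file as the only argument. Compressed
.warc.gz files would need to be decompressed first, and exporting content
would mean writing rec.content to disk, which this sketch does not do.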
Best wishes,
Jaap Blom and Johan Oomen
Netherlands Institute for Sound and Vision
(working on the FP7 Living Web Archives project, LiWA)
On 18 Feb 2010, at 01:22, "Robert Ackland" <robert.ackland at anu.edu.au>
wrote:
> These are some WARC-related tools I'm aware of. I doubt there is a viewer
> there, though.
>
> http...