[Air-L] Heritrix - crawl job for single pages referenced by URL
Steffen Schilke
steffen.schilke at gmail.com
Fri Feb 19 05:20:17 PST 2010
Dear All,
I was reading the Heritrix manual and I am having a little trouble
understanding how to set up a crawl job. My task is to archive
only certain pages in a crawl job, i.e., I want to give Heritrix a list of
URLs, each referring to a single page, and have each page collected, including
all of its components (e.g., PDF files, images, ...). Could anybody here
give me a hint or a sample job definition?
Thank you very much in advance.