[Air-L] Heritrix - crawl job for single pages referenced by URL

Steffen Schilke steffen.schilke at gmail.com
Fri Feb 19 05:20:17 PST 2010

Dear All,

I was reading the Heritrix manual and am having a little trouble
understanding how to set up a crawl job. My task is to archive only
certain pages in a crawl job, i.e., I want to give Heritrix a list of
URLs, each referring to a single page, and have those pages collected
together with all of their components (e.g., PDF files, images, ...).
Could anybody here give me a hint or a sample job definition?

Thank you very much in advance.

