[Air-l] Archiving web sites
Danyel Fisher
danyelf at acm.org
Thu Oct 3 12:32:11 PDT 2002
The canonical software for archiving web sites is wget ("the same thing
the Internet Archive uses").
It's a command-line tool, which means you might do a little more work up
front, but you can save your commands in a batch file and run them
automatically.
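For example, here is a minimal sketch of such a script, assuming a
Unix-style shell (the URL and log file name are placeholders; on Windows
the same wget line could go into a .bat file):

#!/bin/sh
# Mirror the site recursively and keep a log of what was fetched.
wget -r http://www.example.org/ -o mirror.log

You could then run this by hand, or schedule it (with cron, for instance)
to archive the site on a regular basis.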
The site for it all is at
http://wget.sunsite.dk/
with documentation at
http://www.gnu.org/manual/wget/
In particular, it has a feature known as "recursive retrieval", in which it
follows links.
A few of its options may be of particular use:
* The -r flag turns on recursive retrieval.
* By default, it retrieves from exactly one site.
* If you give it -Ddomain, it retrieves from only that domain (see the
sketch after this list).
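For instance, a rough sketch of a crawl restricted to a single domain
might look like this (example.org is just a placeholder; the -H flag lets
wget span hosts, and -D then limits which domains it will follow):

wget -r -H -Dexample.org http://www.example.org/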
So, let's say you just want to grab a single web page.
wget http://fly.cc.fer.hr/
does it.
Now, let's say you've typed up a little file with the list of all URLs you
want to get.
wget -i filelist
does it.
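The file is simply a list of URLs, one per line. A hypothetical filelist
might look like this (the addresses are placeholders):

http://www.aoir.org/
http://www.example.org/reports/index.html
http://www.example.org/reports/2002.html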
Let's say you want the top three levels below http://www.aoir.org
wget -r -l3 http://www.aoir.org