[Air-l] archiving Google's cache

Tue Nov 19 18:07:53 PST 2002

I have a new take on the old problem of archiving a web site. The problem is
that the site I need to archive has already been taken off line. (An object
lesson in why it's important to archive web sites you're depending on in
your research....) Fortunately, the site is still available through Google's
cache, but this is a difficult way to access the content. (The site in
question is an e-mail list archive, and so each message is a separate page,
which means I need to download more than 2000 pages.)

I've tried various archiving programs, inputting the URL that Google
generates for its search result page as the root page for the archive. But
so far no luck, I think because Google creates a separate URL for each page
of the search results, and because it's difficult to figure out the right
"depth" of archive. (I need it to go 200 pages deep to get to the last page
of search results, but I only want it to go 2 pages deep to get each
message.)
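
One way around the depth problem might be to skip the recursive crawl and
work in two flat passes: first build the URL of every search-result page,
then pull the cached-page links out of each one. The Python sketch below
illustrates the idea only. The `start` offset parameter, the `cache:` marker
used to spot cached-page links, and the search.example address are all
assumptions made for illustration, not a description of how Google's URLs
are actually built. The `fetch` argument is a hypothetical callable that
takes a URL and returns that page's HTML.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


def result_page_urls(search_url, query, total_results, per_page=10):
    """Build one URL per page of search results.

    Assumes results are paged with a `start` offset parameter; change
    this to match whatever the real result URLs look like.
    """
    urls = []
    for start in range(0, total_results, per_page):
        params = urlencode({"q": query, "start": start})
        urls.append(f"{search_url}?{params}")
    return urls


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def cached_links(html, base_url, marker="cache:"):
    """Return absolute, de-duplicated links containing `marker`.

    The default marker is an assumption about what a cached-page link
    looks like; inspect a real result page before relying on it.
    """
    parser = LinkCollector()
    parser.feed(html)
    seen = set()
    found = []
    for href in parser.links:
        absolute = urljoin(base_url, href)
        if marker in absolute and absolute not in seen:
            seen.add(absolute)
            found.append(absolute)
    return found


def collect_message_urls(page_urls, fetch):
    """Pass 1: visit every result page and gather the cached-page links.

    `fetch` is any callable mapping a URL to its HTML text.
    """
    messages = []
    for page_url in page_urls:
        for link in cached_links(fetch(page_url), page_url):
            if link not in messages:
                messages.append(link)
    return messages
```

The second pass is then a plain loop that fetches each URL returned by
`collect_message_urls` and saves it to disk, so no crawl depth has to be
chosen at all. A real run would also want a pause between requests.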

Does anyone have any experience trying to archive a Google cache?  Or any
suggestions?

Thanks,

Alex
-- 
Alexandra Samuel
samuel at fas.harvard.edu
http://www.alexandrasamuel.com
