<HTML>


<HEAD>


<TITLE>archiving Google's cache</TITLE>


</HEAD>


<BODY>


<FONT FACE="Helvetica">I have a new take on the old problem of archiving a web site: the site I need to archive has already been taken offline. (An object lesson in why it’s important to archive web sites you’re depending on in your research....) Fortunately, the site is still available through Google’s cache, but this is a difficult way to access the content. (The site in question is an e-mail list archive, so each message is a separate page, which means I need to download more than 2000 pages.) <BR>


<BR>


I’ve tried various archiving programs, giving each one the URL that Google generates for its search results page as the root page for the archive. So far, no luck. I think this is because Google creates a separate URL for each page of the search results, and because no single archive “depth” fits: I need it to go 200 pages deep to reach the last page of search results, but only 2 pages deep to get each message. <BR>


<BR>
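To illustrate the depth problem: one way around it might be to skip the crawler’s depth setting entirely and generate the URL of each results page directly, then fetch the cached messages linked from each one. Below is a minimal Python sketch of the first step only. It assumes that Google’s results pages accept a <CODE>start</CODE> offset and show 10 results per page; the function name and the host lists.example.org are placeholders, not the real archive. <BR>
<PRE>
```python
from urllib.parse import urlencode


def result_page_urls(query, total_pages, per_page=10,
                     base="https://www.google.com/search"):
    """Build one URL per page of search results.

    Assumption: results are paged with a `start` offset that grows by
    `per_page` on each page. Adjust both if the real URLs differ.
    """
    urls = []
    for page in range(total_pages):
        # urlencode escapes the query and joins the parameters for us.
        params = {"q": query, "start": page * per_page}
        urls.append(base + "?" + urlencode(params))
    return urls


if __name__ == "__main__":
    # Hypothetical query: every cached page from a placeholder host.
    for url in result_page_urls("site:lists.example.org", 200)[:3]:
        print(url)
```
</PRE>
The 200 URLs could then go to a downloader as a flat list, with the cached-message links on those pages fetched in a second flat pass, so that neither pass needs a depth greater than 1. <BR>
<BR>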


Does anyone have any experience trying to archive a Google cache?  Or any suggestions?<BR>


<BR>


Thanks,<BR>


<BR>


Alex<BR>


-- <BR>


Alexandra Samuel<BR>


samuel@fas.harvard.edu<BR>


http://www.alexandrasamuel.com<BR>


</FONT>


</BODY>


</HTML>