[Air-l] Archiving web sites

Danyel Fisher danyelf at acm.org
Fri Oct 4 12:39:35 PDT 2002


Frank Schaap <architext at fragment.nl> wrote...

[WGET]
> need to dig into it a bit further, but it seems almost perfect...

And for everyone else who doesn't like command lines,
check this out:
http://www.jensroesner.de/wgetgui/
Yes, it's ugly. But it lets you click boxes instead of memorizing
letters.

> it however
> doesn't appear to be able to follow and archive pages hidden behind
> javascript pop-up code. any inside hints or tips about that?

Not in particular, I'm afraid. This is a Known Problem, and from a
social historian's point of view it tells a story about the adoption
of the technology.

Originally, wget was meant for system administrators backing up
their own websites. And so it offered a lot of options of the
form "copy all documents from this site."

Then the search engine spiders got at it, so it gained options
like "keep track of how deep we've traversed, and stop at a
chosen depth."

But it hasn't yet really Made It as an end-user tool.

The problem with Javascript is that it's a programming language.
It can create URLs on the fly, based on your user name,
my astrological sign, and the date at that very moment.
A crawler has no way to enumerate those URLs without actually
running the code, which is why the wget authors decided not
to deal with it.

Now, most (but not all) javascript links use something like
javascript:command( ...., URL, ....) and they hardcode the
URL into the text.

In which case,
IF YOU ARE UP TO COMPILING C CODE (!),
see this:
http://www.geocrawler.com/archives/3/409/2000/4/100/3543604/
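
And if compiling C isn't your thing, here's a rough shell sketch of
the same extraction idea. It assumes GNU grep and sed, only handles
the simplest case where the URL is a single-quoted first argument,
and www.example.com is again a stand-in:

  # Pull down a page and fish URLs out of javascript:foo('...')
  # links. Dynamically built URLs are still out of reach.
  wget -q -O - http://www.example.com/page.html \
    | grep -o "javascript:[A-Za-z_]*('[^']*')" \
    | sed "s/^.*('\(.*\)').*$/\1/"

Each URL that falls out can then be handed back to wget, though
relative URLs need the site prefix glued back on first.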

More information about the Air-L mailing list