[Air-l] SW to store webpages
Thomas Koenig
T.Koenig at lboro.ac.uk
Sun Jun 5 18:05:03 PDT 2005
elijah wright wrote:
>
>> Maximum complexity is not always the best solution. At the moment, I
>> cannot see how the alleged greater flexibility of wget would improve
>> research. If I want to capture an entire website, then HTTrack seems
>> to do the job. It even seems to do it more completely than wget: (Funny
>> languages only!)
>
> you managed to sidestep my question while simultaneously admitting
> that you haven't used the tool in question.
>
I did try wget a couple of years ago (where did I write that I never
used it?), but found WebCopier more practical. I was arguing from your
own premise, namely that one might want to use wget because
> someone might want the additional flexibility to make their research
> better?
I suspect that, in fact, wget's "flexibility" does not improve research,
and that, possibly, wget isn't even more "flexible" with respect to
important research goals than HTTrack. Besides wget and WebCopier, I
also tried out WinHTTrack, and it seems to work well. I didn't make a
systematic comparison of the three tools, but the links in my previous
post (provided the statements made in them are true) suggest at least
four advantages of HTTrack (and WebCopier) over wget:
1) Better use of system resources (faster)
2) Access to some files obscured by bad links
3) Better handling of dynamic parameters
4) Easier installation and a more intuitive interface.
These advantages seem plausible, since wget is an old UNIX command
dating from a time when dynamic web pages did not exist. Maybe newer
versions handle parameters better; if that's the case, please say so. I
couldn't find any evidence that wget is now superior, or even on par
with either of the other two programs, with respect to the criteria above.
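To make criterion 3 concrete, here is a minimal Python sketch of what
"handling dynamic parameters" involves for any site mirror. It is not
the actual logic of HTTrack, WebCopier, or wget; the function name
local_name and the naming scheme are my own illustration. The point is
that a dynamic URL such as page.php?id=3 must be mapped to a distinct,
filesystem-safe local filename, or every parameter variant of the page
would overwrite the same file on disk:

```python
from urllib.parse import urlsplit, parse_qsl

def local_name(url: str) -> str:
    """Map a (possibly dynamic) URL to a safe local filename.

    Illustrative sketch only; real mirroring tools use more
    elaborate schemes (escaping, length limits, collision handling).
    """
    parts = urlsplit(url)
    path = parts.path.lstrip("/") or "index.html"
    params = parse_qsl(parts.query)
    if params:
        # Encode the query parameters into the filename so that each
        # variant of a dynamic page is stored as a separate file.
        suffix = "_".join(f"{k}-{v}" for k, v in params)
        path = f"{path}_{suffix}.html"
    return path

print(local_name("http://example.org/page.php?id=3&lang=de"))
# page.php_id-3_lang-de.html
print(local_name("http://example.org/"))
# index.html
```

A tool written before dynamic pages were common could simply ignore the
query string, which is exactly the failure mode criterion 3 is about.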
I glanced over the wget manual
(http://www.delorie.com/gnu/docs/wget/wget.html) and couldn't find any
options important for site mirroring that are not also offered by
WebCopier (the program I am most familiar with). I suspect the same
holds for HTTrack.
In fact, there is a quite credible evaluation of 20 free spiders, in
which HTTrack fares pretty well; one could even say better than wget:
http://www.diglib.org/aquifer/oct2504/spidereval.pdf
Thus, I repeat my question: what functions does wget offer that make it
superior to HTTrack or WebCopier?
> that's very interesting social behavior...
> please don't actively troll the AIR list. it is quite annoying.
On Usenet, I am used to seeing "trolling" allegations whenever there is
dissent (not necessarily directed at me, though that also happens).
However, there is a difference between voicing "dissenting views" and
"trolling". All too often on Usenet, trolling allegations serve to
silence dissent (less likely on academic lists, where many people have
big egos).
Nevertheless, I think that even on an academic list such ad hominem
allegations should be made only under extraordinary circumstances. After
all, the whole business of academia is criticism and counter-criticism,
and there is no need to get personal when one disagrees (except in very
few circumstances).
NB: I frequently make provocative statements, because that's the easiest
way to falsify (my own) wrong assumptions.
Thomas
--
thomas koenig, ph.d.
http://www.lboro.ac.uk/research/mmethods/staff/thomas/