[Air-l] SW to store webpages

Thomas Koenig T.Koenig at lboro.ac.uk
Sun Jun 5 18:05:03 PDT 2005


elijah wright wrote:

>
>> Maximum complexity is not always the best solution. At the moment, I
>> cannot see how the alleged greater flexibility of wget would improve
>> research. If I want to capture an entire website, then HTTrack seems
>> to do the job. It seems to do it even more completely than wget
>> (funny languages only!)
>
> you managed to sidestep my question while simultaneously admitting 
> that you haven't used the tool in question. 
>
I did try wget a couple of years ago (where did I write that I never 
used it?), but I found WebCopier more practical. I argued from your own 
premise, namely that one might want to use wget because

>  someone might want the additional flexibility to make their research 
> better? 

I suspect that, in fact, wget's "flexibility" does not improve research, 
and that, possibly, wget isn't even more "flexible" than HTTrack with 
respect to important research goals. Besides wget and WebCopier, I also 
tried out WinHTTrack, and it seems to work well. I didn't make a 
systematic comparison of the three tools, but the links from my previous 
post (provided the statements made in them are true) suggest at least 
four advantages of HTTrack (and WebCopier) over wget:

1) Better use of system resources (faster)
2) Access to some files obscured by bad links
3) Better handling of dynamic parameters
4) Easier installation and a more intuitive interface

These advantages seem plausible, since wget is an old UNIX command that 
dates from a time before dynamic web pages existed. Maybe newer versions 
handle query parameters better; if that's the case, please say so. I 
couldn't find any evidence that wget is now superior to, or even on par 
with, either of the other two programs with respect to the first three 
criteria above.
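
For concreteness, the sort of invocation I mean is the standard 
mirroring recipe from the manual. This is only a sketch, and 
www.example.com stands in for a real site:

    # recursive mirror, rewriting links for local browsing,
    # fetching embedded images/CSS, not ascending to parent dirs,
    # and pausing one second between requests
    wget --mirror --convert-links --page-requisites --no-parent \
         --wait=1 http://www.example.com/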

I glanced over the wget manual 
(http://www.delorie.com/gnu/docs/wget/wget.html) and couldn't find any 
options that seem important for site mirroring which are not also 
offered by WebCopier (the program I am most familiar with). I suspect 
the same holds for HTTrack.
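
For comparison, HTTrack's command-line version does the equivalent job 
with much the same shape of command. Again only a sketch, with a 
placeholder site and output directory:

    # mirror the site into ./mirror, staying within the domain
    httrack "http://www.example.com/" -O ./mirror \
            "+*.example.com/*" -v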

In fact, there is quite a credible evaluation of 20 free spiders, in 
which HTTrack fares pretty well; you could even say better than wget:

http://www.diglib.org/aquifer/oct2504/spidereval.pdf

Thus, I repeat my question: what functions does wget offer that make it 
superior to HTTrack or WebCopier?

> that's very interesting social behavior...

> please don't actively troll the AIR list.  it is quite annoying.

I am used to "trolling" allegations on Usenet (not necessarily directed 
at me, though that also happens) whenever there is dissent. However, 
there is a difference between voicing "dissenting views" and "trolling". 
All too often on Usenet, trolling allegations silence dissent (less 
likely on academic lists, where many people have big egos). 
Nevertheless, I think that even on an academic list such ad hominem 
allegations should be made only under extraordinary circumstances. After 
all, the whole business of academia is criticism and counter-criticism, 
and there is no need to get personal when one disagrees (except in very 
few circumstances).

NB: I frequently make provocative statements because that's the easiest 
way to falsify (my own) wrong assumptions.

Thomas

-- 
thomas koenig, ph.d.
http://www.lboro.ac.uk/research/mmethods/staff/thomas/



