[Air-L] Tool to Download Websites?

Dennis Wollersheim dewoller at gmail.com
Sun Feb 12 22:05:06 PST 2012


Hi Kathleen

Lucene is good, but there are also some simpler options. I like the command
line; there you can use wget:

http://gnuwin32.sourceforge.net/packages/wget.htm

Usage detailed here:

http://how-to.wikia.com/wiki/How_to_mirror,_spider,_or_archive_a_website
http://blog.moldoveanu.net/2010/11/downloading-an-entire-website-using-wget/
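
For anyone curious what "spidering to a page depth of 3" means in practice, here is a minimal sketch in Python. It is not how wget is implemented, only an illustration of the idea behind wget's --recursive and --level options: start at one page, follow links on the same host, and stop after a fixed number of hops. The names LinkParser, fetch, and crawl, and the example.com address, are made up for this example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download one page and return its HTML as text."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, max_depth=3, fetch_page=fetch):
    """Breadth-first crawl of one host, at most max_depth links from the start page.

    Returns a dict mapping each visited URL to its HTML.
    """
    host = urlparse(start_url).netloc
    pages = {}
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        try:
            html = fetch_page(url)
        except OSError:
            continue  # skip pages that fail to download
        pages[url] = html
        if depth == max_depth:
            continue  # keep the page, but do not follow its links
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any "#fragment" part.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return pages
```

Calling crawl("http://example.com/", max_depth=3) would return the pages in memory; saving each one to disk, pausing between requests, and honouring robots.txt are left out to keep the sketch short, and those are good reasons to prefer wget for real work.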


Or you can use a 'spider' extension for the Firefox web browser:

Install Firefox,
www.mozilla.org/en-US/firefox/new/

and then, in Firefox, install a spider add-on, either

https://addons.mozilla.org/en-US/firefox/addon/spiderzilla/

or

https://addons.mozilla.org/en-US/firefox/addon/foxyspider/

Write back if you have any problems.

Cheers
Dennis


On 02/13/2012 04:48 PM, Wojciech Gryc wrote:
> Hi Kathleen,
>
> Apache Lucene is the best resource for something like this, in my opinion.
> Available here: http://lucene.apache.org/
>
> Requires some programming knowledge though.
>
> Thanks,
> Wojciech
>
>
>
> On Mon, Feb 13, 2012 at 12:33 AM, Kathleen Stansberry
> <kpontius at uoregon.edu> wrote:
>
>> I'm working on a project that involves conducting a cluster analysis (type
>> of textual analysis based on Kenneth Burke's work) on the content of five
>> different websites. I want to download the full content of these five sites
>> so I have hard copies to work from during the rather arduous process of
>> going through and categorizing the text.
>>
>> Can anyone recommend a good program to download full websites (to a page
>> depth of at least 3)? I've been using SiteSucker but am finding it a bit
>> buggy.
>>
>> Thank you!
>> Katie
>>
>> Kathleen Stansberry
>> Ph.D. Candidate
>> University of Oregon
>> School of Journalism and Communication
>> http://katiestansberry.com
>> kpontius at uoregon.edu
>> (541) 228-5576
>> _______________________________________________
>> The Air-L at listserv.aoir.org mailing list
>> is provided by the Association of Internet Researchers http://aoir.org
>> Subscribe, change options or unsubscribe at:
>> http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
>>
>> Join the Association of Internet Researchers:
>> http://www.aoir.org/
>>
