[Air-L] Tool to Download Websites?

Donnelly, Karen k.donnelly at lancaster.ac.uk
Mon Feb 13 02:32:16 PST 2012


Hi Kathleen
I have used both BootCaT (http://bootcat.sslmit.unibo.it/) and HTTrack (www.httrack.com) for building corpora of websites for textual analysis.
They were recommended to me by colleagues in my department.

They can be a bit slow on larger sites, but I found them both user-friendly and effective. You can also set the search to exclude certain file types (e.g. image files) if you just want text.
Let me know if you want any further info.

Karen
________________________________________
From: air-l-bounces at listserv.aoir.org [air-l-bounces at listserv.aoir.org] on behalf of Wojciech Gryc [wojciech at gmail.com]
Sent: 13 February 2012 05:48
To: Kathleen Stansberry
Cc: air-l at listserv.aoir.org
Subject: Re: [Air-L] Tool to Download Websites?

Hi Kathleen,

Apache Lucene is the best resource for something like this, in my opinion.
Available here: http://lucene.apache.org/

Requires some programming knowledge though.

Thanks,
Wojciech



On Mon, Feb 13, 2012 at 12:33 AM, Kathleen Stansberry
<kpontius at uoregon.edu> wrote:

> I'm working on a project that involves conducting a cluster analysis (type
> of textual analysis based on Kenneth Burke's work) on the content of five
> different websites. I want to download the full content of these five sites
> so I have hard copies to work from during the rather arduous process of
> going through and categorizing the text.
>
> Can anyone recommend a good program to download full websites (to a page
> depth of at least 3)? I've been using SiteSucker but am finding it a bit
> buggy.
>
> Thank you!
> Katie
>
> Kathleen Stansberry
> Ph.D. Candidate
> University of Oregon
> School of Journalism and Communication
> http://katiestansberry.com
> kpontius at uoregon.edu
> (541) 228-5576
> _______________________________________________
> The Air-L at listserv.aoir.org mailing list
> is provided by the Association of Internet Researchers http://aoir.org
> Subscribe, change options or unsubscribe at:
> http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
>
> Join the Association of Internet Researchers:
> http://www.aoir.org/
>