[Air-L] the best way to archive web material?

Sari angyjoe at gmail.com
Thu Oct 21 15:33:06 PDT 2010


I've been using HTTrack http://www.httrack.com/ (suggested by Jeremy) for a
while. Unfortunately, it sometimes aborts the crawl at the very beginning.
I am not sure why, but I suppose it is related to the structure of the
website, or of the portion of the website, you are trying to download for
offline browsing.



I've switched to the HTML Spider in the Free Download Manager
http://www.freedownloadmanager.org/ and haven't had any problems since
then. Adjusting the crawl settings in this spider (depth, including or
excluding images, including or excluding files, etc.) is much easier than
adjusting them in HTTrack.
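
To make those settings concrete, here is a minimal Python sketch of what a
depth limit and include/exclude rules do during a crawl. It is not how
HTTrack or Free Download Manager work internally; the names CrawlSettings,
should_fetch and plan_crawl, the example.org URLs, and the "stay on the
starting host" rule are all my own assumptions for illustration. It walks an
in-memory link map instead of fetching anything from the network.

```python
from collections import deque
from dataclasses import dataclass
from urllib.parse import urlparse
import posixpath

# Hypothetical list of extensions treated as images in this sketch.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif"}


@dataclass
class CrawlSettings:
    max_depth: int = 2                    # how many links away from the start page
    include_images: bool = True           # fetch image files or skip them
    excluded_exts: frozenset = frozenset()  # other file types to skip
    same_host_only: bool = True           # assumption: never leave the start host


def should_fetch(url, depth, start_host, settings):
    """Return True if the URL passes the depth, host and file-type rules."""
    if depth > settings.max_depth:
        return False
    parts = urlparse(url)
    if settings.same_host_only and parts.netloc != start_host:
        return False
    ext = posixpath.splitext(parts.path)[1].lower()
    if ext in IMAGE_EXTS and not settings.include_images:
        return False
    return ext not in settings.excluded_exts


def plan_crawl(start_url, links, settings):
    """Breadth-first walk over an in-memory map of page -> linked URLs."""
    start_host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    order = []
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        for nxt in links.get(url, []):
            if nxt not in seen and should_fetch(nxt, depth + 1, start_host, settings):
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return order


# Made-up site structure, for illustration only.
links = {
    "http://example.org/": [
        "http://example.org/a.html",
        "http://example.org/logo.png",
        "http://www.yahoo.com/",
    ],
    "http://example.org/a.html": ["http://example.org/b.html"],
    "http://example.org/b.html": ["http://example.org/c.html"],
}

settings = CrawlSettings(max_depth=2, include_images=False)
for page in plan_crawl("http://example.org/", links, settings):
    print(page)
```

With these settings the sketch visits only the start page, a.html and
b.html: the image is excluded, the external link is off-host, and c.html is
three links away from the start.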



In an early release I used, HTTrack silently tried to fetch the whole of
Yahoo for me =)



/Sari


On Thu, Oct 21, 2010 at 11:16 PM, WL Wong <wwon8281 at uni.sydney.edu.au> wrote:

> Sarah
>
> I use WebCite http://www.webcitation.org/ and Evernote
> http://www.evernote.com/.
>
> Cheers
> WL
> On 22/10/2010, at 1:33 AM, Adi Kuntsman wrote:
>
> > Dear Sarah
> >
> > I am using Zotero, which is a free add-on for Firefox:
> > http://www.zotero.org/
> > Good thing about it: it takes captures of webpages as they are at any
> > particular moment and records the URL, date of access, etc. (Zotero was
> > originally developed as a tool to create and share bibliographies).
> > Files are easy to organise into folders and subfolders, and I think there
> > is an option to have your archive stored on the Zotero site, to be able to
> > share it (I haven't explored this as I work alone on my project).
> >
> > Not so good thing: it can't download videos, so you will need to download
> > them separately.
> >
> > I am sure there are other, better ways, so I look forward to other
> > responses.
> > Adi
> >
> > --
> >
> > Dr. Adi Kuntsman
> > Leverhulme Early Career Fellow
> > Research Institute for Cosmopolitan Cultures
> > The University of Manchester
> > Second Floor, Arthur Lewis Building, room 2.007
> > Oxford Road, Manchester M13 9PL, UK
> > http://www.socialsciences.manchester.ac.uk/ricc/index.html
> > http://adi.kuntsman.googlepages.com
> >
> >
> > ________________________________
> > From: Sarah Oates <s.oates at lbss.gla.ac.uk>
> > To: air-l at listserv.aoir.org
> > Sent: Thu, October 21, 2010 3:25:47 PM
> > Subject: [Air-L] the best way to archive web material?
> >
> > Hello and apologies if this has been asked recently or seems a bit basic!
> >
> > Does anyone have a recommendation for software to archive web material? I
> > am heading a project to study political activism on the Russian internet
> > and we need to store a range of different types of web pages across time
> > ... I can't even get my PC to store a small amount with full images. My
> > research partner in Ukraine can, but she has a Mac (not an option available
> > at my university right now). I have a small budget to buy some software,
> > although freeware suggestions are always appreciated. I want to have the
> > archive complete so that we can work with it, share it with other
> > researchers, go back to it as necessary, etc., so I really want to have
> > full graphics etc. Optimally, it would be something that could do automatic
> > crawls and downloads as well, although as we are tending to focus on
> > relatively short periods of intense interest around particular
> > issues/events, we don't need a long-term crawl system.
> >
> >
> > Suggestions from this clever and useful list most welcome, although
> > currently this list is making me sad that I am not in Sweden to meet people
> > at exciting venues and hear what I am sure is some great work (:
> >
> >
> > Sincerely
> > Sarah
> >
> > Sarah Oates
> > Professor of Political Communication
> > School of Social and Political Sciences
> > Adam Smith Building
> > University of Glasgow
> > Glasgow G12 8RT
> >
> > Email: sarah.oates at glasgow.ac.uk
> > Website: http://www.media-politics.com/
> > Telephone: (0)141 330 5124
> > The University of Glasgow, charity number SC004401
> >
> > _______________________________________________
> > The Air-L at listserv.aoir.org mailing list
> > is provided by the Association of Internet Researchers http://aoir.org
> > Subscribe, change options or unsubscribe at:
> > http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
> >
> > Join the Association of Internet Researchers:
> > http://www.aoir.org/





