[Air-L] archiving entire blogs

Wendy Christensen wchriste at bowdoin.edu
Fri Jan 20 06:05:23 PST 2012


Hi All,

I used SiteSucker to download entire blogs (http://www.sitesucker.us/home.html). It worked very well for both Blogspot and WordPress blogs, but I had to clean up and delete the excess files before analysis.

I ran into issues trying to find content analysis software that would allow me to code HTML files. If anyone has suggestions for software for qualitative analysis of websites and/or downloaded HTML files, I'd love to hear about it!
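
One possible workaround, in case the coding software accepts plain text but not HTML, is to convert the downloaded pages to text first. The following is only a minimal sketch using Python's standard library; the names TextExtractor, html_to_text and convert_folder are made up for illustration, and it assumes the pages are saved as .html files in UTF-8.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self.skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the text of one HTML page, one text fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def convert_folder(src_dir, dst_dir):
    """Write a .txt copy of every .html file found under src_dir (hypothetical layout)."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for page in sorted(src.rglob("*.html")):
        text = html_to_text(page.read_text(encoding="utf-8", errors="replace"))
        # Flatten the relative path so pages in different subfolders do not collide.
        name = "_".join(page.relative_to(src).with_suffix(".txt").parts)
        (dst / name).write_text(text, encoding="utf-8")
```

This keeps menus, sidebars and comment text along with the posts, so some manual cleanup would probably still be needed.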

Best, Wendy


Wendy M. Christensen, Ph.D.

Visiting Assistant Professor
Department of Sociology and Anthropology
Bowdoin College
wchriste at bowdoin.edu



On Jan 20, 2012, at 3:40 AM, Jarkko Moilanen wrote:

hi,

Quoting Stuart Shulman <stuart.shulman at gmail.com>:

WordPress has a feature for this:

http://en.blog.wordpress.com/2006/06/12/xml-import-export/

If it is a WordPress blog, you can ask the owner to create a bulk export in
XML.
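
Once the owner has sent such an export file, the posts can be pulled out with a few lines of code. This is a rough sketch only, assuming the export is the usual RSS-style file in which each post is an <item> element with its body in <content:encoded>; read_posts and SAMPLE are illustrative names, and the sample data is invented.

```python
import xml.etree.ElementTree as ET

# Namespace used for the post body in RSS-style exports.
NS = {"content": "http://purl.org/rss/1.0/modules/content/"}


def read_posts(xml_text):
    """Return one dict per <item> with title, link, date and HTML body."""
    root = ET.fromstring(xml_text)
    posts = []
    for item in root.iter("item"):
        posts.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "date": item.findtext("pubDate", default=""),
            "body": item.findtext("content:encoded", default="", namespaces=NS),
        })
    return posts


# Invented sample standing in for a real export file.
SAMPLE = """<?xml version="1.0"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel><title>Example blog</title>
<item><title>First post</title><link>http://example.com/first</link>
<pubDate>Thu, 19 Jan 2012 21:31:00 +0000</pubDate>
<content:encoded><![CDATA[<p>Hello</p>]]></content:encoded></item>
</channel></rss>"""

for post in read_posts(SAMPLE):
    print(post["date"], post["title"])
```

For a real file, the text would be read from disk and passed to read_posts in place of SAMPLE. Real exports may also contain pages and attachments as items, so some filtering may be needed.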


If you are archiving a blog whose export functions you don't have access to, I would use 'wget'. It has options to retrieve everything, no matter how deep the structure is.

http://en.wikipedia.org/wiki/Wget
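
As a rough illustration, wget can be driven from a small Python script. This is a sketch under assumptions, not a tested recipe for any particular blog: it assumes wget is installed and on the PATH, the function names build_wget_command and mirror_blog are made up here, and the one-second pause is just a polite default.

```python
import subprocess


def build_wget_command(blog_url, wait_seconds=1):
    """Build a wget command line that mirrors a site for offline reading."""
    return [
        "wget",
        "--mirror",            # recursive download with unlimited depth
        "--convert-links",     # rewrite links so pages work offline
        "--adjust-extension",  # save pages with an .html extension
        "--page-requisites",   # also fetch images and stylesheets
        "--no-parent",         # stay below the starting directory
        f"--wait={wait_seconds}",  # pause between requests
        blog_url,
    ]


def mirror_blog(blog_url, target_dir="."):
    """Run wget in target_dir; raises CalledProcessError if wget fails."""
    subprocess.run(build_wget_command(blog_url), cwd=target_dir, check=True)


# Print the command instead of running it, so nothing is downloaded here.
print(" ".join(build_wget_command("http://example.com/blog/")))
```

Calling mirror_blog with the blog's address would start the actual download. It may be worth checking the blog's terms of use and robots.txt first.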

/Jarkko


Better still is the new offering from GNIP:

http://blog.gnip.com/gnip-and-automattic-make-whole-new-universe-of-data-available/

The future is bright for getting big collections.

~Stu

On Thu, Jan 19, 2012 at 9:31 PM, C Sosnowy <c_sosnowy at yahoo.com> wrote:

I would like to be able to archive an entire blog (and ideally be able to
download it) for analysis. I've looked at WebCite and Zotero but neither
seems to have this capability. Does anyone know of another way?


Collette Sosnowy
M.A., Ph.D. Candidate
Environmental Psychology Program
The Graduate Center of the City University of New York
_______________________________________________
The Air-L at listserv.aoir.org mailing list
is provided by the Association of Internet Researchers http://aoir.org
Subscribe, change options or unsubscribe at:
http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org

Join the Association of Internet Researchers:
http://www.aoir.org/




--

Dr. Stuart W. Shulman
people.umass.edu/stu

Editor Emeritus, JITP
jitp.net <http://www.jitp.net>

Director, QDAP-UMass
umass.edu/qdap <http://www.umass.edu/qdap>

Founder and CEO, Texifter
texifter.com <http://www.texifter.com>

LinkedIn: linkedin.com/pub/stuart-shulman/10/351/899
Twitter: twitter.com/#!/StuartWShulman

****************************
Jarkko Moilanen (+358 45 8877 150)
M.Soc.Sc. (Political Science)
PhD Student, Information studies, University of Tampere
Blog: http://blog.ossoil.com/
-------------------------
Founder of Hackerspace 5w, Finland, Tampere - http://5w.fi
Founder of MeeGo Network Finland, http://meegonetwork.fi
Founder of Open Coral - http://open-coral.org
Founder of Finnish Biohacker community, http://biohakkeri.fi
****************************