[Air-L] archiving entire blogs

יוחנן ועקנין dataneto at dataneto.com
Fri Jan 20 06:13:55 PST 2012


Hi All,

I use Web Content Extractor from Newprosoft
(http://www.newprosoft.com/web-content-extractor.htm). The good thing is
that I can extract specific fields (such as the date, name, user, or the
whole post), apply some manipulations through JavaScript code, and export
the whole dataset to Excel.
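
For readers without that tool, the same idea (pull named fields out of each post, then export a table) can be sketched with the Python standard library. This is only an illustration, not how Web Content Extractor works internally. The class names `post-date`, `post-author`, and `post-body`, and the sample markup, are hypothetical; real blogs use their own markup, so inspect the page source first.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical mapping from CSS class name to output column.
# Adjust it after inspecting the markup of the blog being archived.
FIELD_CLASSES = {"post-date": "date", "post-author": "author", "post-body": "body"}

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}  # tags with no end tag
BLOCK_TAGS = {"p", "div", "li"}  # a space is added after these close

# Made-up sample post used to try the functions below.
SAMPLE_HTML = """
<div class="post">
  <span class="post-date">2012-01-20</span>
  <span class="post-author">Example Author</span>
  <div class="post-body"><p>First <b>paragraph</b>.</p><p>Second.</p></div>
</div>
"""


class PostFieldParser(HTMLParser):
    """Collects the text inside elements whose class is in FIELD_CLASSES."""

    def __init__(self):
        super().__init__()
        self.fields = {name: "" for name in FIELD_CLASSES.values()}
        self._current = None  # column currently being captured
        self._depth = 0       # nesting depth inside the captured element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._current is not None:
            self._depth += 1
            return
        for cls in (dict(attrs).get("class") or "").split():
            if cls in FIELD_CLASSES:
                self._current = FIELD_CLASSES[cls]
                self._depth = 1
                break

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._current is None:
            return
        if tag in BLOCK_TAGS:
            self.fields[self._current] += " "
        self._depth -= 1
        if self._depth == 0:
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.fields[self._current] += data


def extract_fields(html):
    """Return one row (dict) with whitespace-normalized field values."""
    parser = PostFieldParser()
    parser.feed(html)
    parser.close()
    return {name: " ".join(text.split()) for name, text in parser.fields.items()}


def rows_to_csv(rows):
    """Serialize rows to CSV text, a format Excel can open directly."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(FIELD_CLASSES.values()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Calling `rows_to_csv([extract_fields(SAMPLE_HTML)])` gives a header line plus one row per post. The depth counter assumes reasonably well-formed HTML; badly nested pages would need a more forgiving parser.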

Regards,

Yohanan Ouaknine

MA student, Information Studies (Knowledge Management), Bar-Ilan University
(Israel)





On Fri, Jan 20, 2012 at 6:05 AM, Wendy Christensen <wchriste at bowdoin.edu>
wrote:
>
> Hi All,
>
> I used SiteSucker to download entire blogs
> (http://www.sitesucker.us/home.html). It worked very well for both Blogspot
> and WordPress blogs, but excess files had to be cleaned up and deleted
> before analysis.
>
> I ran into issues trying to find content analysis software that would
> allow me to code HTML files. If anyone has suggestions for software for
> qualitative analysis of websites and/or downloaded HTML files, I'd love to
> hear about it!
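
One possible workaround, if the analysis software only accepts plain text, is to convert the downloaded HTML files to text first. The sketch below uses only the Python standard library. The function names and the folder layout are assumptions made for illustration, and the conversion drops all markup, so it only suits analyses that do not depend on page layout.

```python
from html.parser import HTMLParser
from pathlib import Path

SKIP_TAGS = {"script", "style"}  # content that is never readable text
BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "tr"}  # force line breaks


class TextExtractor(HTMLParser):
    """Collects visible text and inserts line breaks around block tags."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string, one block per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


def convert_folder(src_dir, dst_dir):
    """Write a .txt copy of every .html file under src_dir; return the count."""
    count = 0
    for path in Path(src_dir).rglob("*.html"):
        text = html_to_text(path.read_text(encoding="utf-8", errors="replace"))
        out = Path(dst_dir) / path.relative_to(src_dir).with_suffix(".txt")
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_text(text, encoding="utf-8")
        count += 1
    return count
```

Navigation menus and sidebars will still appear in the output, so some manual cleanup of the kind described above is likely to remain.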
>
> Best, Wendy
>
>
> Wendy M. Christensen, Ph.D.
>
> Visiting Assistant Professor
> Department of Sociology and Anthropology
> Bowdoin College
> wchriste at bowdoin.edu
>
>
>
> On Jan 20, 2012, at 3:40 AM, Jarkko Moilanen wrote:
>
> hi,
>
> Quoting Stuart Shulman <stuart.shulman at gmail.com>:
>
> WordPress has a feature for this:
>
> http://en.blog.wordpress.com/2006/06/12/xml-import-export/
>
> If it is a WordPress blog, you can ask the owner to create a bulk export
> in XML.
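
Once the owner has sent such an export file, the posts can be read with a few lines of Python. This is a minimal sketch, assuming the export follows the usual RSS-style layout with `item` elements and the common `content` and `dc` namespaces; `parse_export` and `SAMPLE_EXPORT` are names invented here, and the sample data is made up. Real exports also carry WordPress-specific elements whose namespace varies by export version, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Namespaces commonly used in RSS-style feeds and exports (assumed here).
NS = {
    "content": "http://purl.org/rss/1.0/modules/content/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Made-up miniature export used to try the function below.
SAMPLE_EXPORT = (
    '<rss version="2.0" '
    'xmlns:content="http://purl.org/rss/1.0/modules/content/" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<channel><title>Example Blog</title>"
    "<item><title>Hello</title>"
    "<pubDate>Fri, 20 Jan 2012 06:00:00 +0000</pubDate>"
    "<dc:creator>someone</dc:creator>"
    "<content:encoded><![CDATA[<p>Post body</p>]]></content:encoded>"
    "</item></channel></rss>"
)


def parse_export(xml_text):
    """Return a list of dicts, one per <item> in the export."""
    root = ET.fromstring(xml_text)
    posts = []
    for item in root.iter("item"):
        posts.append({
            "title": item.findtext("title", default=""),
            "date": item.findtext("pubDate", default=""),
            "author": item.findtext("dc:creator", default="", namespaces=NS),
            "content": item.findtext("content:encoded", default="", namespaces=NS),
        })
    return posts
```

An export may list pages and attachments as items alongside posts, so the result may need filtering before analysis.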
>
>
> If you are archiving a blog whose export functions you don't have access
> to, I would use 'wget'. It has options to fetch everything, no matter how
> deep the structure is.
>
> http://en.wikipedia.org/wiki/Wget
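
As a rough illustration of the wget approach, the sketch below assembles a recursive-download command and runs it from Python. It assumes GNU wget is installed and on the PATH. The function names, the default folder `blog-archive`, and the one-second delay are choices made here for illustration, not part of the original suggestion.

```python
import subprocess


def build_wget_command(url, dest_dir="blog-archive", wait_seconds=1):
    """Return the argument list for a polite recursive wget download."""
    return [
        "wget",
        "--mirror",             # recursive download with unlimited depth
        "--convert-links",      # rewrite links so the copy works offline
        "--page-requisites",    # also fetch images, CSS, and scripts
        "--adjust-extension",   # save HTML pages with an .html suffix
        "--no-parent",          # stay below the starting directory
        f"--wait={wait_seconds}",           # pause between requests
        f"--directory-prefix={dest_dir}",   # where the copy is stored
        url,
    ]


def mirror_blog(url, dest_dir="blog-archive"):
    """Run wget and return its exit code (0 means success)."""
    return subprocess.run(build_wget_command(url, dest_dir), check=False).returncode
```

For example, `mirror_blog("http://example.org/blog/")` would start the download into `blog-archive`. It is worth checking the blog's terms of use and robots.txt before mirroring a site one does not own.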
>
> /Jarkko
>
>
> Better still is the new offering from GNIP:
>
> http://blog.gnip.com/gnip-and-automattic-make-whole-new-universe-of-data-available/
>
> The future is bright for getting big collections.
>
> ~Stu
>
> On Thu, Jan 19, 2012 at 9:31 PM, C Sosnowy <c_sosnowy at yahoo.com> wrote:
>
> I would like to be able to archive an entire blog (and ideally be able to
> download it) for analysis. I've looked at WebCite and Zotero but neither
> seem to have this capability. Does anyone know of another way?
>
>
> Collette Sosnowy
> M.A., Ph.D. Candidate
> Environmental Psychology Program
> The Graduate Center of the City University of New York
> _______________________________________________
> The Air-L at listserv.aoir.org mailing list
> is provided by the Association of Internet Researchers http://aoir.org
> Subscribe, change options or unsubscribe at:
> http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
>
> Join the Association of Internet Researchers:
> http://www.aoir.org/
>
>
>
>
> --
>
> Dr. Stuart W. Shulman
> people.umass.edu/stu
>
> Editor Emeritus, JITP
> jitp.net <http://www.jitp.net>
>
> Director, QDAP-UMass
> umass.edu/qdap <http://www.umass.edu/qdap>
>
> Founder and CEO, Texifter
> texifter.com <http://www.texifter.com>
>
> LinkedIn: linkedin.com/pub/stuart-shulman/10/351/899
> Twitter: twitter.com/#!/StuartWShulman
>
>
>
>
> ****************************
> Jarkko Moilanen (+358 45 8877 150)
> M.Soc.Sc. (Political Science)
> PhD Student, Information studies, University of Tampere
> Blog: http://blog.ossoil.com/
> -------------------------
> Founder of Hackerspace 5w, Finland, Tampere - http://5w.fi
> Founder of MeeGo Network Finland, http://meegonetwork.fi
> Founder of Open Coral - http://open-coral.org
> Founder of Finnish Biohacker community, http://biohakkeri.fi
> ****************************




--
יוחנן ועקנין
Yohanan Ouaknine


050-6279777
yohanan.ouaknine at ois.co.il
http://il.linkedin.com/in/yohananouaknine




