[Air-L] Comment scraping

Michael Trice propeliea at gmail.com
Tue Jan 22 07:05:21 PST 2013


If you have a background in Python, or an interest in learning, the Scrapy
open source solution is cheap and flexible.

http://scrapy.org/
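
Whichever tool does the scraping, keeping the comment relationships intact mostly comes down to saving each comment's parent id along with its text. Below is a minimal sketch of rebuilding the threads afterwards, using only the Python standard library. The field names `id`, `parent`, and `text` and the sample records are assumptions for illustration, not the actual Disqus or Scrapy output format.

```python
from collections import defaultdict


def build_threads(comments):
    """Group flat comment records into nested threads by parent id.

    Assumes each record is a dict with hypothetical keys "id", "parent",
    and "text"; a parent of None marks a top-level comment. Records whose
    parent id never appears in the data are left out of the result.
    """
    children = defaultdict(list)
    for comment in comments:
        children[comment["parent"]].append(comment)

    def attach(parent_id):
        # Recursively collect the replies under one parent, in input order.
        return [
            {"id": c["id"], "text": c["text"], "replies": attach(c["id"])}
            for c in children.get(parent_id, [])
        ]

    return attach(None)


def render(threads, depth=0):
    """Return one line per comment, indented two spaces per reply level."""
    lines = []
    for node in threads:
        lines.append("  " * depth + node["text"])
        lines.extend(render(node["replies"], depth + 1))
    return lines


# Hypothetical sample data, for illustration only.
sample = [
    {"id": 1, "parent": None, "text": "First comment"},
    {"id": 2, "parent": 1, "text": "Reply to first"},
    {"id": 3, "parent": None, "text": "Second comment"},
    {"id": 4, "parent": 2, "text": "Reply to the reply"},
]

if __name__ == "__main__":
    print("\n".join(render(build_threads(sample))))
```

The nested structure can then be written out as JSON or flattened to a spreadsheet with a depth column, depending on what the coding software expects.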



On Tue, Jan 22, 2013 at 7:54 AM, Casey Tesfaye <klt35 at georgetown.edu> wrote:

> Jacob, Jasmine, etc.,
>
> That software looks great! But expensive! I wonder if there is a cheaper
> alternative? (I'm working with the same type of data)
>
> Otherwise, the multipronged approach has been the best I've encountered:
> text file + screen shot + html file
>
> Thanks,
> Casey
>
>
> On Mon, Jan 21, 2013 at 10:57 PM, Jacob Groshek <jgroshek at gmail.com>
> wrote:
>
> > I highly recommend Discovertext.  http://discovertext.com/
> >
> > Easy to use, good tech support if/when you need it.  Built in coding
> > system.  Also can export to spreadsheet (if necessary) with subscription.
> >
> > Best,
> >
> > Jacob
> >
> > --
> > Dr. Jacob Groshek
> > Assistant (Visiting) Professor
> > Digital Media and Research Methods
> > jgroshek.com <http://www.jgroshek.com/>
> >
> > Head, CTEC <http://aejmcctec.com/> / AEJMC <http://www.aejmc.org/>
> > Visiting Scholar, IAST <http://www.iast.fr/>
> > Full Member, NeSCoR <http://nescor.socsci.uva.nl/>
> >
> >
> >
> > On Tue, Jan 22, 2013 at 2:47 PM, Jasmine E McNealy <jemcneal at syr.edu>
> > wrote:
> >
> > > Hello All,
> > >
> > > > I'm looking for ideas on the best software to use for comment scraping.  I
> > > > plan on doing quantitative content and qualitative textual analysis on the
> > > > comments connected to an article in an online publication.  The
> > > > publication uses Disqus for comments, and ideally I'd like a program that
> > > > would maintain the integrity of the comment relationships.  Any and all
> > > > ideas are appreciated.
> > >
> > > Thanks,
> > >
> > > JM
> > >
> > > Jasmine McNealy
> > > Assistant Professor
> > > S.I. Newhouse School of Public Communication
> > > Syracuse University
> > > 215 University Place
> > > Syracuse, NY 13210
> > > 315-443-1151
> > > http://ssrn.com/author=1357319
> > > _______________________________________________
> > > The Air-L at listserv.aoir.org mailing list
> > > is provided by the Association of Internet Researchers http://aoir.org
> > > Subscribe, change options or unsubscribe at:
> > > http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
> > >
> > > Join the Association of Internet Researchers:
> > > http://www.aoir.org/
> > >
> >
>



-- 
Senior Lecturer in Technical Communication at the University of North Texas
Doctoral Candidate at Texas Tech University
Phone: 806.392.7016
Twitter: mikertrice
Skype: mrtrice1
Email: propeliea at gmail.com
“If we knew what it was we were doing, it would not be called research,
would it?” - Albert Einstein


