[Air-L] Software to extract content of Facebook & Twitter

Noha Nagi noha.a.nagi at gmail.com
Wed Aug 27 22:57:23 PDT 2014


Hi Anu,

I suggest you try NodeXL <http://nodexl.codeplex.com/>. It's simple and
free. You will first need to install the Social Network Importer
<http://socialnetimporter.codeplex.com/> so that NodeXL can grab Facebook,
Twitter, Flickr and YouTube data.

Good luck!


On Wed, Aug 27, 2014 at 9:26 PM, Harju Anu <anu.harju at aalto.fi> wrote:

> Hi everyone,
>
> and I'm also grateful for all these suggestions for various tools. For a
> paper for my PhD I'm looking at YouTube comment threads and I was wondering
> if any one of you might know a tool that can extract those? It's a very
> laborious process to do manually and it drives me insane. I once asked a
> coder friend of mine, but he said it was more complicated than he initially
> thought, and we left it at that.
>
> Thank you in advance, and thanks for a great list! I've been a lurker for
> quite some time now and find it very useful.
>
> Best,
> Anu
>
>
> Anu Harju
> Doctoral Candidate
> Aalto University
> Helsinki
> Finland
>
> Sent from my iPhone
>
> On 27.8.2014, at 18.06, "Tim Libert" <tlibert at asc.upenn.edu> wrote:
>
> > I’d quickly point out two additional considerations when ingesting
> fb/twitter data:  1) APIs generally exclude ads (which are ‘targeted’) - so
> depending on what you want to study and/or model an API will never give you
> an accurate view of what users really see.  APIs are easy, but incomplete.
> 2) The trick with scraping content directly from the web is accounting for
> processing/executing javascript as that is how many pages pull content
> dynamically (there may also be other factors: redirects, iframes, canvas,
> etc.).  If your tool (e.g. Python urllib, etc.) can only access static HTML
> you will not be able to pull the content you want as you will be accessing
> instruction sets of how to dynamically render content rather than the
> actual content.  I am not sure how your tool in R works, but I imagine this
> is a likely issue you may be facing.  I have developed some software that
> solves problem #2 by leveraging http://phantomjs.org/, but it’s not ready
> for public release quite yet; however, you may want to consider using an
> automation framework like selenium (http://www.seleniumhq.org/).
> >
> > - tim, phd student, upenn
> > _______________________________________________
> > The Air-L at listserv.aoir.org mailing list
> > is provided by the Association of Internet Researchers http://aoir.org
> > Subscribe, change options or unsubscribe at:
> http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
> >
> > Join the Association of Internet Researchers:
> > http://www.aoir.org/
>
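
To illustrate Tim's second point in the quoted message (a tool that only
reads static HTML sees the script, not the content the script would render),
here is a minimal Python sketch. The sample page, the names VisibleText,
static_text and rendered_html, and the comment text are all made up for
illustration; they are not taken from any tool mentioned in this thread. The
rendered_html helper assumes Selenium and a Firefox driver are installed and
is only one possible way to let a real browser execute the JavaScript first.

```python
from html.parser import HTMLParser

# Hypothetical page: the comment list is empty in the static HTML and is
# filled in by JavaScript only when a browser executes the script.
SAMPLE_PAGE = """
<html><body>
<h1>Video title</h1>
<ul id="comments"></ul>
<script>
  var li = document.createElement("li");
  li.textContent = "Great video!";
  document.getElementById("comments").appendChild(li);
</script>
</body></html>
"""


class VisibleText(HTMLParser):
    """Collect text outside <script>/<style>, as a static scraper sees it."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a <script> or <style> element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not script or style source.
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def static_text(html):
    """Return the visible text chunks found without running any JavaScript."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return parser.chunks


def rendered_html(url):
    """Return the page source after a real browser has run its scripts.

    Assumption: the selenium package and a Firefox driver are installed.
    Pages that load content late may need an explicit wait as well.
    """
    from selenium import webdriver  # imported lazily; optional dependency

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()


if __name__ == "__main__":
    # The comment never appears, because no JavaScript was executed.
    print(static_text(SAMPLE_PAGE))
```

Running the static part on the sample page yields only the title. The
comment exists solely as an instruction inside the script, which is the gap
a browser automation framework is meant to close.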



-- 
*Noha A.Nagi*


