[Air-L] Software to extract content of Facebook & Twitter

Harju Anu anu.harju at aalto.fi
Wed Aug 27 11:26:50 PDT 2014


Hi everyone,

I'm also grateful for all these tool suggestions. For a paper for my PhD I'm looking at YouTube comment threads, and I was wondering whether any of you might know of a tool that can extract them. Collecting them manually is a very laborious process and it drives me insane. I once asked a coder friend of mine, but he said it was more complicated than he had initially thought, and we left it at that.

Thank you in advance, and thanks for a great list! I've been a lurker for quite some time now and find it very useful.

Best,
Anu


Anu Harju
Doctoral Candidate
Aalto University
Helsinki
Finland


On 27.8.2014, at 18.06, "Tim Libert" <tlibert at asc.upenn.edu> wrote:

> I’d quickly point out two additional considerations when ingesting fb/twitter data: 1) APIs generally exclude ads (which are ‘targeted’), so depending on what you want to study and/or model, an API will never give you an accurate view of what users really see. APIs are easy, but incomplete. 2) The trick with scraping content directly from the web is accounting for processing/executing JavaScript, as that is how many pages pull content dynamically (there may also be other factors: redirects, iframes, canvas, etc.). If your tool (e.g. Python urllib) can only access static HTML, you will not be able to pull the content you want, as you will be accessing the instructions for dynamically rendering the content rather than the actual content. I am not sure how your tool in R works, but I imagine this is a likely issue you may be facing. I have developed some software that solves problem #2 by leveraging http://phantomjs.org/, but it’s not ready for public release quite yet; however, you may want to consider using an automation framework like Selenium (http://www.seleniumhq.org/).
> 
> - tim, phd student, upenn
> _______________________________________________
> The Air-L at listserv.aoir.org mailing list
> is provided by the Association of Internet Researchers http://aoir.org
> Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
> 
> Join the Association of Internet Researchers:
> http://www.aoir.org/
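
Tim's second point, that a static fetch returns the instructions for rendering content rather than the content itself, can be illustrated with a small Python sketch that uses only the standard library. Everything named here is an assumption made for illustration: the sample page, the loadComments call and its URL are hypothetical and do not describe how YouTube or any other site actually works.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect the text present in static HTML, skipping <script>/<style> bodies."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside <script> or <style>
        self.chunks = []       # text fragments found outside those tags

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self._skip_depth == 0 and text:
            self.chunks.append(text)


def static_text(html):
    """Return the text a non-JavaScript client would find in the page."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page: the comment container is empty in the HTML itself,
# and a script (never executed by a static fetch) would fill it in later.
SAMPLE_PAGE = """
<html><body>
<h1>Video title</h1>
<div id="comments"></div>
<script>
  loadComments("/hypothetical/comments.json", "#comments");
</script>
</body></html>
"""

if __name__ == "__main__":
    # Only the heading is present; the comments never appear.
    print(static_text(SAMPLE_PAGE))
```

If the text you are after is missing from this kind of static extraction but visible in a browser, the page is most likely filling it in with JavaScript, which is the case where a browser automation framework such as Selenium becomes relevant.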


