[Air-L] scraping google discussion groups?

Andrew Schrock aschrock at usc.edu
Thu Oct 6 09:13:31 PDT 2011


Thanks Yohanan, Shawn, Jeremy and Deen for your helpful suggestions. I had been thinking too much about custom coding using their API and not enough about using existing scraping software. 

best
Andrew


On Oct 5, 2011, at 11:40 AM, Yohanan Ouaknine (יוחנן ועקנין) wrote:

> Hello Andrew. 
> I use Web Content Extractor from newprosoft.com in my research and it works quite well. 
> Regards, 
> Yohanan Ouaknine 
> Graduate student, Knowledge management, Bar Ilan University, Israel
> 
> 
> On Wed, Oct 5, 2011 at 8:24 PM, Andrew Schrock <aschrock at usc.edu> wrote:
> Has anybody successfully scraped a Google discussion group? I found a script online, but it's thrown off by the fact that you now have to log in to view any groups.
> 
> Google is getting squirrelly about spammers scraping their data, so that may be a big roadblock. I'm looking at authorization with the Google PHP lib, but I'm not sure it will get me to groups; it all seems app-focused (for instance, if you want to add items to a Google calendar).
> 
> I'd much appreciate any ideas that don't involve me adding 6000-some messages to my analysis software by hand :/
> 
> best
> Andrew
> 
> 
> 
> Andrew Schrock
> USC Annenberg Doctoral Student
> aschrock at usc.edu
> 714.330.6545
> 
> 
> 
> _______________________________________________
> The Air-L at listserv.aoir.org mailing list
> is provided by the Association of Internet Researchers http://aoir.org
> Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
> 
> Join the Association of Internet Researchers:
> http://www.aoir.org/
> 
> 
> 
> -- 
> יוחנן ועקנין
> Yohanan Ouaknine
> 
> 
> 050-6279777
> yohanan.ouaknine at ois.co.il
> http://il.linkedin.com/in/yohananouaknine
> 
> 









