[Air-L] scraping google discussion groups?
Andrew Schrock
aschrock at usc.edu
Thu Oct 6 09:13:31 PDT 2011
Thanks Yohanan, Shawn, Jeremy and Deen for your helpful suggestions. I had been thinking too much about custom coding using their API and not enough about using existing scraping software.
best
Andrew
On Oct 5, 2011, at 11:40 AM, יוחנן ועקנין wrote:
> Hello Andrew.
> I use Web Content Extractor from newprosoft.com in my research and it works quite well.
> Regards,
> Yohanan Ouaknine
> Graduate student, Knowledge management, Bar Ilan University, Israel
>
>
> On Wed, Oct 5, 2011 at 8:24 PM, Andrew Schrock <aschrock at usc.edu> wrote:
> Has anybody successfully scraped a Google discussion group? I found a script online, but it's thrown off by the fact that you now have to log in to view any groups.
>
> Google is getting squirrelly about spammers scraping their data, so it may be a big roadblock. I'm looking at authorization with the Google PHP lib, but I'm not sure it will get me to groups; it all seems app-focused (for instance, if you want to add items to a Google calendar).
>
> I'd much appreciate any ideas that don't involve me adding 6000-some messages to my analysis software by hand :/
>
> best
> Andrew
>
>
>
> Andrew Schrock
> USC Annenberg Doctoral Student
> aschrock at usc.edu
> 714.330.6545
>
>
>
> _______________________________________________
> The Air-L at listserv.aoir.org mailing list
> is provided by the Association of Internet Researchers http://aoir.org
> Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
>
> Join the Association of Internet Researchers:
> http://www.aoir.org/
>
>
>
> --
> יוחנן ועקנין
> Yohanan Ouaknine
>
>
> 050-6279777
> yohanan.ouaknine at ois.co.il
> http://il.linkedin.com/in/yohananouaknine
>
>
>
>
Andrew Schrock
USC Annenberg Doctoral Student
aschrock at usc.edu
714.330.6545