[Air-L] scraping google discussion groups?

Deen Freelon dfreelon at gmail.com
Wed Oct 5 11:28:02 PDT 2011


One option would be to save the pages, either manually or with a crawler, 
and then scrape the data out locally. It's more annoying than going 
directly from the live site to a database, but it's still better than 
entering the messages by hand. ~DEEN
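
A rough sketch of the "scrape locally" step, in case it helps. It assumes 
the pages have already been saved as .html files in one folder, and it only 
pulls out the visible text of each page and stores it in a SQLite file. The 
folder layout, the table name "pages", and the function names are my own 
placeholders, not anything specific to Google Groups; picking out individual 
messages, authors, or dates would depend on the markup of the saved pages.

```python
import sqlite3
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def extract_text(html):
    """Return the visible text of one HTML page, one text chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def load_pages(folder, db_path):
    """Store the text of every *.html file in `folder` in a SQLite database.

    The table name `pages` is a placeholder. Returns the number of files read.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (filename TEXT PRIMARY KEY, body TEXT)"
    )
    count = 0
    for path in sorted(Path(folder).glob("*.html")):
        html = path.read_text(encoding="utf-8", errors="replace")
        conn.execute(
            "INSERT OR REPLACE INTO pages VALUES (?, ?)",
            (path.name, extract_text(html)),
        )
        count += 1
    conn.commit()
    conn.close()
    return count
```

Calling load_pages("saved_pages", "groups.db") (both names hypothetical) 
would then give you one row per saved page to export into whatever analysis 
software you use.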

On 10/5/11 2:24 PM, Andrew Schrock wrote:
> Has anybody successfully scraped a Google discussion group? I found a script online, but it's thrown off by the fact that you now have to log in to view any groups.
>
> Google is getting squirrely about spammers scraping their data, so it may be a big roadblock. I'm looking at authorization with the Google PHP lib, but I'm not sure it will get me to Groups; it all seems app-focused (for instance, if you want to add items to a Google calendar).
>
> I'd much appreciate any ideas that don't involve me adding 6000-some messages to my analysis software by hand :/
>
> best
> Andrew
>
>
>
> Andrew Schrock
> USC Annenberg Doctoral Student
> aschrock at usc.edu
> 714.330.6545
>
>
>
> _______________________________________________
> The Air-L at listserv.aoir.org mailing list
> is provided by the Association of Internet Researchers http://aoir.org
> Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
>
> Join the Association of Internet Researchers:
> http://www.aoir.org/


-- 
Deen Freelon
Acting Assistant Professor
American University School of Communication
dfreelon at gmail.com
http://dfreelon.org/




