[Air-L] Data Collection in Reddit

Tim Squirrell timsquirrell at gmail.com
Tue May 15 06:18:44 PDT 2018


Hi Xanat et al,

I’ve also done some work in this area, in part using BigQuery but also some other tools. 

https://www.google.co.uk/amp/s/qz.com/1056319/what-is-the-alt-right-a-linguistic-data-analysis-of-3-billion-reddit-comments-shows-a-disparate-group-that-is-quickly-uniting/amp/

https://www.google.co.uk/amp/s/qz.com/1083444/analysis-of-500-million-reddit-comments-shows-how-the-alt-right-made-the-alt-left-a-thing/amp/

https://www.google.co.uk/amp/s/qz.com/1092037/the-alt-right-is-creating-its-own-dialect-heres-a-complete-guide/amp/

You might also want to look at:

https://www.google.co.uk/amp/s/fivethirtyeight.com/features/dissecting-trumps-most-rabid-online-following/amp/

And some recent slides from a talk I did which have links to a bunch of useful tools in them: https://docs.google.com/presentation/d/1TffY4BCt0CHxifq6_0DMCTvNLZ76CPgCmzgN7uQiDAY

All the best,

TJM


> On 15 May 2018, at 13:32, kalev leetaru <kalev.leetaru5 at gmail.com> wrote:
> 
> Xanat, in case it is of interest, Google has also regularly loaded the
> Reddit dataset into BigQuery (you get some amount of free quota per month
> to use the BQ service):
> 
> https://www.reddit.com/r/bigquery/comments/5z957b/more_than_3_billion_reddit_comments_loaded_on/
> 
> They've done some neat example analyses with it:
> 
> https://medium.com/@hoffa/reddit-favorite-sources-the-most-linked-sites-expanded-and-interactive-79070d648573
> https://medium.com/@hoffa/which-subreddits-have-the-most-energy-how-upvotes-translate-into-pageviews-4e6a1bf2af7e
> https://medium.freecodecamp.com/reddit-uptime-2008-2016-bigquery-b3d7b11046e0
> https://medium.com/google-cloud/reddit-s-presidential-race-candidate-mentions-in-comment-1f9fd6a7985a
> https://medium.com/google-cloud/a-short-story-of-the-comments-on-reddit-from-2007-until-today-29545916aced
> 
> Also, a more complex example combining it with TensorFlow, Cloud Dataflow
> and my GDELT data:
> 
> https://cloud.google.com/blog/big-data/2018/03/predicting-community-engagement-on-reddit-using-tensorflow-gdelt-and-cloud-dataflow-part-1
> https://cloud.google.com/blog/big-data/2018/03/predicting-community-engagement-on-reddit-using-tensorflow-gdelt-and-cloud-dataflow-part-2
> https://cloud.google.com/blog/big-data/2018/03/predicting-community-engagement-on-reddit-using-tensorflow-gdelt-and-cloud-dataflow-part-3
> 
> Even if you don't use BigQuery at your institution, some of the examples
> above might give you ideas on some of the at-scale analyses that can be
> done and combining it with TF, etc.
> 
> K
> 
> 
> 
>> On Tue, May 15, 2018 at 7:59 AM, Deen Freelon <dfreelon at gmail.com> wrote:
>> 
>> Hi Xanat,
>> 
>> You're in luck--all Reddit posts and comments from the beginning are
>> available to download here: http://files.pushshift.io/reddit/ . However,
>> they're in JSON format so you'll need to learn how to parse that before you
>> can access them. Best, /DEEN
>> 
>> 
>>> On 5/15/2018 12:54 AM, Xanat Meza wrote:
>>> 
>>> Hello members! In our department, we are planning a study about the
>>> expression of phobias in social networks. I would like to know any software
>>> recommendations to download: a) Reddit posts and b) Information about
>>> Reddit users. If anyone has literature related to Reddit and psychology, we
>>> would be grateful too.
>>> Regards,
>>> Xanat V. Meza
>>> 
>>> Ph.D. candidate - Kansei, Behavioral and Brain Sciences
>>> University of Tsukuba
>>> M.A. Media and Communication
>>> Yeungnam University
>>> B.D. Graphic Communication Design
>>> Universidad Autonoma Metropolitana
>>> 
>>> _______________________________________________
>>> The Air-L at listserv.aoir.org mailing list
>>> is provided by the Association of Internet Researchers http://aoir.org
>>> Subscribe, change options or unsubscribe at:
>>> http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
>>> 
>>> Join the Association of Internet Researchers:
>>> http://www.aoir.org/
>> 
>> --
>> Deen Freelon, Ph.D.
>> Associate Professor
>> School of Media and Journalism, UNC-Chapel Hill
>> http://dfreelon.org | @dfreelon <https://twitter.com/dfreelon> |
>> https://github.com/dfreelon

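On Deen's point in the quoted thread that the Pushshift files are JSON and need parsing first: a minimal sketch of one way to do that in Python. It assumes the dump is a bz2-compressed file with one JSON object per line, and that each comment carries "subreddit", "body" and "author" fields. Those are assumptions about the file layout rather than anything stated in this thread, so check a few lines of the file you actually download. The function names are made up for illustration.

```python
import bz2
import json
from typing import Iterator, List


def iter_comments(path: str) -> Iterator[dict]:
    """Yield one dict per non-empty line of a bz2-compressed JSON-lines file."""
    # Reading line by line avoids loading a multi-gigabyte dump into memory.
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            yield json.loads(line)


def filter_comments(path: str, subreddit: str, keyword: str) -> List[dict]:
    """Collect comments from one subreddit whose body mentions a keyword."""
    subreddit = subreddit.lower()
    keyword = keyword.lower()
    matches = []
    for comment in iter_comments(path):
        # Field names are assumed; missing or null values are treated as "".
        if (comment.get("subreddit") or "").lower() != subreddit:
            continue
        if keyword in (comment.get("body") or "").lower():
            matches.append(comment)
    return matches
```

Something like filter_comments("comments.bz2", "phobia", "spiders") would then return the matching comments as plain dictionaries, where "comments.bz2" stands in for whichever file you downloaded. Files compressed in another format would need a different opener than bz2.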

