[Air-L] Data Collection in Reddit

kalev leetaru kalev.leetaru5 at gmail.com
Tue May 15 05:32:08 PDT 2018


Xanat, in case it is of interest, Google has also regularly loaded the
Reddit dataset into BigQuery (you get some amount of free quota per month
to use the BQ service):

https://www.reddit.com/r/bigquery/comments/5z957b/more_than_3_billion_reddit_comments_loaded_on/
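A minimal sketch of what a keyword query against that dataset might look like, using Python and the google-cloud-bigquery client library. The table name `fh-bigquery.reddit_comments.2015_05`, the column names (`body`, `subreddit`, `author`), and the example pattern are assumptions for illustration and may not match the tables currently published. The default dry run only reports how many bytes the query would scan, which helps you stay within the monthly free quota before running it for real.

```python
"""Sketch: keyword search over one month of Reddit comments in BigQuery.

Assumptions (not from the original post): the table name below, its column
names, and the search pattern are illustrative and may have changed.
"""

# Assumed table name; check the r/bigquery announcement for current tables.
TABLE = "fh-bigquery.reddit_comments.2015_05"


def build_query(table: str = TABLE) -> str:
    """Return standard SQL counting pattern-matching comments per subreddit."""
    return (
        "SELECT subreddit, COUNT(*) AS n_comments, "
        "COUNT(DISTINCT author) AS n_authors\n"
        f"FROM `{table}`\n"
        "WHERE REGEXP_CONTAINS(LOWER(body), @pattern)\n"
        "GROUP BY subreddit\n"
        "ORDER BY n_comments DESC\n"
        "LIMIT 20"
    )


def run_query(pattern: str, dry_run: bool = True):
    """Run the query with google-cloud-bigquery (needs credentials).

    With dry_run=True nothing is executed; BigQuery only reports the
    number of bytes the query would scan.
    """
    # Imported lazily so build_query() works without the library installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    config = bigquery.QueryJobConfig(
        dry_run=dry_run,
        use_query_cache=False,
        query_parameters=[
            bigquery.ScalarQueryParameter("pattern", "STRING", pattern)
        ],
    )
    job = client.query(build_query(), job_config=config)
    if dry_run:
        return job.total_bytes_processed
    return [dict(row.items()) for row in job.result()]
```

For example, `run_query(r"phobi")` returns the estimated bytes scanned; passing `dry_run=False` fetches the top subreddits as a list of dicts. The pattern is passed as a query parameter rather than pasted into the SQL string, so arbitrary search terms cannot break the query.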

They've done some neat example analyses with it:

https://medium.com/@hoffa/reddit-favorite-sources-the-most-linked-sites-expanded-and-interactive-79070d648573
https://medium.com/@hoffa/which-subreddits-have-the-most-energy-how-upvotes-translate-into-pageviews-4e6a1bf2af7e
https://medium.freecodecamp.com/reddit-uptime-2008-2016-bigquery-b3d7b11046e0
https://medium.com/google-cloud/reddit-s-presidential-race-candidate-mentions-in-comment-1f9fd6a7985a
https://medium.com/google-cloud/a-short-story-of-the-comments-on-reddit-from-2007-until-today-29545916aced

Also, a more complex example combining it with TensorFlow, Cloud Dataflow
and my GDELT data:

https://cloud.google.com/blog/big-data/2018/03/predicting-community-engagement-on-reddit-using-tensorflow-gdelt-and-cloud-dataflow-part-1
https://cloud.google.com/blog/big-data/2018/03/predicting-community-engagement-on-reddit-using-tensorflow-gdelt-and-cloud-dataflow-part-2
https://cloud.google.com/blog/big-data/2018/03/predicting-community-engagement-on-reddit-using-tensorflow-gdelt-and-cloud-dataflow-part-3

Even if you don't use BigQuery at your institution, some of the examples
above might give you ideas for the kinds of at-scale analyses that are
possible, including ones that combine it with TF, etc.
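If BigQuery isn't an option, the same kind of keyword filter can be run locally over the pushshift dumps Deen mentions below. This is a minimal Python sketch, assuming each dump file is newline-delimited JSON (one comment object per line) compressed with bz2. The field names `body` and `author`, the skipping of `[deleted]` accounts, and the hypothetical `phobi` pattern are illustrative assumptions. Reading one line at a time keeps memory use flat even for multi-gigabyte monthly files.

```python
"""Sketch: stream a pushshift Reddit comment dump and keep keyword matches.

Assumptions: newline-delimited JSON, bz2-compressed; field names and the
keyword pattern are illustrative, not taken from the dump documentation.
"""
import bz2
import json
import re
from collections import Counter
from typing import Iterator

# Hypothetical pattern for a phobia study; it also matches compounds
# such as "claustrophobia". Adjust to your own codebook.
PATTERN = re.compile(r"phobi", re.IGNORECASE)


def iter_matches(path: str, pattern: re.Pattern = PATTERN) -> Iterator[dict]:
    """Yield comments whose body matches pattern, one line at a time."""
    with bz2.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:  # tolerate blank lines
                continue
            comment = json.loads(line)
            if pattern.search(comment.get("body") or ""):
                yield comment


def authors_by_count(path: str) -> Counter:
    """Count matching comments per author, skipping deleted accounts."""
    counts = Counter()
    for comment in iter_matches(path):
        author = comment.get("author")
        if author and author != "[deleted]":
            counts[author] += 1
    return counts
```

The per-author counts are one simple way to start on user-level information without any extra API calls. Some dumps may use other compression formats: `.xz` files can be opened the same way with `lzma.open`, while `.zst` files need a third-party decompressor.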

K



On Tue, May 15, 2018 at 7:59 AM, Deen Freelon <dfreelon at gmail.com> wrote:

> Hi Xanat,
>
> You're in luck--all Reddit posts and comments from the beginning are
> available to download here: http://files.pushshift.io/reddit/ . However,
> they're in JSON format so you'll need to learn how to parse that before you
> can access them. Best, /DEEN
>
>
> On 5/15/2018 12:54 AM, Xanat Meza wrote:
>
>> Hello members! In our department, we are planning a study about the
>> expression of phobias on social networks. I would appreciate any software
>> recommendations for downloading (a) Reddit posts and (b) information about
>> Reddit users. We would also be grateful for any literature related to
>> Reddit and psychology.
>> Regards,
>> Xanat V. Meza
>>
>> Ph.D. candidate - Kansei, Behavioral and Brain Sciences
>> University of Tsukuba
>> M.A. Media and Communication
>> Yeungnam University
>> B.D. Graphic Communication Design
>> Universidad Autonoma Metropolitana
>>
>> _______________________________________________
>> The Air-L at listserv.aoir.org mailing list
>> is provided by the Association of Internet Researchers http://aoir.org
>> Subscribe, change options or unsubscribe at:
>> http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
>>
>> Join the Association of Internet Researchers:
>> http://www.aoir.org/
>>
>
> --
> Deen Freelon, Ph.D.
> Associate Professor
> School of Media and Journalism, UNC-Chapel Hill
> http://dfreelon.org | @dfreelon <https://twitter.com/dfreelon> |
> https://github.com/dfreelon
>


