[Air-L] Reddit dataset

Peter Timusk peterotimusk at gmail.com
Sun Jul 5 01:03:06 PDT 2015


I think that in large survey work, consent to be researched is needed. Quality and trust increase with knowing consent and assured privacy. I wonder if there are any pollsters on the list who would comment. This is how we argue at my day job, which does not collect internet data and where subjects are often legally required to participate. I am adding this perspective here to show another end of the spectrum.

I would be interested in comparing this debate to collection standards in medical research, such as drug trials or addiction treatment outcome research, as I know that area a bit.

I also realize that findings from a telephone interview or paper questionnaire may be less interesting on a per-subject basis than what one can find online.

Peter Timusk
peterotimusk at gmail.com
I do not speak for my employer or charities or political parties or unions I volunteer with or belong to, unless otherwise noted.


> On Jul 4, 2015, at 3:25 PM, Alex Halavais <alex at halavais.net> wrote:
> 
> I think it's worth considering the context of the comments and taking
> steps to reduce harm. I also think that when you do things in public
> fora, you are subjecting yourself to the possibility of becoming
> someone's research subject. While it's important to consider harm in
> the process, it is also not essential to obtain consent of those who
> posted.
> 
> I've used Reddit data. The only time I requested consent was for
> direct quotations, since it could be easily traced back to the
> original commenters.  For aggregated analysis, I think that requesting
> such permission is both onerous and unnecessary, and greater
> collective damage is caused by the research that might be left undone
> because of unnecessarily strict drawing of privacy lines.
> 
> - Alex
> 
> 
>> On Sat, Jul 4, 2015 at 10:51 AM, Alex Leavitt <alexleavitt at gmail.com> wrote:
>> Just to give more context, the reddit API cannot scrape 'private'
>> subreddits. So yes, it is entirely public data. That said, I think there
>> are some social data issues to consider, such as persistence and issues of
>> access, but technically those issues exist through reddit's search
>> functionality (and actually play an important role in accountability on the
>> platform for individuals).
>> 
>> 
>> ---
>> 
>> Alexander Leavitt
>> PhD Candidate
>> USC Annenberg School for Communication & Journalism
>> http://alexleavitt.com
>> Twitter: @alexleavitt <http://twitter.com/alexleavitt>
>> 
>> 
>>> On Sat, Jul 4, 2015 at 5:31 AM, Michael T Zimmer <zimmerm at uwm.edu> wrote:
>>> 
>>> I saw that, but has it since been deleted by the OP? I can’t seem to find
>>> it, nor the thread on Reddit.
>>> 
>>> IIRC, a Redditor used the site’s API to grab all comments ever posted to
>>> the site, but it wasn’t clear if this included subreddits that are set as
>>> “private.”
>>> 
>>> In the Facebook post, the OP appeared to cast aside concerns over consent
>>> since the data was “public”. While I’d agree the data is public in the
>>> sense that anyone could access it (if they had the URLs, search
>>> capabilities, time, etc to do so), this calculus is much too simplistic and
>>> ignores the contextual nature of those comments. As another commenter on
>>> the FB thread (was that you, Katy? I can’t remember) noted, just because
>>> someone posted a comment to Reddit doesn’t mean they’ve necessarily
>>> consented to having that data included in a research study. Plus, as the
>>> commenter noted, there likely are minors in the dataset, which complicates
>>> consent.
>>> 
>>> This relates closely to what I discuss in my article "'But the data is
>>> already public': on the ethics of research in Facebook”, and others have
>>> covered as well, especially in the AoIR Ethics Guidelines:
>>> http://ethics.aoir.org/
>>> 
>>> Michael
>>> 
>>> --
>>> Michael Zimmer, PhD
>>> Associate Professor, School of Information Studies
>>> Director, Center for Information Policy Research
>>> University of Wisconsin-Milwaukee
>>> e: zimmerm at uwm.edu
>>> w: www.michaelzimmer.org
>>> 
>>> 
>>>> On Jul 3, 2015, at 9:41 PM, Katy Pearce <katycarvt at gmail.com> wrote:
>>>> 
>>>> Someone posted a link to a dataset of Reddit posts to the AOIR Facebook
>>>> page.
>>>> I wonder what the AOIR community members feel about this in terms of this
>>>> being "public" data.
>>>> _______________________________________________
>>>> The Air-L at listserv.aoir.org mailing list
>>>> is provided by the Association of Internet Researchers http://aoir.org
>>>> Subscribe, change options or unsubscribe at:
>>> http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
>>>> 
>>>> Join the Association of Internet Researchers:
>>>> http://www.aoir.org/
>>> 
> 
> 
> 
> -- 
> 
> // Alexander Halavais, Sociologist, Semiologist, and Saboteur Extraordinaire
> // Associate Professor of Social Technologies, Arizona State University
> // http://alex.halavais.net/bio     @halavais
> 


