[Air-L] Reddit dataset

Elijah Wright elijah.wright at gmail.com
Sat Jul 4 17:39:50 PDT 2015


It's fun to think about -- and shows our biases, and how they change over time.

A decade or more ago I would have argued that you should send all such
work dealing with online fora through human subjects review anyway.
But given the upswing in the number of people who are supposedly
competent users of the internet and who understand that words online
are more permanent than we might like... I might argue nowadays that
we can be far too cautious, and that burdening the local office of
human subjects with full review of anything dealing with public text
is... a burden on them.  ;-)

I really like Alex's approach of always asking explicit permission
before republishing verbatim quotes.  Search engines, indeed, find
those things - and accidentally doxxing someone over controversial
words is not something to relish.

Most of the human subjects folk have surely figured out how to deal
with expedited review of internet-related work by now... I hope.

;-)

--e


On Sat, Jul 4, 2015 at 7:12 PM, Michael T Zimmer <zimmerm at uwm.edu> wrote:
> Yeah, I’d probably agree with such an approach (which is the more nuanced thinking I’d hope for, rather than just an automatic “public=always ok”).
>
> --
> Michael Zimmer, PhD
> Associate Professor, School of Information Studies
> Director, Center for Information Policy Research
> University of Wisconsin-Milwaukee
> e: zimmerm at uwm.edu
> w: www.michaelzimmer.org
>
>
>> On Jul 4, 2015, at 2:25 PM, Alex Halavais <alex at halavais.net> wrote:
>>
>> I think it's worth considering the context of the comments and taking
>> steps to reduce harm. I also think that when you do things in public
>> fora, you are subjecting yourself to the possibility of becoming
>> someone's research subject. While it's important to consider harm in
>> the process, it is also not essential to obtain consent of those who
>> posted.
>>
>> I've used Reddit data. The only time I requested consent was for
>> direct quotations, since they could easily be traced back to the
>> original commenters.  For aggregated analysis, I think that requesting
>> such permission is both onerous and unnecessary, and greater
>> collective damage is caused by the research that might be left undone
>> because of unnecessarily strict drawing of privacy lines.
>>
>> - Alex
>>
>>
>> On Sat, Jul 4, 2015 at 10:51 AM, Alex Leavitt <alexleavitt at gmail.com> wrote:
>>> Just to give more context, the reddit API cannot be used to scrape 'private'
>>> subreddits. So yes, it is entirely public data. That said, I think there
>>> are some social data issues to consider, such as persistence and issues of
>>> access, but technically those issues exist through reddit's search
>>> functionality (and actually play an important role in accountability on the
>>> platform for individuals).
>>>
>>>
>>> ---
>>>
>>> Alexander Leavitt
>>> PhD Candidate
>>> USC Annenberg School for Communication & Journalism
>>> http://alexleavitt.com
>>> Twitter: @alexleavitt <http://twitter.com/alexleavitt>
>>>
>>>
>>> On Sat, Jul 4, 2015 at 5:31 AM, Michael T Zimmer <zimmerm at uwm.edu> wrote:
>>>
>>>> I saw that, but has it since been deleted by the OP? I can’t seem to find
>>>> it, nor the thread on Reddit.
>>>>
>>>> IIRC, a Redditor used the site’s API to grab all comments ever posted to
>>>> the site, but it wasn’t clear if this included subreddits that are set as
>>>> “private.”
>>>>
>>>> In the Facebook post, the OP appeared to cast aside concerns over consent
>>>> since the data was “public”. While I’d agree the data is public in the
>>>> sense that anyone could access it (if they had the URLs, search
>>>> capabilities, time, etc. to do so), this calculus is much too simplistic and
>>>> ignores the contextual nature of those comments. As another commenter on
>>>> the FB thread (was that you, Katy? I can’t remember) noted, just because
>>>> someone posted a comment to Reddit doesn’t mean they’ve necessarily
>>>> consented to having that data included in a research study. Plus, as the
>>>> commenter noted, there likely are minors in the dataset, which complicates
>>>> consent.
>>>>
>>>> This relates closely to what I discuss in my article “‘But the data is
>>>> already public’: on the ethics of research in Facebook”, and to what others
>>>> have covered as well, especially in the AoIR Ethics Guidelines:
>>>> http://ethics.aoir.org/
>>>>
>>>> Michael
>>>>
>>>> --
>>>> Michael Zimmer, PhD
>>>> Associate Professor, School of Information Studies
>>>> Director, Center for Information Policy Research
>>>> University of Wisconsin-Milwaukee
>>>> e: zimmerm at uwm.edu
>>>> w: www.michaelzimmer.org
>>>>
>>>>
>>>>> On Jul 3, 2015, at 9:41 PM, Katy Pearce <katycarvt at gmail.com> wrote:
>>>>>
>>>>> Someone posted a link to a dataset of Reddit posts to the AOIR Facebook
>>>>> page.
>>>>> I wonder how AOIR community members feel about this being treated as
>>>>> "public" data.
>>>>> _______________________________________________
>>>>> The Air-L at listserv.aoir.org mailing list
>>>>> is provided by the Association of Internet Researchers http://aoir.org
>>>>> Subscribe, change options or unsubscribe at:
>>>>> http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
>>>>>
>>>>> Join the Association of Internet Researchers:
>>>>> http://www.aoir.org/
>>>>
>>
>>
>>
>> --
>>
>> // Alexander Halavais, Sociologist, Semiologist, and Saboteur Extraordinaire
>> // Associate Professor of Social Technologies, Arizona State University
>> // http://alex.halavais.net/bio     @halavais
>>
>


