[Air-L] Ethics of using hacked data.

live human.factor.one at gmail.com
Wed Oct 7 13:47:34 PDT 2015


Depends on the country. It was legal in Denmark in 2012. Stine Lomborg used hacked data in a research study, but decided to get informed consent for every data point. You could do that.

Sent from my mobile device,
Please excuse any typos

> On 8 Oct 2015, at 7:27 AM, Tim Laquintano <tlaquintano at gmail.com> wrote:
> 
> Hi Nat,
> 
> I think that's an interesting question, but as someone unfamiliar with hacking laws I need to ask: is it legal to download/own the data?
> 
> Best,
> 
> Tim 
> 
> Tim Laquintano 
> Assistant Professor of English
> Lafayette College 
> 
> Sent from my iPhone
> 
>> On Oct 7, 2015, at 4:11 PM, Nathaniel Poor <natpoor at gmail.com> wrote:
>> 
>> Hello list-
>> 
>> I recently got into a discussion with a colleague about the ethics of using
>> hacked data, specifically the Patreon hacked data (see here:
>> http://arstechnica.com/security/2015/10/gigabytes-of-user-data-from-hack-of-patreon-donations-site-dumped-online/
>> ).
>> 
>> He and I do crowdfunding work, and had wanted to look at Patreon, but as
>> far as I can tell they have no easy hook into all their projects (for
>> scraping), so, to me this data hack was like a gift! But he said there was
>> no way we could use it. We aren't doing sentiment analysis or anything, we
>> would use aggregated measures like funding levels and then report things
>> like means and maybe a regression, so there would be no identifiable
>> information whatsoever derived from the hacked data in any of our resulting
>> work (we might go to the site and pull some quotes).
>> 
>> I looked at the AoIR ethics guidelines ( http://aoir.org/reports/ethics2.pdf
>> ), and didn't see anything specifically about hacked data (I don't think
>> "hacked" is the best word, but I don't like "stolen" either, but those are
>> different discussions).
>> 
>> One relevant line I noticed was this one:
>> "If access to an online context is publicly available, do
>> members/participants/authors
>> perceive the context to be public?" (p. 8)
>> So, the problem with the data is that it's the entire website, so some was
>> private and some was public, but now it's all public and everyone knows
>> it's public.
>> 
>> I agree that a lot of the data in the data-dump had been intended to
>> be private -- apparently, direct messages are in there -- but we wouldn't
>> use that data (it's not something we're interested in). We'd use data like
>> number of funders and funding levels and then aggregate everything. I see
>> that some of it was meant to be private, but given the entire site was
>> hacked and exported I don't see how currently anyone could have an
>> expectation of privacy any more. I'm not trying to torture the definition,
>> it's just that it was private until it wasn't.
>> 
>> I can see that some academic researchers -- at least those in computer
>> security -- would be interested in this data and should be able to publish
>> in peer reviewed journals about it, in an anonymized manner (probably as an
>> example of "here's a data hack like what we are talking about, here's what
>> hackers released").
>> 
>> I also think that probably every script kiddie has downloaded the data, as
>> has every grey and black market email list spammer, and probably every
>> botnet purveyor (for passwords) and maybe even the hacking arm of the
>> Chinese army and the NSA. My point here is that if we were to use the data
>> in academic research we wouldn't be publicizing it to nefarious people who
>> would misuse it since all of those people already have it. We could maybe
>> help people who want to use crowdfunding some (hopefully!) if we have some
>> results. (I guess I don't see that we would be doing any harm by using it.)
>> 
>> 
>> So, what do people think? Did I miss something in the AoIR guidelines? I
>> realize I don't think it's clear either way, or I wouldn't be asking, so
>> probably the answers will point to this as a grey area (so why do I even
>> ask, I am not sure).
>> 
>> But I'm not looking for "You can't use it because it's hacked," because I
>> don't think that explains anything. I could counter that with "It is
>> publicly available found data," because it is, although I don't think
>> that's the best reply either. Both lack nuance.
>> 
>> -Nat
>> 
>> -- 
>> Nathaniel Poor, Ph.D.
>> http://natpoor.blogspot.com
>> _______________________________________________
>> The Air-L at listserv.aoir.org mailing list
>> is provided by the Association of Internet Researchers http://aoir.org
>> Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
>> 
>> Join the Association of Internet Researchers:
>> http://www.aoir.org/


