[Air-L] Ethics of using hacked data.

Anna Lauren Hoffmann annalauren at berkeley.edu
Wed Oct 7 15:26:49 PDT 2015


I was just having this discussion with a student after a webinar on Ashley
Madison that we hosted for students here at the Berkeley I School - thought
I'd share some of what I shared with the student in case folks find it
useful.

The short response is: it's difficult to give an all-or-nothing sort of
answer to the question of using stolen data, as there are a number of
considerations we need to fold in. (I won't speak directly to Patreon, but
a lot of the context below applies, I think.) In particular, it's important
to keep in mind means and ends in a professional context. To illustrate, we
can consider two main groups that have an interest in the contents of the
AM data dump: academic researchers and journalists.

One of the primary "ends" of journalism, at least ideally, is service to
the public interest. With that in mind, we permit or tolerate - from a
professional ethical perspective, at least - journalists mining and
exploring stolen information so long as it is for reasons that can be
justified in terms of the public interest. There are many precedents here,
from Watergate to Snowden to the Wikileaks cables and so on.

A good example in this particular case is the Gizmodo reporting on Ashley
Madison <http://gizmodo.com/the-fembots-of-ashley-madison-1726670394>:
exposing the "fembots" (or fake, automated profiles) of Ashley Madison lays
bare a deceptive practice that consumers and the FTC have a legitimate
interest in knowing about. In another example, there are cases of
individuals reporting on public figures that might be included in the
database - in the aftermath of the hack, some outlets have reported on the
private emails of the CEO of AM parent company Avid Life Media (whose
relevance to the hack is obvious) while others broke the news that
conservative "family values" cultural figure Josh Duggar was using the
service. I think the case for investigating the CEO here (by, for example,
exposing his private emails) is ethically justifiable. The case for
exposing Duggar is not as straightforward, but it's a legitimate open
question - by way of a counterargument, Dan Savage, in particular, really,
really thinks that exposing Duggar is justifiable
<http://www.bioethics.net/2015/09/ashley-madison-using-stolen-data/>. At
the same time, we would not tolerate journalists targeting and exposing the
details of non-public figures in the dataset. That would be bullying or
harassment - and definitely unethical.

For academic researchers, our ends aren't *necessarily* public interest
(though there can be clear connections to the public interest in some
cases, like the West Virginia researchers who exposed VW's cheating
software
<http://spectrum.ieee.org/cars-that-think/transportation/advanced-cars/how-professors-caught-vw-cheating>).
Setting aside romantic notions of progress and "the glory of science," the
ends of research can be variously ontological, epistemological, political,
etc...

Over time, we've decided - in part as a response to past ethical
transgressions - that, regardless of the ends of our research, there are
certain values we shouldn't compromise in the pursuit of knowledge - chief
among them the value of respect. As the Belmont Report and other research
ethics documents make clear, the notion of informed consent is one of the
main ways (if not *the* main way) in which we operationalize the value of
respect in practice. The challenge that the Ashley Madison data poses for
researchers, then, is that those included in the dataset never consented to
being part of research (and, indeed, it can safely be assumed that many of
the affected individuals would not have agreed to disclose such intimate
details to researchers without certain guarantees of privacy).

I'm not sure what the legal status of conducting research on a stolen
database might be (I don't have the legal background to answer that
question) - but from an ethical perspective, concerns with consent and
respect are still absolutely pressing. So, rather than giving a blanket
"yes" or "no," I think it is important to
consider 1) the kinds of research questions you would want to ask and why,
2) what the relationship of your research might be to your institution's
IRB (if you're at the kind of institution that has one, anyway) given that
the dataset contains human subjects, and 3) what possible further harms
your research might cause if not approached properly. (Plus: even if
consent is out of the picture, we still have other important values to
which we can appeal - such as beneficence, justice, care, etc...).

-Anna

-----
Anna Lauren Hoffmann
Professional Faculty & Postdoctoral Researcher
School of Information
University of California, Berkeley


On Wed, Oct 7, 2015 at 2:35 PM, Jeremiah Spence <jeremiah.spence at gmail.com>
wrote:

> I was following the 2600 forums after the Ashley Madison hack. People
> wanted to explore that data trove, and the consensus, after some input
> from a journalist, was that it was legal to obtain the data once it was
> released to the public. But it is illegal to disclose sensitive data such
> as passwords or credit card numbers.
>
> This does not answer the colleague's original question regarding the use of
> the data in a formal research setting. Perhaps the "grey hat" academic
> solution would be to anonymize the resulting data, in much the same way
> we usually treat survey data. That way, analysis can be performed and no
> individual is "injured" as a result of the research.
>
> Jeremiah, Ph.D.
>
> On Wed, Oct 7, 2015 at 4:29 PM, Alex Leavitt <alexleavitt at gmail.com>
> wrote:
>
> > A similar case study might be the history of the Enron email data set.
> > It went through multiple iterations of availability and takedowns as it
> > was slowly edited over time to remove emails. People still use it as a
> > canonical dataset, but it is certainly still controversial, and
> > especially was when it was first made available.
> >
> >
> > ---
> >
> > Alexander Leavitt
> > PhD Candidate
> > USC Annenberg School for Communication & Journalism
> > http://alexleavitt.com
> > Twitter: @alexleavitt <http://twitter.com/alexleavitt>
> >
> >
> > On Wed, Oct 7, 2015 at 1:54 PM, Peter Timusk <peterotimusk at gmail.com>
> > wrote:
> >
> > > I think one could look a little at the consequences of what you are
> > > doing. It seems you are trying to make money by researching funding
> > > data - is that right? I find that unethical, but I find all kinds of
> > > data mining unethical. There are reasons to use these same skill sets
> > > in ways that could benefit society. Maybe I don't understand what your
> > > end result is about.
> > >
> > > Peter Timusk
> > > peterotimusk at gmail.com
> > > I do not speak for my employer or charities or political parties or
> > > unions I volunteer with or belong to, unless otherwise noted.
> > >
> > >
> > > > On Oct 7, 2015, at 4:11 PM, Nathaniel Poor <natpoor at gmail.com>
> > > > wrote:
> > > >
> > > > Hello list-
> > > >
> > > > I recently got into a discussion with a colleague about the ethics
> > > > of using hacked data, specifically the Patreon hacked data (see here:
> > > > http://arstechnica.com/security/2015/10/gigabytes-of-user-data-from-hack-of-patreon-donations-site-dumped-online/
> > > > ).
> > > >
> > > > He and I do crowdfunding work and had wanted to look at Patreon, but
> > > > as far as I can tell they have no easy hook into all their projects
> > > > (for scraping), so to me this data hack was like a gift! But he said
> > > > there was no way we could use it. We aren't doing sentiment analysis
> > > > or anything; we would use aggregated measures like funding levels and
> > > > then report things like means and maybe a regression, so there would
> > > > be no identifiable information whatsoever derived from the hacked
> > > > data in any of our resulting work (we might go to the site and pull
> > > > some quotes).
> > > >
> > > > I looked at the AoIR ethics guidelines (
> > > > http://aoir.org/reports/ethics2.pdf ) and didn't see anything
> > > > specifically about hacked data. (I don't think "hacked" is the best
> > > > word, but I don't like "stolen" either - those are different
> > > > discussions.)
> > > >
> > > > One relevant line I noticed was this one: "If access to an online
> > > > context is publicly available, do members/participants/authors
> > > > perceive the context to be public?" (p. 8) So, the problem with the
> > > > data is that it's the entire website: some was private and some was
> > > > public, but now it's all public and everyone knows it's public.
> > > >
> > > > I agree that a lot of the data in the data dump had been intended to
> > > > be private -- apparently, direct messages are in there -- but we
> > > > wouldn't use that data (it's not something we're interested in). We'd
> > > > use data like number of funders and funding levels and then aggregate
> > > > everything. I see that some of it was meant to be private, but given
> > > > that the entire site was hacked and exported, I don't see how anyone
> > > > could currently have an expectation of privacy anymore. I'm not
> > > > trying to torture the definition; it's just that it was private until
> > > > it wasn't.
> > > >
> > > > I can see that some academic researchers -- at least those in
> > > > computer security -- would be interested in this data and should be
> > > > able to publish in peer-reviewed journals about it, in an anonymized
> > > > manner (probably as an example of "here's a data hack like what we
> > > > are talking about, here's what the hackers released").
> > > >
> > > > I also think that probably every script kiddie has downloaded the
> > > > data, as has every grey- and black-market email list spammer, and
> > > > probably every botnet purveyor (for passwords), and maybe even the
> > > > hacking arms of the Chinese army and the NSA. My point here is that
> > > > if we were to use the data in academic research, we wouldn't be
> > > > publicizing it to nefarious people who would misuse it, since all of
> > > > those people already have it. We could maybe help people who want to
> > > > use crowdfunding (hopefully!) if we have some results. (I guess I
> > > > don't see that we would be doing any harm by using it.)
> > > >
> > > > So, what do people think? Did I miss something in the AoIR
> > > > guidelines? I realize it isn't clear either way, or I wouldn't be
> > > > asking, so probably the answers will point to this as a grey area
> > > > (so why I even ask, I am not sure).
> > > >
> > > > But I'm not looking for "You can't use it because it's hacked,"
> > > > because I don't think that explains anything. I could counter that
> > > > with "It is publicly available found data," because it is, although I
> > > > don't think that's the best reply either. Both lack nuance.
> > > >
> > > > -Nat
> > > >
> > > > --
> > > > Nathaniel Poor, Ph.D.
> > > > http://natpoor.blogspot.com
> > > > _______________________________________________
> > > > The Air-L at listserv.aoir.org mailing list
> > > > is provided by the Association of Internet Researchers http://aoir.org
> > > > Subscribe, change options or unsubscribe at:
> > > > http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
> > > >
> > > > Join the Association of Internet Researchers:
> > > > http://www.aoir.org/
>
>
>
> --
> --------------------
> Jeremiah Spence, Ph.D.
> Technologist. Analyst. Consultant.
> jeremiahspence.com
> jeremiah.spence at gmail.com


