[Air-L] Fwd: Facebook data destruction
sguerses at esat.kuleuven.be
Thu Mar 25 16:48:11 PDT 2010
bertil wrote: Couldn't a for-statistical-purpose-only access have been
a possible option?
this is a complicated and interesting question. technically, there are
two approaches to publishing or accessing large data sets for
statistical analysis while avoiding re-identification of individuals:
the first is called privacy preserving data publishing (ppdp), the
second differential privacy (the new kid on the block). research on
ppdp has shown that it is not possible to publish anonymized datasets
and still provide sufficient guarantees with respect to
re-identification. differential privacy, on the other hand, is query
based: no dataset is published; instead, researchers/analysts can pose
queries up to a certain point, and the system guarantees that, given
the queries, no analysis can lead to re-identification of individuals.
microsoft has hired most of the prominent researchers working on
differential privacy, while facebook, i think, won over lars backstrom,
who is also from the differential privacy gang (gang membership being
based on co-authorship).
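
to make "query based, up to a certain point" a bit more concrete, here
is a minimal sketch of one common way this is done: counting queries
answered with laplace noise, with a privacy budget that runs out. the
class name, the parameter names and the sample data are all made up for
illustration; this is not how any particular company's system works,
and it is not a vetted implementation.

```python
import random


class PrivateCounter:
    """Hypothetical sketch: noisy counting queries under a privacy budget."""

    def __init__(self, records, total_epsilon, seed=None):
        self._records = list(records)      # the raw data never leaves this object
        self._remaining = total_epsilon    # total privacy budget still available
        self._rng = random.Random(seed)

    def _laplace(self, scale):
        # The difference of two i.i.d. exponentials with mean `scale`
        # is Laplace-distributed with location 0 and that scale.
        return (self._rng.expovariate(1 / scale)
                - self._rng.expovariate(1 / scale))

    def count(self, predicate, epsilon):
        """Return a noisy count of the records matching `predicate`."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self._remaining:
            # "up to a certain point": refuse once the budget is spent
            raise RuntimeError("privacy budget exhausted")
        self._remaining -= epsilon
        true_count = sum(1 for r in self._records if predicate(r))
        # Adding or removing one person changes a count by at most 1
        # (sensitivity 1), so the noise scale is 1 / epsilon.
        return true_count + self._laplace(1 / epsilon)


if __name__ == "__main__":
    # Made-up records, purely for illustration.
    users = [{"age": a} for a in range(100)]
    counter = PrivateCounter(users, total_epsilon=1.0)
    print(counter.count(lambda u: u["age"] >= 50, epsilon=0.5))
```

the analyst only ever sees the noisy answers, and each query spends
part of the budget; a smaller epsilon per query means more noise but
more queries before the system stops answering.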
now, as is typically the case in most privacy research, the concern is
with personal re-identification. categorization and potential social
sorting are not the concern of these algorithmic approaches. so
profiling is considered ok, as long as the analyst cannot identify
individuals. the idea that this is harmless is, as many surveillance
studies authors discuss, a myth, at least in cases where social sorting
is not based on individual identification but on matching behavior or
attributes, e.g., sorting customers based on knowledge of past shopping
behavior, or based on their connectedness in a social network.
one of the ways of dealing with categorization and social sorting is
through transparency and engagement. that, however, brings up a lot of
accountability issues for which we do not have models of practice.
still, i think the discussions on this list are fruitful for thinking
about the problem, conceiving new practices and learning from existing
ones, e.g., the aol query dataset, the report on ethnicity in facebook.
Date: Thu, 25 Mar 2010 21:34:57 +0100
From: Bertil Hatt <bertil.hatt at ensae.org>
To: jkd <jkd at email.unc.edu>, Christophe Prieur
<christophe.prieur at liafa.jussieu.fr>
Cc: AoIR-L <air-l at listserv.aoir.org>
Subject: Re: [Air-L] Fwd: Facebook data destruction
<111a48f01003251334h4ad6d42fsb27b385ea26f4f89 at mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
Couldn't a for-statistical-purpose-only access have been a possible
option?
I'm not familiar with handling such massive data, and I assume that
only have made sense through a paying service, but? allowing scripts
some aggregated data (say, preventing any results that didn't involve
10,000+ accounts) would have respected most privacy concerns, no?
Having those data anywhere, hackable, is a legal risk and that enough
justifies Facebook's threats but I'm still hoping for an academic
this amazing database.