[Air-L] Fwd: Facebook data destruction

Thu Mar 25 16:48:11 PDT 2010

bertil wrote: Couldn't a for-statistical-purpose-only access have been  
a possible option?

this is a complicated and interesting question. technically, there are
two approaches to publishing or accessing large data sets for
statistical analysis while avoiding re-identification of individuals:
the first is called privacy preserving data publishing (ppdp), the
second differential privacy (the new kid on the block). research on
ppdp has shown that it is not possible to publish anonymized datasets
and still provide sufficient guarantees against re-identification.
differential privacy, on the other hand, is query based: no dataset is
published. instead, researchers/analysts can pose queries up to a
certain point, and the system guarantees that, given the answers to
those queries, no analysis can lead to re-identification of
individuals. microsoft has hired most of the prominent researchers
working on differential privacy, while facebook i think won over lars
backstrom, who is also from the differential privacy gang (gang
membership being based on co-authorship).
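
to make the query-based idea concrete, here is a rough sketch of my own
(not any deployed system). it assumes the textbook laplace mechanism for
counting queries, and models "up to a certain point" as a total privacy
budget. the names PrivateCounter, total_epsilon and count are
hypothetical, chosen only for illustration.

```python
import random


class PrivateCounter:
    """Hypothetical query interface: noisy counts under a fixed privacy budget."""

    def __init__(self, records, total_epsilon, seed=None):
        self._records = list(records)      # raw data never leaves this object
        self._remaining = total_epsilon    # budget left for future queries
        self._rng = random.Random(seed)

    def _laplace(self, scale):
        # The difference of two independent Exp(1) draws is Laplace(0, 1).
        return scale * (self._rng.expovariate(1.0) - self._rng.expovariate(1.0))

    def count(self, predicate, epsilon):
        """Return a noisy count of records matching predicate, spending epsilon."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self._remaining:
            raise RuntimeError("privacy budget exhausted")
        self._remaining -= epsilon
        true_count = sum(1 for r in self._records if predicate(r))
        # Adding or removing one person changes a count by at most 1,
        # so the noise scale is 1 / epsilon.
        return true_count + self._laplace(1.0 / epsilon)


counter = PrivateCounter(range(100), total_epsilon=1.0, seed=0)
noisy = counter.count(lambda r: r < 50, epsilon=0.5)  # close to 50, never exact
```

smaller epsilon means more noise per answer but more queries before the
budget runs out. a real system would need far more care than this (for
instance about floating-point noise generation).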

now, as is typically the case in most privacy research, the concern
is with personal re-identification. categorization and potential
social sorting are not the concern of these algorithmic approaches. so
profiling is ok, as long as the analyst cannot identify individuals.

the idea that profiling is harmless without identification is, as many
surveillance studies authors discuss, a myth, at least in cases where
social sorting is not based on individual identification but on
matching behavior or attributes, e.g., sorting customers based on
knowledge of past shopping behavior or on their connectedness in a
social network.

one of the ways of dealing with categorization and social sorting is
through transparency and engagement. that, however, brings up a lot of
accountability issues for which we do not have models of practice.
still, i think the discussions on this list are fruitful for thinking
about the problem, conceiving new practices, and learning from existing
ones, e.g., the aol query dataset and the report on ethnicity in
facebook.
cheers,
s.

Message: 2
Date: Thu, 25 Mar 2010 21:34:57 +0100
From: Bertil Hatt <bertil.hatt at ensae.org>
To: jkd <jkd at email.unc.edu>, 	Christophe Prieur
	<christophe.prieur at liafa.jussieu.fr>
Cc: AoIR-L <air-l at listserv.aoir.org>
Subject: Re: [Air-L] Fwd: Facebook data destruction
Message-ID:
	<111a48f01003251334h4ad6d42fsb27b385ea26f4f89 at mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

Couldn't a for-statistical-purpose-only access have been a possible  
option?

  I'm not familiar with handling such massive data, and I assume that
could only have made sense through a paying service, but allowing
scripts to pull some aggregated data (say, preventing any results that
didn't involve 10,000+ accounts) would have respected most privacy
concerns, no?
  Having those data anywhere, hackable, is a legal risk, and that alone
justifies Facebook's threats, but I'm still hoping for academic access
to this amazing database.
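
A minimal sketch of the kind of thresholded aggregate access suggested
above, assuming accounts are plain dictionaries. The function name
aggregate_counts and the min_group_size parameter are hypothetical and
do not describe any actual Facebook interface.

```python
from collections import Counter

MIN_GROUP_SIZE = 10_000  # the 10,000+ accounts threshold suggested above


def aggregate_counts(accounts, key, min_group_size=MIN_GROUP_SIZE):
    """Count accounts per group, withholding any group below the threshold."""
    counts = Counter(key(account) for account in accounts)
    # Groups that are too small are dropped rather than reported.
    return {group: n for group, n in counts.items() if n >= min_group_size}


# Toy data with a lowered threshold, just to show the suppression.
sample = [{"country": "fr"}] * 3 + [{"country": "be"}] * 2
print(aggregate_counts(sample, key=lambda a: a["country"], min_group_size=3))
```

With the toy threshold of 3, only the "fr" group is returned; the "be"
group is withheld because it covers fewer accounts than the minimum.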
