[Air-l] ethics - aol data

Jonathan Cornwell jrc at tcfir.org
Mon Aug 28 19:48:12 PDT 2006


I don't believe that there is a way to make Internet data truly private,
especially when data is gathered at network distribution points such as AOL
or even a last-mile provider, or in any case where two or more datasets can
be correlated. At best, anonymity can only hold for small datasets (small
in number of users, "transactions" per user, and time span). The larger the
dataset, the more likely it becomes that one can reconstruct individual
identities. Simple cipher tricks like replacing names and locations with
codes are ridiculously easy to crack; again, the larger the sample size, the
easier it is to identify the pattern behind the encoding or the patterns
within the encoded data. On the other hand, truly effective ways of
sanitizing the data are also likely to corrupt the patterns that we
researchers are interested in.
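To make the correlation point concrete, here is a minimal Python sketch of a
linkage attack. It assumes a toy pseudonymized log and a toy identified
"directory"; all records, field names, and the `link` helper are hypothetical.
Replacing names with codes leaves the other attributes untouched, and each
extra attribute joined on shrinks the set of candidates.

```python
def link(pseudonymized, auxiliary, keys):
    """Map each pseudonym to the auxiliary names that agree on every attribute in `keys`.

    Hypothetical helper: a toy linkage attack that joins two datasets on
    quasi-identifiers (attributes that are not names but still narrow down who someone is).
    """
    # Index the identified (auxiliary) dataset by its quasi-identifier values.
    index = {}
    for person in auxiliary:
        signature = tuple(person[k] for k in keys)
        index.setdefault(signature, []).append(person["name"])
    # Look up each pseudonymized record by the same signature; the code itself is never needed.
    return {
        record["user"]: index.get(tuple(record[k] for k in keys), [])
        for record in pseudonymized
    }


# Toy data, invented for illustration only.
auxiliary = [
    {"name": "Alice", "zip": "30047", "age": 62},
    {"name": "Bob", "zip": "30047", "age": 35},
    {"name": "Carol", "zip": "10001", "age": 62},
]
pseudonymized = [{"user": "user_001", "zip": "30047", "age": 62}]

print(link(pseudonymized, auxiliary, ["zip"]))         # {'user_001': ['Alice', 'Bob']}
print(link(pseudonymized, auxiliary, ["zip", "age"]))  # {'user_001': ['Alice']}
```

A real attack would involve far more records and fuzzier matching; the sketch
only shows why swapping names for codes leaves the correlation intact.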

It's rather mind-numbing to think about how rich a story about our
activities we leave out in the world.

In most traditional forms of human-subject research, the individual is a
gatekeeper of sorts: through the choice to participate or not, through the
various self-report biases, etc. In other cases, such as the compilation of NAEP test data, the
nature of the data is rather innocuous and quite limited in scope. When
subjects are aware that data is being gathered about them in some way, it is
a public transaction and their behavior can be self-monitored and
self-regulated. It is the surreptitious data and the data-as-byproduct of
other activities that is most interesting but also the most... damaging.
This is the nature of the AOL data. People were conducting their Internet
lives under the illusion of privacy and, through their choices, showing a
face they might not show to the sociologist, the ethnographer, the
psychologist, etc. 600k+ people is a big sample.

Surely it hasn't escaped anyone in this group that the Internet represents a
most interesting paradox: it is both the ultimate "brown bag" and the most
public of public forums. This is one of the most fascinating features of the
Internet to me, and one that ties me in knots trying to understand.

There are Nobel Prizes to be found among the data of the Internet, but it's
ethically radioactive... at this point.

As always, these are my thoughts at this point in my understanding;
knowledge is provisional.

Jonathan Cornwell
