[Air-L] Anonymizing Twitter handles

Shulman, Stu stu at texifter.com
Fri Apr 14 04:37:43 PDT 2017


There are many good reasons to anonymize Tweets during the research process
(reducing annotator bias, for example) and definitely during the
presentation of results (particularly controversial Tweets). Indeed, the
visual presentation of sensational individual Tweets is something ethicists
and IRBs might caution against, despite the public nature of the platform.
Going further, you have to consider the ethical obligation not to
publicly display deleted Tweets, though I don't think this would extend
to public figures, like @realdonaldtrump.

Having said that, Tweets carry considerably less "meaning" when you hide the
Twitter handles. Context is lost, so there is a big trade-off. DiscoverText
has an automated redaction capability that can remove or obscure all the
Twitter handles at once. Here is an example of an archive consisting of
replies to a single Tweet (identified by its status ID), where every Tweet
begins with a Twitter handle:

https://drive.google.com/file/d/0B1iEonkdfwKua0lmWndNZTkyWXM/view?usp=sharing
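As a rough illustration of what bulk handle redaction involves (this is not DiscoverText's implementation, which isn't described here), a regular-expression pass in Python might look like the sketch below. It assumes handles are 1 to 15 ASCII letters, digits or underscores; the `redact_handles` name and the `@USER` placeholder are hypothetical choices.

```python
import re

# Assumption: a handle is "@" followed by 1-15 ASCII letters, digits or
# underscores. The lookbehind skips e-mail addresses such as stu@example.com,
# and the lookahead stops a match from ending in the middle of a longer word.
HANDLE_RE = re.compile(r"(?<![A-Za-z0-9_])@[A-Za-z0-9_]{1,15}(?![A-Za-z0-9_])")


def redact_handles(text: str, placeholder: str = "@USER") -> str:
    """Obscure every @handle with placeholder, or remove it if placeholder is ""."""
    redacted = HANDLE_RE.sub(placeholder, text)
    if not placeholder:
        # Tidy the whitespace left behind when handles are removed outright.
        redacted = " ".join(redacted.split())
    return redacted


if __name__ == "__main__":
    tweet = "@alice @bob_99 agreed, email stu@example.com"
    print(redact_handles(tweet))      # obscure: handles become @USER
    print(redact_handles(tweet, ""))  # remove: handles dropped entirely
```

Obscuring keeps the reply structure visible (a reader can still see that a Tweet was addressed to someone), while removal hides even that.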

This (underutilized) functionality is part of a Freedom of Information
Act (FOIA) capability, including a "dirty word tool", that members of this
list helped to create about 5 years ago. If any member of this list would
like to experiment with the redaction tools, just shoot us an email
(info at texifter.com): I will put you in a special sponsored sandbox for
redaction experiments and give you a web demo, and we will provide
complimentary Gnip and Search API access to play with.

~Stu



On Fri, Apr 14, 2017 at 7:07 AM, Bernhard Rieder <berno.rieder at gmail.com>
wrote:

> > On 14 Apr 2017, at 7:47, Maurice Vergeer <m.vergeer at maw.ru.nl> wrote:
>
> > Still, anonymizing is fairly easy when you have the data in a statistical
> > program such as SPSS, R or even Excel: replace the user handles with a
> > unique number (from 1 to N).
> > Then remove the user handles from the dataset. Still, I would always advise
> > keeping a secure file with both key variables, the user handles and the new
> > identifier, for future research.
>
> If you hash the user handle, e.g. with SHA-1 or similar (which is even
> possible in Excel with a small formula), there is no need to keep a
> correspondence file, because hashing a string will always yield the same
> hash, while making reversal virtually impossible (i.e. you cannot get the
> handle from the hash).
>
> best,
> Bernhard
>
> --
> Bernhard Rieder | Associate Professor | New Media and Digital Culture
> University of Amsterdam | Turfdraagsterpad 9 | 1012 XT  Amsterdam | The
> Netherlands
> http://thepoliticsofsystems.net | http://rieder.polsys.net |
> https://www.digitalmethods.net | @RiederB
>
> _______________________________________________
> The Air-L at listserv.aoir.org mailing list
> is provided by the Association of Internet Researchers http://aoir.org
> Subscribe, change options or unsubscribe at:
> http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
>
> Join the Association of Internet Researchers:
> http://www.aoir.org/
>
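The two approaches in the quoted exchange, sequential IDs backed by a secure key file and deterministic hashing, can be sketched in Python as below. This is a minimal sketch, not code from either poster; the function names are hypothetical, and treating handles case-insensitively is an assumption. One caveat on plain hashing: Twitter handles are public and guessable, so anyone with a list of candidate handles can hash them and compare, which means an unsalted SHA-1 does not by itself prevent re-identification. A keyed hash (HMAC) with a secret project key keeps the "same input, same output" property without that weakness, provided the key is guarded as carefully as a correspondence file would be.

```python
import csv
import hashlib
import hmac


def normalize(handle: str) -> str:
    # Assumption: "@Alice" and "alice" denote the same account, since
    # Twitter handles are not case-sensitive.
    return handle.lstrip("@").lower()


def sequential_ids(handles):
    """Map each distinct handle to 1..N in order of first appearance."""
    mapping = {}
    for handle in handles:
        key = normalize(handle)
        if key not in mapping:
            mapping[key] = len(mapping) + 1
    return mapping


def write_key_file(mapping, path):
    """Save the handle-to-ID correspondence; store it securely and separately."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["handle", "id"])
        for handle, pid in mapping.items():
            writer.writerow([handle, pid])


def sha1_pseudonym(handle: str) -> str:
    """Plain SHA-1 of the normalized handle, as suggested in the thread."""
    return hashlib.sha1(normalize(handle).encode("utf-8")).hexdigest()


def keyed_pseudonym(handle: str, secret_key: bytes, length: int = 16) -> str:
    """HMAC-SHA256 of the normalized handle; only key holders can reproduce it."""
    digest = hmac.new(secret_key, normalize(handle).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]


if __name__ == "__main__":
    print(sequential_ids(["@alice", "@Bob", "@ALICE"]))  # {'alice': 1, 'bob': 2}
```

Truncating the HMAC to 16 hex characters (64 bits) is an arbitrary choice to keep identifiers readable; collisions are very unlikely at typical corpus sizes, but the full digest can be kept instead.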



-- 
Dr. Stuart W. Shulman
Founder and CEO, Texifter
LinkedIn: http://www.linkedin.com/in/stuartwshulman


