[Air-L] Email Analysis Software

Derek Hansen shakmatt at gmail.com
Mon Jun 9 09:55:18 PDT 2008

Like many of you, I frequently analyze relatively large collections of
email messages (e.g., up to 100,000 messages sent to an email list).
To do this I have used a hodge-podge of different programs, none of
which I am completely happy with. I have used both quantitative and
qualitative approaches and would be interested in hearing from you
about software that works well for either approach. The only
requirement is that it works well with lots of messages. Below I have
listed some of the basic things I've done with various tools. I'd love
to see you all add to the list. Thanks ahead of time.

Tool: Mailbag Assistant (PC only, $40 - see
  - View messages from large corpus (90,000 messages) quickly (much
quicker than traditional email clients)
  - Create complex searches (using regular expressions) that can be
saved and re-run on different subsets of data
  - Export messages (or just message headers) into a database format
(e.g., export header information into MS Access Database)
  - Run some basic built-in queries (e.g., # of messages per month,
contributors, most frequently used words)
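
For anyone without Mailbag Assistant, the header export and the basic counts above can be roughly approximated with Python's standard library. This is only a minimal sketch, not how Mailbag Assistant works: the table name `headers` and the functions `load_headers`, `messages_per_month` and `top_contributors` are hypothetical names chosen for illustration, and SQLite stands in for MS Access.

```python
import sqlite3
from email.utils import parseaddr, parsedate_to_datetime


def load_headers(messages, conn):
    """Copy sender, month and subject of each message into a `headers` table.

    `messages` is any iterable of email.message.Message objects.
    The table layout is an assumption made for this sketch.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS headers "
        "(sender TEXT, month TEXT, subject TEXT)"
    )
    for msg in messages:
        try:
            sent = parsedate_to_datetime(msg["Date"])
        except (TypeError, ValueError):
            continue  # skip messages with a missing or unparseable Date
        sender = parseaddr(msg.get("From", ""))[1].lower()
        conn.execute(
            "INSERT INTO headers VALUES (?, ?, ?)",
            (sender, sent.strftime("%Y-%m"), msg.get("Subject", "")),
        )
    conn.commit()


def messages_per_month(conn):
    """Return (month, count) rows in chronological order."""
    return conn.execute(
        "SELECT month, COUNT(*) FROM headers GROUP BY month ORDER BY month"
    ).fetchall()


def top_contributors(conn, limit=10):
    """Return (sender, count) rows, most active senders first."""
    return conn.execute(
        "SELECT sender, COUNT(*) AS n FROM headers "
        "GROUP BY sender ORDER BY n DESC, sender LIMIT ?",
        (limit,),
    ).fetchall()
```

If the list archive is in mbox format, `mailbox.mbox(path)` from the standard library yields message objects that can be passed straight to `load_headers`.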

Unfortunately, the version I used (as of a year ago) did not allow you
to tag individual messages or pull out a random sample of messages.
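
Pulling a random sample is easy to do outside the tool once the messages (or their IDs) are in a list. A minimal sketch, assuming each message has some unique identifier; the function name `sample_messages` and the fixed default seed are illustrative choices, not part of any existing tool:

```python
import random


def sample_messages(message_ids, k, seed=0):
    """Return k distinct ids drawn uniformly at random, without replacement.

    A fixed seed makes the sample reproducible, so every coder
    can be handed the same set of messages.
    """
    rng = random.Random(seed)
    return rng.sample(list(message_ids), k)
```

Using a seeded `random.Random` instance rather than the module-level functions keeps the sample independent of any other random calls in the same program.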

Tool: Custom-built programs that help multiple coders tag messages
that are shown in a web browser. Unfortunately, I could not find an
existing program that worked, so I have used two different custom
programs for the same purpose (each with slightly different
functionality). The tools that were developed are not really meant to
be easily reused for other rating scenarios :(, so I am interested in
finding a more general-purpose rating support tool. The most important
functionality:
  - Displays a message (randomly selected from a corpus of messages
that all get rated) through a web browser, along with a set of
pre-defined codes with check boxes that can be marked off.
  - Supports multiple raters and calculates inter-rater reliability
statistics (Cohen's kappa)
  - Highlights words of interest to the coders on the web display
  - Includes some analysis ability: can click on any code (in analysis
mode) and all messages coded by either rater (or just one rater) will
be displayed; can display messages where there was disagreement
between raters; can find messages coded into multiple groups, etc.
Also shows overall summary stats on number of messages coded into each
group etc.
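
For reference, the reliability statistic and the disagreement view are simple to compute once both raters' codes are in parallel lists. This is a minimal sketch, not the code from either custom tool: it assumes two raters and one label per message per rater, and the function names `cohens_kappa` and `disagreements` are hypothetical. With check-box codes, the same function can be run once per code on True/False values.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same messages.

    rater_a[i] and rater_b[i] are the labels given to message i.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    # Observed agreement: share of messages given the same label by both.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's own label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


def disagreements(rater_a, rater_b):
    """Return the positions of messages the two raters coded differently."""
    return [i for i, (a, b) in enumerate(zip(rater_a, rater_b)) if a != b]
```

Kappa is 1 for perfect agreement and 0 when agreement is no better than chance.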

Any thoughts on programs that do these things, or even more generally,
tools that are useful in working with email (e.g., visualization of
messages) would be greatly appreciated by me and probably many other
list members. Thanks ahead of time.

Derek Hansen
Assistant Professor
iSchool at Maryland
