[Air-L] question on how to identify email threads on listserv
dthakur at gatech.edu
Thu Jun 18 16:16:45 PDT 2009
Part of the research I am doing requires that I identify threads on a
listserv for analysis. A thread consists of an initial email and the
series of responses to it.
Of course, the easiest way to do this is to sort emails by subject
line. However, as you might know, this is not complete because, for
example, some participants change the subject for a variety of reasons
while still remaining in the same thread. One could instead analyze
the information in the email headers to identify threads, but in my
case this data is not always available. Alternatively, one could
manually scan through the text of the emails, which is very time
consuming with a large email corpus.
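As a rough sketch of the subject-line approach, assuming each email is
available as a dict with a "subject" key (a hypothetical layout, not
any particular archive format), one could strip list tags and reply
prefixes before grouping. The prefix list (Re, Fw, Fwd, Aw) is an
assumption and would need extending for other mail clients.

```python
import re
from collections import defaultdict

# Reply/forward prefixes added by mail clients (assumed list, not exhaustive).
_REPLY_PREFIX = re.compile(r"^\s*(?:(?:re|fwd?|aw)\s*:\s*)+", re.IGNORECASE)
# Bracketed list tags such as "[Air-L] ".
_LIST_TAG = re.compile(r"\[[^\]]*\]\s*")


def normalize_subject(subject):
    """Drop list tags and reply prefixes, collapse whitespace, lowercase."""
    s = _LIST_TAG.sub("", subject)
    s = _REPLY_PREFIX.sub("", s)
    return " ".join(s.split()).lower()


def group_by_subject(messages):
    """Map each normalized subject to the indices of the messages carrying it."""
    groups = defaultdict(list)
    for index, message in enumerate(messages):
        groups[normalize_subject(message["subject"])].append(index)
    return dict(groups)
```

This only handles the easy case: a message whose subject was rewritten
mid-thread still lands in its own group, which is exactly the
limitation described above.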
Therefore, what I need is a method (preferably automated) that can
identify email threads by looking at the texts of the emails. I can
imagine some software that does this by clustering emails on semantic
similarity, giving clusters that I could equate to threads, but I
haven't been able to identify any just yet.
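To make the kind of text-based clustering I have in mind concrete,
here is a minimal sketch using only the Python standard library. It is
one possible approach, not an existing tool: it weights words with
TF-IDF, compares emails by cosine similarity, and links any pair above
a threshold into the same cluster (single-link). The function names
and the 0.3 threshold are assumptions chosen for illustration, and
word overlap is only a crude stand-in for semantic similarity.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def tfidf_vectors(texts):
    """Return one sparse {term: weight} TF-IDF vector per text."""
    docs = [Counter(tokenize(t)) for t in texts]
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(doc.keys())
    # Smoothed IDF: terms found in every document keep a small positive weight.
    return [
        {term: tf * (math.log((1 + n) / (1 + df[term])) + 1.0)
         for term, tf in doc.items()}
        for doc in docs
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_by_similarity(texts, threshold=0.3):
    """Single-link clustering: join emails whose similarity >= threshold."""
    vectors = tfidf_vectors(texts)
    n = len(vectors)
    parent = list(range(n))  # union-find forest over message indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if cosine(vectors[i], vectors[j]) >= threshold:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[max(root_i, root_j)] = min(root_i, root_j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Calling cluster_by_similarity(list_of_email_bodies) returns lists of
message indices, one list per candidate thread. The pairwise
comparison is quadratic in the number of emails, and quoted reply text
would probably need to be handled before the results could be trusted
on a real corpus.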
The units of analysis that I have described are fairly common and, I
imagine, so is my problem. Perhaps people on this list can point me to
existing methods, software, or papers that have already addressed this
issue?
School of Public Policy
Georgia Institute of Technology