[Air-L] Coding Analysis Toolkit - A new resource for researchers with large text annotation tasks

Stuart Shulman stuart.shulman at gmail.com
Fri Nov 23 05:37:40 PST 2007

Members of the list may be interested in this narrated slide show about the
Coding Analysis Toolkit developed by QDAP. It is viewable through a web
browser at:


The Coding Analysis Toolkit, colloquially known as "CAT", was developed in
the summer of 2007. It was designed by QDAP Director Dr. Stuart Shulman and
created in collaboration with Mark Hoy, a Senior Programmer in the Carnegie
Mellon University School of Computer Science. It is maintained by UCSUR
Technology Director James Lefcakis. CAT is hosted on UCSUR servers and
made available on the web at: http://cat.ucsur.pitt.edu/. The system
consists of a web-based suite of tools, custom-built from the ground up to
facilitate efficient and effective analysis of text datasets that have been
coded using the commercial off-the-shelf package ATLAS.ti (www.atlasti.com).
We have recently posted a narrated slide show about CAT online.

The Coding Analysis Toolkit was designed to use keystrokes and automation to
clarify and speed up the validation or consensus adjudication process.
Special attention was paid during the design process to the need to
eliminate the role of the computer mouse, thereby streamlining the physical
and mental tasks in the coding analysis process. We anticipate that CAT will
open new avenues for researchers interested in measuring and accurately
reporting coder validity and reliability, as well as for those practicing
consensus-based adjudication. The availability of CAT can improve the
practice of qualitative data analysis at the University of Pittsburgh and

Currently about 50 beta testers located in several countries have accounts
on CAT. They have been given free access to the system for the rest of the
calendar year. Systematic user feedback will be gathered via a beta tester
web survey and will shape the future development of CAT. The capabilities of
CAT and its reliability as a software tool may be sufficiently robust to
merit commercial licensing to users starting in 2008. The CAT system allows
a user to register for an account, log on, upload coded results exported
from ATLAS.ti, and run inter-rater reliability comparisons using Fleiss'
kappa and Krippendorff's alpha. The user can also choose to perform a
code-by-code comparison of the data, revealing tables of quotations where
coders agree, disagree, or overlap. For any comparison, the user can view
the data on screen or download it as a rich-text file (.rtf).
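The post does not say how CAT computes its reliability figures. As a rough illustration only, here is a minimal textbook sketch of both statistics in Python, assuming each quotation is one unit that every coder tagged with a single nominal code. The input layout and the function names are hypothetical; they are not CAT's data format or code, and no ATLAS.ti export parsing is shown.

```python
from collections import Counter
from itertools import permutations


def fleiss_kappa(ratings):
    """Fleiss' kappa for nominal codes.

    ratings: one list per quotation holding the code each coder assigned;
    every quotation must have the same number of coders (at least two).
    """
    n_items, n_raters = len(ratings), len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("need the same number (>= 2) of coders per quotation")
    code_totals = Counter()
    observed = 0.0
    for item in ratings:
        counts = Counter(item)
        code_totals.update(counts)
        # Share of coder pairs on this quotation that chose the same code.
        observed += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1))
    observed /= n_items
    # Chance agreement from the overall proportion of each code.
    expected = sum((t / (n_items * n_raters)) ** 2 for t in code_totals.values())
    return (observed - expected) / (1 - expected)


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha (nominal metric); None marks a missing code."""
    coincidences = Counter()
    for unit in units:
        values = [v for v in unit if v is not None]
        m = len(values)
        if m < 2:
            continue  # a quotation with fewer than two codes is not pairable
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1 / (m - 1)
    marginals = Counter()
    for (a, _), weight in coincidences.items():
        marginals[a] += weight
    n = sum(marginals.values())
    disagree_obs = sum(w for (a, b), w in coincidences.items() if a != b)
    disagree_exp = sum(marginals[a] * marginals[b]
                       for a in marginals for b in marginals if a != b) / (n - 1)
    return 1 - disagree_obs / disagree_exp


if __name__ == "__main__":
    codes = [["a", "a"], ["a", "b"], ["b", "b"], ["b", "b"]]
    print(round(fleiss_kappa(codes), 3))                # 0.467
    print(round(krippendorff_alpha_nominal(codes), 3))  # 0.533
```

Fleiss' kappa needs every quotation coded by the same number of coders, while the alpha sketch tolerates missing codes (None) by skipping quotations with fewer than two codes. Both divide by zero when every code in the data is identical, since chance agreement is then total.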

CAT's core functionality allows for the adjudication of coded items by an
"expert" user, a sub-account attached to the primary account holder of the
system. The website and database reside on a UCSUR server running Windows
2003, and the website is programmed in HTML, ASP.NET 2.0, and JavaScript.
An expert user can log on to the system to validate the codes assigned to
items in a dataset. While the expert user is validating codes, the system
also records which codes are valid and which coders assigned them. This
information builds a historical track record for assessing each coder's
accuracy over time. It also allows the account holder to see a rank-order
list of the coders most likely to produce valid observations, report the
overall validity scores by code, coder, or entire project, and end up with
a 'clean' dataset consisting of only valid

In a newly developed CAT module, which is being beta tested internally
during the fall 2007 semester by five QDAP coders, the project manager is
able to upload raw datasets and have users code those datasets directly
through the CAT interface. As is the case with the original adjudication
toolkit, this new module features automated loading of discrete quotations
and requires only keystrokes, instead of mouse clicks and drags, to apply
the codes to the text. We estimate that coding tasks in CAT are completed
two to three times as fast as identical coding tasks in ATLAS.ti. While
this high-speed "mouse-less" coding module would serve many traditional
qualitative research approaches poorly, it is ideally suited to the
annotation tasks routinely generated by computer scientists.
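The post describes the keystroke coding loop only at a high level. A minimal sketch of the idea, assuming a hypothetical key-to-code map and treating quotations and keystrokes as plain sequences (no real user interface), might look like this: each bound keystroke codes the current quotation and the next one loads automatically, so no mouse action is involved.

```python
_DONE = object()  # sentinel: no quotation left to code


def code_with_keystrokes(quotations, keystrokes, keymap):
    """Apply one code per quotation, in load order, from single keystrokes.

    keymap binds a key to a code (None can stand for "no code applies").
    Unbound keys are ignored and the current quotation stays on screen;
    keystrokes left over after the last quotation are discarded.
    """
    coded = []
    queue = iter(quotations)
    current = next(queue, _DONE)
    for key in keystrokes:
        if current is _DONE:
            break
        if key not in keymap:
            continue  # e.g. a mistyped key: wait for a bound one
        coded.append((current, keymap[key]))
        current = next(queue, _DONE)  # automatically load the next quotation
    return coded


if __name__ == "__main__":
    keymap = {"1": "economy", "2": "health", "0": None}  # hypothetical codes
    print(code_with_keystrokes(["q1", "q2", "q3"], "1x20", keymap))
    # [('q1', 'economy'), ('q2', 'health'), ('q3', None)]
```

Keeping the key map as plain data means a project manager could, in principle, rebind keys per project without touching the loop itself.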


Dr. Stuart W. Shulman
Director, Sara Fine Institute
School of Information Sciences
Director, Qualitative Data Analysis Program
University Center for Social and Urban Research
University of Pittsburgh
121 University Place, Suite 600
Pittsburgh, PA 15260
412.624.3776 (v) 412.624.4810 (f)
Editor, Journal of Information Technology and Politics
