[Air-L] Guest Editors’ Introduction: Text Annotation for Political Science Research

Stuart Shulman stuart.shulman at gmail.com
Sat Sep 13 20:35:14 PDT 2008

I think the Guest Editors' Introduction, written by a computer scientist
and a political scientist, is well worth the read.


The ITP section of APSA (fast growing these days) just voted to adopt
JITP as the official journal of the section at the recent APSA


Guest Editors' Introduction
Text Annotation for Political Science Research
Page Range: 1 - 6
DOI: 10.1080/19331680802149590
Claire Cardie, John Wilkerson

Text Annotation and the Cognitive Architecture of Political Leaders:
British Prime Ministers from 1945–2008
Page Range: 7 - 18
DOI: 10.1080/19331680802149624
Stephen Benedict Dyson

While differences in the personalities of leaders dominate popular
discussion of politics, the systematic academic study of these factors
has long been beset by problems of conceptualization and
measurement—difficulties that have led many in political science to
conclude that such studies are not worth the effort. In this light,
one of the most exciting recent developments in political psychology
has been the emergence of text analysis schemes, and accompanying
automation software, that offer the possibility of treating what
leaders say as indicative of how they think. In this essay, I consider
a text analysis protocol designed to isolate the cognitive
architecture of political leaders, in particular their characteristic
information processing propensities, and apply the protocol to a
comprehensive set of text: the universe of prime ministerial responses
to foreign policy questions in the British House of Commons from
1945–2008. The resulting data, encompassing twelve separate prime
ministers, shows that the technique can discriminate reliably between
individuals and exhibits promising signs of validity.
Keywords: Text annotation, political psychology, foreign policy
analysis, British prime ministers

CORPS: A Corpus of Tagged Political Speeches for Persuasive
Communication Processing
Page Range: 19 - 32
DOI: 10.1080/19331680802149616
Marco Guerini, Carlo Strapparava, Oliviero Stock

In political speech, even if the audience is sympathetic to the
speaker and does not need to be persuaded, it tends to react or
respond to signals of persuasive communication (including an expected
theme, a name, an expression, and the tone of the voice). In this
article, we describe the creation of a corpus of political speeches
tagged with audience reactions, such as applause, as indicators of
persuasive expressions. We hypothesize that corpora of this kind can
be usefully employed in the qualitative analysis of political
communication. In addition, we present a corpus-based approach for
persuasive expression mining that relies on techniques from natural
language processing (NLP). We show how the approach can support the
analysis of political communication, providing insights well beyond
those of traditional word-counting analysis techniques.
Keywords: Persuasion, natural language processing, political
communication, corpora collection

Classifying Party Affiliation from Political Speech
Page Range: 33 - 48
DOI: 10.1080/19331680802149608
Bei Yu, Stefan Kaufmann, Daniel Diermeier

In this article, we discuss the design of party classifiers for
Congressional speech data. We then examine these party classifiers'
person-dependency and time-dependency. We found that party classifiers
trained on 2005 House speeches can be generalized to the Senate
speeches of the same year, but not vice versa. The classifiers trained
on 2005 House speeches performed better on Senate speeches from recent
years than on older ones, which indicates the classifiers'
time-dependency. This dependency may be caused by changes in the issue
agenda or the ideological composition of Congress.
Keywords: Machine learning, text classification, generalizability,
ideology, evaluation

Recognizing Citations in Public Comments
Page Range: 49 - 71
DOI: 10.1080/19331680802153683
Jaime Arguello, Jamie Callan, Stuart Shulman

Notice and comment rulemaking is central to how U.S. federal agencies
craft new regulation. E-rulemaking, the process of soliciting and
considering public comments that are submitted electronically, poses a
challenge for agencies. The large volume of comments received makes it
difficult to distill and address the most substantive concerns of the
public. This work attempts to alleviate this burden by applying
existing machine learning techniques to the problem of recognizing
citation sentences. A citation in this context is defined as a
statement in which the author of the public comment references an
external source of factual information that is associated with a
specific person or organization. The problem is formulated as a binary
classification problem: Is a specific person or organization mentioned
in a sentence being referenced as an external source of information?
We show that our definition of a citation is reproducible by human
judges and that citations can be detected using machine learning
techniques with some success. Casting this as a machine learning
problem requires selecting an appropriate representation of the
sentence. Several feature sets are evaluated individually and in
combination. Superior results are obtained by combining feature sets.
Syntactic features, which characterize the structure of the sentence
rather than its content, significantly improve accuracy when combined
with other features, but not when used in isolation. Although
prediction error rate is adequate, coverage could be improved. An
error analysis enumerates short-term and long-term challenges that
must be overcome to improve recall.
Keywords: Citation analysis, public comments, e-rulemaking, text
mining, information extraction, machine learning

Good News or Bad News? Conducting Sentiment Analysis on Dutch Text to
Distinguish Between Positive and Negative Relations
Page Range: 73 - 94
DOI: 10.1080/19331680802154145
Wouter van Atteveldt, Jan Kleinnijenhuis, Nel Ruigrok, Stefan Schlobach

Many research questions in political communication can be answered by
representing text as a network of positive or negative relations
between actors and issues, as is done in semantic network
analysis. This article presents a system for automatically determining
the polarity (positivity/negativity) of these relations by using
techniques from sentiment analysis. We used a machine learning model
trained on the manually annotated news coverage of the Dutch 2006
elections, collecting lexical, syntactic, and word-similarity based
features, and using the syntactic analysis to focus on the relevant
part of the sentence. The performance of the full system is
significantly better than the baseline with an F1 score of .63.
Additionally, we replicate four studies from an earlier analysis of
these elections, attaining correlations of greater than .8 in three
out of four cases. This shows that the presented system can be
immediately used for a number of analyses.
Keywords: Sentiment analysis, valence, polarity, political
communication, automatic content analysis, semantic network analysis

Automatic Annotation of Semantic Fields for Political Science Research
Page Range: 95 - 120
DOI: 10.1080/19331680802149640
Beata Beigman Klebanov, Daniel Diermeier, Eyal Beigman

This article discusses methods for automatic annotation of political
texts for semantic fields—groups of words with related meanings. This
type of annotation is useful when studying political communication,
such as legislative debate or political speeches. We present three
types of automatic annotation: unsupervised clustering,
dictionary-based approaches, and a method based on relevant
experimental data. All methods are applied to analyzing Margaret
Thatcher's political rhetoric. For this data, we find that
unsupervised clustering is most useful for tracing topics;
dictionary-based methods are most effective in a comparative setting;
whereas the last method is the most promising for detecting off-topic,
singular uses of semantic domains, which are often rhetorical tools
used to achieve a political end. Applicability, strengths, and
weaknesses of each method and of their combinations are addressed in
Keywords: Political communication, speech, rhetoric, semantic fields,
topic, framing, clustering, lexical cohesion, Thatcher, Blair

Workbench Note

An Automated Approach to Investigating the Online Media Coverage of
U.S. Presidential Elections
Page Range: 121 - 132
DOI: 10.1080/19331680802149582
Arno Scharl, Albert Weichselbraun

This paper presents the U.S. Election 2004 Web Monitor, a public Web
portal that captured trends in political media coverage before and
after the 2004 U.S. presidential election. Developed by the authors of
this article, the webLyzard suite of Web mining tools provided the
required functionality to aggregate and analyze about a half-million
documents in weekly intervals. The study paid particular attention to
the editorial slant, which is defined as the quantity and tone of a
Web site's coverage as influenced by its editorial position. The
observable attention and attitude toward the candidates served as
proxies of editorial slant. The system identified attention by
determining the frequency of candidate references and measured
attitude towards the candidate by looking for positive and negative
expressions that co-occur with these references. Keywords and
perceptual maps summarized the most important topics associated with
the candidates, placing special emphasis on environmental issues.
Keywords: U.S. presidential elections, media monitoring, Web
mining, natural language processing, semantic orientation, keyword

Workbench Note

Media Monitoring by Means of Speech and Language Indexing for Political Analysis
Page Range: 133 - 146
DOI: 10.1080/19331680802149632
Iason Demiros, Harris Papageorgiou, Vassilios Antonopoulos, Andreas
Pipis, Athena Skoulariki

In this article, we describe a media monitoring system that we have
developed and implemented for the Secretariat General of Communication
and Secretariat General of Information in Greece (SGC-SGI). The system
applies emerging technologies for audiovisual recording, speech
recognition, language processing, multimedia indexing, and retrieval,
all integrated into a large video and audio library that covers
broadcast news and current affairs in Greek and English. It assists
SGC-SGI in compiling information; annotating and analyzing news; and
monitoring national, political, social, economic, cultural, and
environmental issues concerning Greece in general.
Keywords: Video annotation, speech recognition, multimedia
information retrieval, e-government, political analysis, media content

Book Review

The Internet and National Elections: A Comparative Study of Web
Campaigning, Randolph Kluver, Nicholas Jankowski, Kristen Foot, and
Steven Schneider (Eds.) New York: Routledge, 2007, 279 pages
Page Range: 147 - 148
DOI: 10.1080/19331680801979062
Arthur Sanders

Book Review

Radical Democracy and the Internet: Interrogating Theory and Practice,
Lincoln Dahlberg and Eugenia Siapera Basingstoke, UK: Palgrave, 2007,
272 pages
Page Range: 149 - 150
DOI: 10.1080/19331680802132612
Stephen Coleman

Book Review

Zero Comments: Blogging and Critical Internet Culture, Geert Lovink
New York: Routledge, 2007, 344 pages
Page Range: 150 - 151
DOI: 10.1080/19331680802042282
Kevin Wallsten

Book Review

Cybercrime: Digital Cops in a Networked Environment, Jack M. Balkin,
James Grimmelmann, Eddan Katz, Nimrod Kozlovski, Shlomit Wagman, and
Tal Zarsky (Eds.) New York: New York University Press, 2006, 276 pages
Page Range: 151 - 153
DOI: 10.1080/19331680802042324
David S. Wall

Book Review

Mobile Communication and Society: A Global Perspective, Manuel
Castells, Mireia Fernández-Ardèvol, Jack Linchuan Qiu, and Araba Sey
Cambridge, MA: The MIT Press, 2007, 331 pages
Page Range: 154 - 155
DOI: 10.1080/19331680802042373
Kenneth Rogerson

Dr. Stuart W. Shulman
Assistant Professor
Department of Political Science
University of Massachusetts Amherst
stu at polsci.umass.edu

Editor, Journal of Information Technology & Politics

Director, QDAP-UMass

Associate Director, National Center for Digital Government
