[Air-L] Text Data in Marketing: Data Sources, Linguistic Features, and Software Programs

Wed Oct 9 06:24:41 PDT 2019

Dear Internet Researchers:

I am delivering a demo to my department's faculty titled "Text Data in Marketing: Data Sources, Linguistic Features, and Software Programs." I would appreciate your critique of my coverage and any suggested changes or additions.

As the title indicates, I will cover the following three aspects:

1.      Use one example to show three types of text data in marketing:

a.       firm-generated (e.g., earnings calls)

b.      consumer/user-generated text data (e.g., Twitter)

c.       other-generated, marketing-relevant text data. "Other" could include market stakeholders (competitors, suppliers, organizational customers) and nonmarket stakeholders (news media, consumer organizations, regulators, legislators).

I will use the Volkswagen emissions scandal as the example and show how this event generated text data from Volkswagen and its various marketing stakeholders. I will also mention secondary data sources from which my colleagues can obtain or buy text data.

If you know of any other source, or of an example more insightful than Volkswagen, please let me know.

2.      I will then discuss linguistic features of text: sentiment, emotion, cognition, named entities, readability, subjectivity, structural complexity, lexical complexity, and topics (via topic modeling/mining). I chose these features because I have used them in my research and can speak to them.

If you know of any other useful feature that I am missing, please respond.
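To make two of the features in item 2 (readability and lexical complexity) concrete, here is a minimal Python sketch that computes the Flesch Reading Ease score, the type-token ratio, and average word length from raw text. It is illustrative only: the function names are my own, the sentence splitter is a simple punctuation rule, and the syllable counter is a rough vowel-group heuristic (an assumption, not a validated method), so dedicated readability packages will give somewhat different numbers.

```python
import re


def count_syllables(word):
    """Rough syllable estimate (heuristic assumption): count vowel groups,
    dropping a likely silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_features(text):
    """Return simple readability and lexical-complexity measures for one text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    n_sentences = max(len(sentences), 1)  # guard against division by zero
    n_words = max(len(words), 1)
    syllables = sum(count_syllables(w) for w in words)
    # Flesch Reading Ease: higher values indicate easier-to-read text.
    flesch = 206.835 - 1.015 * (n_words / n_sentences) - 84.6 * (syllables / n_words)
    # Type-token ratio: distinct words / total words (a lexical-diversity measure).
    ttr = len({w.lower() for w in words}) / n_words
    return {
        "words": len(words),
        "sentences": len(sentences),
        "flesch_reading_ease": round(flesch, 2),
        "type_token_ratio": round(ttr, 3),
        "avg_word_length": round(sum(len(w) for w in words) / n_words, 2),
    }


print(text_features("We regret the emissions issue. We will fix it."))
```

For the sample sentence above, the sketch finds 9 words in 2 sentences and a type-token ratio of 0.889 (8 distinct words out of 9). Higher Flesch scores mean easier text; a higher type-token ratio means a more varied vocabulary.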

3.      Software programs, both paid (e.g., LIWC) and free R/Python libraries, that take text as input and output the above linguistic features. I will start with LIWC and explain its variables from a psychological standpoint. I will then demo the R packages syuzhet and sentimentr, showing how their output variables offer new and different insights relative to LIWC. I will also mention other R and Python packages, such as TextBlob, as well as deep-learning frameworks such as TensorFlow and MXNet.

I will demo the MALLET GUI for topic modeling. I will then mention how researchers can use Amazon Mechanical Turk (MTurk) to annotate their text data and then train a classifier on those annotations (or hire a Python programmer to build one for them).

If you know of any paid software program that is as easy to use as LIWC, or of any other R or Python package, please suggest it.
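The contrast between LIWC and sentimentr in item 3 can be illustrated on a toy scale. LIWC's core approach is dictionary-based: it counts the words that fall into each category and reports most categories as a percentage of total words. sentimentr, by contrast, adjusts sentiment for valence shifters such as negators. The Python sketch below shows both ideas side by side. The category word lists, the negator list, and the three-word negation window are hypothetical placeholders; this is neither LIWC's dictionary nor sentimentr's actual algorithm, which weights several kinds of shifters in a broader context around each polarized word.

```python
import re
from collections import Counter

# Toy category word lists for illustration only; this is NOT the LIWC dictionary.
# A trailing '*' matches any word beginning with that stem.
TOY_DICTIONARY = {
    "positive": ["good", "great", "trust*", "improv*"],
    "negative": ["bad", "scandal*", "cheat*", "fraud*"],
}

# Hypothetical, deliberately short negator list.
NEGATORS = {"not", "no", "never", "isn't", "don't"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())


def matches(word, entry):
    """True if the word equals the entry, or starts with it when the entry ends in '*'."""
    if entry.endswith("*"):
        return word.startswith(entry[:-1])
    return word == entry


def categories_of(word, dictionary):
    """Return every category whose word list matches the word."""
    return [cat for cat, entries in dictionary.items()
            if any(matches(word, e) for e in entries)]


def dictionary_percentages(text, dictionary=TOY_DICTIONARY):
    """LIWC-style output: share of all words that fall in each category, in percent."""
    words = tokenize(text)
    counts = Counter()
    for w in words:
        for cat in categories_of(w, dictionary):
            counts[cat] += 1
    total = max(len(words), 1)
    return {cat: round(100 * counts[cat] / total, 2) for cat in dictionary}


def negation_aware_score(text, dictionary=TOY_DICTIONARY, window=3):
    """Net sentiment (+1 per positive word, -1 per negative word), flipping the
    sign when a negator appears within `window` words before the polarized word."""
    words = tokenize(text)
    score = 0
    for i, w in enumerate(words):
        cats = categories_of(w, dictionary)
        polarity = int("positive" in cats) - int("negative" in cats)
        if polarity == 0:
            continue
        preceding = words[max(0, i - window):i]
        if any(p in NEGATORS for p in preceding):
            polarity = -polarity
        score += polarity
    return score


sample = "This is not good."
print(dictionary_percentages(sample))
print(negation_aware_score(sample))
```

On "This is not good.", the plain dictionary count reports 25% positive words, while the negation-aware score is -1. That gap is the kind of difference a LIWC-versus-sentimentr comparison can surface.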
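The annotate-then-classify workflow mentioned in item 3 can be sketched in plain Python: aggregate several MTurk workers' labels per text by majority vote, then train a multinomial Naive Bayes classifier with add-one smoothing. Naive Bayes is my choice of a simple illustrative technique, not a recommendation from the original plan. The example texts, label names, and worker labels are invented, and a real project would more likely use an established library (for example, scikit-learn) and evaluate on a held-out test set.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def majority_label(labels):
    """Aggregate several annotators' labels for one text by majority vote.
    On a tie, the label that appears first in the list wins."""
    return Counter(labels).most_common(1)[0][0]


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c in self.class_counts:
            # Log prior plus log likelihood of each known token; unseen words are skipped.
            score = math.log(self.class_counts[c] / n_docs)
            for tok in tokenize(text):
                if tok in self.vocab:
                    score += math.log((self.word_counts[c][tok] + 1) / (self.total_words[c] + v))
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Hypothetical MTurk output: each text was labeled by three workers.
annotations = [
    ("Volkswagen cheated on emissions tests", ["complaint", "complaint", "neutral"]),
    ("I will never buy a diesel again", ["complaint", "complaint", "complaint"]),
    ("The recall notice arrived today", ["neutral", "neutral", "complaint"]),
    ("Dealer visit scheduled for next week", ["neutral", "neutral", "neutral"]),
]

texts = [text for text, _ in annotations]
labels = [majority_label(worker_labels) for _, worker_labels in annotations]
model = NaiveBayesTextClassifier().fit(texts, labels)
print(model.predict("They cheated and I will never buy again"))
```

With only four training texts this is a toy; its purpose is to show where the aggregated MTurk labels enter the pipeline, not to suggest a sample size.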

Thank you!
Vivek Astvansh
Assistant Professor of Marketing,
Kelley School of Business, Indiana University
http://kelleyschool.iu.edu/astvansh  | +1 (812) 855-8953