[Air-L] Content Analysis of Historical Tweet Set

Craig Hamilton Craig.Hamilton at bcu.ac.uk
Fri May 25 00:53:20 PDT 2018


Dear Parvathi,

Thanks for the email. By all means, please do use the script. It should certainly work for the initial Document Term Matrix construction, provided you amend the vector of stop words to Spanish, although you might encounter problems when words are stemmed to their roots. If you can get over those hurdles, the Topic Modelling element *should* work, as it is based on the numbers in the DTM rather than on the words themselves. The Sentiment Analysis part will require sourcing Spanish rather than English reference libraries; I’ve not had a need to search for those, but I’d be surprised if such things didn’t exist.
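For what it’s worth, a rough sketch of those amendments in R might look something like the lines below. This is illustrative rather than the actual script from the blog post, and it assumes a data frame called tweets with the tweet text in a column called text; tm, SnowballC and topicmodels supply the Spanish stop words, the Spanish stemmer and the LDA step respectively.

library(tm)          # corpus handling and the Document Term Matrix
library(SnowballC)   # Snowball stemmer, which includes Spanish
library(topicmodels) # LDA topic modelling on the DTM

# Assumes a data frame 'tweets' with the tweet text in a column called 'text'
corpus <- VCorpus(VectorSource(tweets$text))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)

# Swap the English stop word vector for the Spanish one
corpus <- tm_map(corpus, removeWords, stopwords("spanish"))

# Spanish stemming - the step most likely to produce odd-looking roots
corpus <- tm_map(corpus, stemDocument, language = "spanish")
corpus <- tm_map(corpus, stripWhitespace)

# The DTM itself is language-agnostic once the terms are in place
dtm <- DocumentTermMatrix(corpus)
dtm <- dtm[rowSums(as.matrix(dtm)) > 0, ]  # drop empty documents before LDA

# Topic modelling runs on the counts rather than the words, so it is unchanged
lda_model <- LDA(dtm, k = 5, control = list(seed = 1234))
terms(lda_model, 10)  # top ten stemmed terms per topic

The k = 5 there is an arbitrary choice for the sketch; the number of topics is something you would want to experiment with.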

Rather than clogging up this list, perhaps you could email me directly if you require further assistance. Fran, likewise, feel free to drop me a line. The same offer extends to any other AoIR members, of course.

Kind regards
Craig


On 24 May 2018, at 19:55, Parvathi Subbiah <pas89 at cam.ac.uk> wrote:

Hi Craig and AIR team,

Along these lines, I was wondering if anyone would be able to point me to some methods that could help me analyse Twitter data in Spanish. I’m trying to gauge the ‘emotional’ language used by Chavistas in the (very recent) Venezuelan election by looking at a set of 100 Twitter accounts dedicated to publishing content in favour of Maduro (the current president), and 100 accounts dedicated to publishing opposition content, as a control. I thought I might be able to use your script as well, Craig? Do you think it might be worth a try in Spanish?

I am able to download all the tweets using “Twint”: https://github.com/haccer/twint
I’m including the link as it might be of great use to some researchers out there. It has a module that can be run from a Python console, or it can be run with commands directly from the terminal, and it is extremely fast and reliable. It can also download tweets by hashtag or search query, and can retrieve all of a user’s followers or the accounts the user follows.

Aside from something very basic, such as counting emotional words (love, hate, etc.) against a list that I will create in Spanish myself, I’m wondering if there are other interesting methods that could be applied to this set. I have also thought of choosing the 50 tweets that have been retweeted the most by Chavistas and doing a “manual” coding of these, then trying to correlate popularity with emotional density (measured as emotional words / overall words in a tweet). But I’m curious to see if there are other ideas, perhaps related to topics and their visualisation? Or any ideas for training an algorithm to code the rest of the dataset by emotional theme/topic? Or ideas about looking at the dataset historically?
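To make the emotional-density idea concrete, I imagine something along the lines of the rough R sketch below (assuming I go the R route). The column names tweets$text and tweets$retweets_count are just placeholders for however the downloaded data ends up named, and the five-word emotion list stands in for the proper Spanish lexicon I would build:

# Rough sketch only: 'tweets' with columns 'text' and 'retweets_count' is assumed,
# and the emotion word list below is a tiny placeholder for the real Spanish list.
emotion_words <- c("amor", "odio", "miedo", "esperanza", "rabia")

count_emotion <- function(text, lexicon) {
  tokens <- unlist(strsplit(tolower(text), "[^[:alpha:]áéíóúüñ]+"))
  tokens <- tokens[tokens != ""]
  c(total = length(tokens), emotional = sum(tokens %in% lexicon))
}

counts <- t(sapply(tweets$text, count_emotion, lexicon = emotion_words))
tweets$emotional_density <- ifelse(counts[, "total"] > 0,
                                   counts[, "emotional"] / counts[, "total"], 0)

# Correlate emotional density with popularity (retweet count)
cor.test(tweets$emotional_density, tweets$retweets_count, method = "spearman")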

Really, any help, methodological ideas, or visualisation ideas would be greatly, greatly appreciated!
Many thanks everyone,

Warmly,
Parvathi


________

Parvathi Subbiah
PhD Candidate, Gates Cambridge Scholar
Department of Politics and International Studies
Centre for Latin American Studies
University of Cambridge

On 24 May 2018, at 17:33, Craig Hamilton <Craig.Hamilton at bcu.ac.uk> wrote:

Hi Fran

I’ve done some work around analysing tweets (and other text from social media) using R. I’ve put together a walkthrough video, a sample dataset, and the relevant code in this blog post: http://harkive.org/h17-text-analysis/ - you’d be most welcome to use some or all of those resources.

Don’t worry if you’ve not used R before - the script I’ve provided in that post should work if you create a copy of your dataset and change the column names to match the sample dataset I’ve provided. I’ve not used R with a dataset of the size you’re dealing with, so I can’t tell you how well it / your computer will handle things. Working in batches, as suggested below, might be an idea, then, certainly if you want to try things out.

The script eventually runs into some Topic Modelling and Sentiment Analysis, but you can run through it section by section until you reach the end of the initial exploratory stage (word frequencies and so on). This might help you make some sense of what’s in the dataset, and will help you weed out any unwanted elements.
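If it helps to see the shape of that exploratory stage, the lines below are an illustrative (not verbatim) version of the word-frequency step. The file name and the text column are assumptions rather than the actual sample dataset from the post, and the same few lines can be run over each batch of the data in turn:

# Illustrative only: the file name and the 'text' column are assumptions,
# not the sample dataset from the blog post.
tweets <- read.csv("tweets_batch_01.csv", stringsAsFactors = FALSE)

# Quick word-frequency pass over one batch, keeping hashtags and @mentions intact
tokens <- unlist(strsplit(tolower(tweets$text), "[^[:alnum:]'#@]+"))
tokens <- tokens[nchar(tokens) > 2]
freq <- sort(table(tokens), decreasing = TRUE)
head(freq, 20)  # the twenty most frequent terms in this batch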

Happy to help if you want to run with any of the above - I’d be intrigued to see what the script comes up with when applied to a different type of data.

Kind regards
Craig

Dr Craig Hamilton
School of Media
3rd Floor, The Parkside Building
Birmingham City University
Birmingham, B5
07740 358162
t: @craigfots
e: craig.hamilton at bcu.ac.uk
On 23 May 2018, at 03:22, f hodgkins <frances.hodgkins at gmail.com> wrote:

All-
I am working on a qualitative content analysis of a historical tweet set
from CrisisNLP (Imran et al., 2016):
http://crisisnlp.qcri.org/lrec2016/lrec2016.html
I am using the California Earthquake dataset. The tweets have been stripped
down to the day/time, tweet ID, and the content of the tweet; the rest of
the Twitter metadata has been discarded.

The software I am using is NVivo, which is known for its power for content analysis.

However, I am finding NVivo unwieldy for a dataset of this size (~250,000
tweets). I wanted each unique tweet to function as its own case, but
NVivo would crash every time. I have 18 GB of RAM and a RAID array.
I do not have a server, although I could get one.

I am working and coding side by side in Excel and in NVivo, with my data
split into 10 large .csv files instead of individual cases, and this is
working (but laborious).

QUESTION: Do you have any suggestions for software for large-scale content
analysis of tweets? I do not need SNA capabilities.

Thank you very much,
Fran Hodgkins
Doctoral Candidate (currently suffering through Chapter 4)
Grand Canyon University
USA
_______________________________________________
The Air-L at listserv.aoir.org mailing list
is provided by the Association of Internet Researchers http://aoir.org
Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org

Join the Association of Internet Researchers:
http://www.aoir.org/



