[Air-L] Content Analysis of Historical Tweet Set

Parvathi Subbiah pas89 at cam.ac.uk
Thu May 24 11:55:14 PDT 2018


Hi Craig and AIR team, 

Along these lines, I was wondering if anyone could point me to some methods that could help me analyse Twitter data in Spanish? I'm trying to gauge the 'emotional' language used by Chavistas in the (very recent) Venezuelan election by looking at a set of 100 Twitter accounts dedicated to publishing content in favour of Maduro (the current president) and, as a control, 100 Twitter accounts dedicated to publishing opposition content. I thought I might be able to use your script as well, Craig? Do you think it might be worth trying in Spanish?

I am able to download all the tweets using "Twint": https://github.com/haccer/twint
I'm including the link as it might be of great use to other researchers. Twint has a module that can be run from a Python console, or it can be run with commands directly from the terminal. It is extremely fast and reliable, and it can also download tweets by hashtag or search query, as well as a user's followers or the accounts the user follows.
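For anyone curious about the Python-module route, here is a minimal sketch based on the Twint README; the account name and date range are placeholders, and the exact config options may vary by version:

    import twint

    # Configure a search for one account's tweets (placeholder values)
    c = twint.Config()
    c.Username = "example_account"   # hypothetical account name
    c.Since = "2018-04-01"           # assumed date range around the campaign
    c.Until = "2018-05-21"
    c.Store_csv = True               # write results to CSV for later analysis
    c.Output = "tweets.csv"

    twint.run.Search(c)

Running this once per account (looping over the 200 usernames) should give one CSV per account, or a single combined file if you keep the same Output path.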

Aside from something very basic, such as counting emotional words (love, hate, etc.) against a list that I will create in Spanish myself, I'm wondering if there are other interesting methods that could be applied to this set. I have also thought of choosing the 50 tweets that have been retweeted the most by Chavistas, coding these "manually", and trying to correlate popularity with emotional density (measured as emotional words / overall words in a tweet). But I'm curious to see if there are other ideas, perhaps related to topics and their visualisation? Or ideas for training an algorithm to code the rest of the dataset by emotional theme/topic? Or ideas for looking at the dataset historically?
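In case it helps anyone thinking along with me, the density measure I have in mind is simple enough to sketch in a few lines of Python; the word list below is a tiny illustrative stand-in for the Spanish lexicon I would build by hand:

    import re

    # Placeholder emotion lexicon; the real list would be compiled in Spanish
    emotion_words = {"amor", "odio", "miedo", "rabia", "esperanza"}

    def emotional_density(tweet):
        """Emotional words divided by total words in one tweet."""
        tokens = re.findall(r"\w+", tweet.lower())
        if not tokens:
            return 0.0
        return sum(1 for t in tokens if t in emotion_words) / len(tokens)

    print(emotional_density("El amor y la esperanza vencerán al odio"))  # 3/8 = 0.375

The 50 hand-coded tweets could then double as training data for a supervised classifier later on (for example, a TF-IDF plus logistic-regression pipeline in scikit-learn), though that is just one possible route.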

Really, any help at all (methodological ideas, visualisation ideas) would be greatly, greatly appreciated!
Many thanks everyone, 

Warmly, 
Parvathi


________

Parvathi Subbiah
PhD Candidate, Gates Cambridge Scholar
Department of Politics and International Studies
Centre for Latin American Studies
University of Cambridge

> On 24 May 2018, at 17:33, Craig Hamilton <Craig.Hamilton at bcu.ac.uk> wrote:
> 
> Hi Fran
> 
> I’ve done some work around analysing tweets (and other text from social media) using R. I’ve put together a walkthrough video, a sample dataset, and the relevant code in this blog post: http://harkive.org/h17-text-analysis/ - you’d be most welcome to use some or all of those resources.
> 
> Don’t worry if you’ve not used R before - the script I’ve provided in that post should work if you create a copy of your dataset and change the column names to match the sample dataset I’ve provided. I’ve not used R with a dataset of the size you’re dealing with, so I can’t tell you how well it (or your computer) will handle things. Working in batches might be an idea, then, as suggested below, certainly if you want to try things out.
> 
> The script eventually runs into some Topic Modelling and Sentiment Analysis, but you can run through it section by section until you reach the end of the initial exploratory stage (word frequencies and so on). This might help you make some sense of what’s in the dataset, and will help you weed out any unwanted elements.
> 
> Happy to help if you want to run with any of the above - I’d be intrigued to see what the script I wrote comes up with on a different type of data.
> 
> Kind regards
> Craig
> 
> Dr Craig Hamilton
> School of Media
> 3rd Floor, The Parkside Building
> Birmingham City University
> Birmingham, B5
> 07740 358162
> t: @craigfots
> e: craig.hamilton at bcu.ac.uk
> On 23 May 2018, at 03:22, f hodgkins <frances.hodgkins at gmail.com> wrote:
> 
> All-
> I am working on a qualitative content analysis of a historical tweet set
> from CrisisNLP (Imran et al., 2016):
> http://crisisnlp.qcri.org/lrec2016/lrec2016.html
> I am using the California Earthquake dataset. The tweets have been stripped
> down to the day/time, tweet ID, and tweet content; the rest of the Twitter
> metadata is discarded.
> 
> I am using NVivo, which is known for its power for content analysis.
> 
> However, I am finding NVivo unwieldy for a dataset of this size (~250,000
> tweets). I wanted each unique tweet to function as its own case, but
> NVivo would crash every time. I have 18 GB of RAM and a RAID array.
> I do not have a server, although I could get one.
> 
> I am working and coding side by side in Excel and NVivo, with my data
> split into 10 large .csv files instead of individual cases, and this is
> working (but laborious).
> 
> QUESTION: Do you have any suggestions for software for large-scale content
> analysis of tweets? I do not need SNA capabilities.
> 
> Thank you very much,
> Fran Hodgkins
> Doctoral Candidate (currently suffering through Chapter 4)
> Grand Canyon University
> USA
> _______________________________________________
> The Air-L at listserv.aoir.org mailing list
> is provided by the Association of Internet Researchers http://aoir.org
> Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
> 
> Join the Association of Internet Researchers:
> http://www.aoir.org/
> 



