[Air-L] Content Analysis of Historical Tweet Set

Craig Hamilton Craig.Hamilton at bcu.ac.uk
Thu May 24 09:33:15 PDT 2018


Hi Fran

I’ve done some work around analysing tweets (and other text from social media) using R. I’ve put together a walkthrough video, a sample dataset, and the relevant code in this blog post: http://harkive.org/h17-text-analysis/ - you’d be most welcome to use some or all of those resources.

Don’t worry if you’ve not used R before - the script I’ve provided in that post should work if you create a copy of your dataset and change the column names to match the sample dataset I’ve provided. I’ve not used R with a dataset of the size you’re dealing with, though, so I can’t tell you how well it (or your computer) will handle things. Working in batches might be an idea, then, as suggested below, certainly while you’re trying things out.
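
To give you a rough idea of that renaming step, something along these lines should do it in R. The column names below are placeholders I’ve made up - swap in whatever your copy of the CrisisNLP file and my sample dataset actually use:

    # Minimal sketch - file name and column names are placeholders, not the real ones
    library(readr)
    library(dplyr)

    tweets <- read_csv("california_earthquake_copy.csv")  # your working copy

    # Rename this copy's columns to whatever the sample dataset in the blog post uses
    tweets <- tweets %>%
      rename(created_at = Day_Time,       # placeholder original names
             tweet_id   = Tweet_ID,
             text       = Tweet_Content)

    # If the full ~250,000 rows prove too much, split into batches of 25,000 rows
    batches <- split(tweets, ceiling(seq_len(nrow(tweets)) / 25000))
    write_csv(batches[[1]], "tweets_batch_01.csv")

Each batch could then be run through the script on its own.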

The script eventually runs into some Topic Modelling and Sentiment Analysis, but you can run through it section by section until you reach the end of the initial exploratory stage (word frequencies and so on). This might help you make some sense of what’s in the dataset, and will help you weed out any unwanted elements.
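
For a sense of what that exploratory stage boils down to, here’s a rough sketch using the tidytext package - it isn’t the exact code from the post, and it assumes the "text" column name from the renaming above:

    library(dplyr)
    library(tidytext)

    word_counts <- tweets %>%
      unnest_tokens(word, text) %>%           # one row per word per tweet
      anti_join(stop_words, by = "word") %>%  # drop common English stop words
      count(word, sort = TRUE)                # frequency table, most common first

    head(word_counts, 20)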

Happy to help if you want to run with any of the above - I’d be intrigued to see what the script comes up with when it’s used on a different type of data.

Kind regards
Craig

Dr Craig Hamilton
School of Media
3rd Floor, The Parkside Building
Birmingham City University
Birmingham, B5
07740 358162
t: @craigfots
e: craig.hamilton at bcu.ac.uk
On 23 May 2018, at 03:22, f hodgkins <frances.hodgkins at gmail.com> wrote:

All-
I am working on a qualitative content analysis of a historical tweet set
from CrisisNLP (Imran et al., 2016).
http://crisisnlp.qcri.org/lrec2016/lrec2016.html
I am using the California Earthquake dataset. The tweets have been stripped
down to the day/time, tweet ID, and tweet content; the rest of the Twitter
metadata has been discarded.

I am using NVivo - known for its power for content analysis --

However - I am finding NVivo unwieldy for a dataset of this size (~250,000
tweets). I wanted each unique tweet to function as its own case, but NVivo
would crash every time. I have 18 GB of RAM and a RAID array.
I do not have a server - although I could get one.

I am now working and coding side by side in Excel and NVivo, with my data
split into 10 large .csv files instead of individual cases - and this is
working (but laborious).

QUESTION: Do you have any suggestions for software for large-scale content
analysis of tweets? I do not need SNA capabilities.

Thank you very much,
Fran Hodgkins
Doctoral Candidate (currently suffering through Chapter 4)
Grand Canyon University
USA