[Air-L] Content Analysis of Historical Tweet Set

Kevin Driscoll driscollkevin at gmail.com
Thu May 24 08:56:20 PDT 2018


Hi Fran,

NVivo might still do the trick if you can shrink the size of your
population. Depending on the questions you are trying to answer with the
content analysis, perhaps you can find a way to collapse some of these
250,000 tweets into a smaller set and then sample from the resulting
subpopulation(s). (My hunch is that you may not need to load up every
single message to get a robust account of the discourse.)
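
If you end up sampling, a minimal Python/pandas sketch might look like the
following (the file name and column names here are assumptions, not anything
specific to the CrisisNLP export; adjust them to match your CSV):

    # Draw a reproducible random sample from the full corpus.
    import pandas as pd

    tweets = pd.read_csv("california_earthquake.csv")   # ~250,000 rows
    sample = tweets.sample(n=5000, random_state=42)      # fixed seed for reproducibility
    sample.to_csv("tweets_sample.csv", index=False)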

Without knowing the details of your corpus, here are some questions that
have helped me reduce a sample in the past:
- Are all of the tweets "original" or are some of them "retweets"? Can you
analyze retweets separately from original messages?
- Are there other redundant or duplicate messages in your corpus?
Automatically generated messages sometimes follow patterns that are
(relatively) easy to spot. For example, a weather bot or a news aggregator
might spit out an update every 60 minutes. Is it necessary to code every one?
- Can you group the tweets around certain URLs or meaningful phrases or
@-mentions and then treat these groups as single cases?
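
If you are comfortable with a bit of scripting, here is a rough Python/pandas
sketch of those three reductions. The file name, column names, and regular
expression are assumptions rather than anything from the CrisisNLP files, so
treat it as a starting point:

    import pandas as pd

    tweets = pd.read_csv("california_earthquake.csv")

    # 1. Separate retweets from original messages.
    is_rt = tweets["text"].str.startswith("RT @", na=False)
    retweets, originals = tweets[is_rt], tweets[~is_rt]

    # 2. Collapse exact duplicates (e.g. bot updates, copy-pasted messages).
    deduped = originals.drop_duplicates(subset="text")

    # 3. Group tweets that share a URL and count the group sizes, so each
    #    group can be read and coded as a single case.
    urls = deduped["text"].str.extract(r"(https?://\S+)", expand=False)
    url_groups = (deduped.assign(url=urls)
                         .dropna(subset=["url"])
                         .groupby("url")
                         .size())
    print(url_groups.sort_values(ascending=False).head(20))

This only catches exact duplicates and literal "RT @..." strings; near-duplicates
and quote tweets would need a fuzzier match.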

Ideally, these preliminary/exploratory analyses will reduce the size of the
corpus without sacrificing the validity of your sample. I hope this is
helpful. Please do let us know what you find!

Best of luck!

Kevin


Date: Tue, 22 May 2018 21:22:52 -0500
> From: f hodgkins <frances.hodgkins at gmail.com>
> To: air-l at listserv.aoir.org
> Subject: [Air-L] Content Analysis of Historical Tweet Set
>
> All-
> I am working on a qualitative content analysis of a historical tweet set
> from CrisisNLP (Imran et al., 2016).
> http://crisisnlp.qcri.org/lrec2016/lrec2016.html
> I am using the California Earthquake dataset. The tweets have been stripped
> down to the day/time, tweet ID, and the content of the tweet; the rest of
> the Twitter metadata has been discarded.
>
> I am using NVivo, known for its power for content analysis.
>
> However, I am finding NVivo unwieldy for a dataset of this size (~250,000
> tweets). I wanted each unique tweet to function as its own case, but NVivo
> would crash every time. I have 18 GB of RAM and a RAID array.
> I do not have a server, although I could get one.
>
> I am working and coding side by side in Excel and in NVivo, with my data in
> 10 large sections of .csv files instead of individual cases, and this is
> working (but laborious).
>
> QUESTION: Do you have any suggestions for software for large-scale content
> analysis of tweets? I do not need SNA capabilities.
>
> Thank you very much,
> Fran Hodgkins
> Doctoral Candidate (currently suffering through Chapter 4)
> Grand Canyon University
> USA
>


