[Air-L] [EXT] Re: Content Analysis of Historical Tweet Set
Shulman, Stu
stu at texifter.com
Fri May 25 09:06:23 PDT 2018
Or use Sifter to create a workable estimate:
sifter.texifter.com
and DiscoverText to collaboratively search, filter, deduplicate, code, and
machine-classify the results.
Academics working in multi-university and cross-national teams have had
success:
https://discovertext.com/publications/
It is all point & click via a web browser without having to depend on
others to write your scripts, store your data, or perform analytical
functions.
We visit universities, offer free workshops, sponsor lots of students, and
grew out of NSF-funded basic research. We can help you collect, clean,
store, analyze, and report on Twitter data in a manner that observes the
Twitter Terms of Service, and the kernel of our approach is a set of tools
for measuring inter-rater reliability and adjudicating the differences
between annotators.
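For readers who prefer to script that step themselves, the idea can be
sketched in a few lines of Python. This is only an illustration of the
concept, using scikit-learn on made-up labels, not how DiscoverText
implements it:

from sklearn.metrics import cohen_kappa_score

# Hypothetical labels from two annotators coding the same five tweets.
tweet_ids = ["t1", "t2", "t3", "t4", "t5"]
coder_a = ["damage", "damage", "other", "request", "damage"]
coder_b = ["damage", "other", "other", "request", "other"]

# Inter-rater reliability as Cohen's kappa (chance-corrected agreement).
print("Cohen's kappa:", round(cohen_kappa_score(coder_a, coder_b), 2))

# Tweets the coders disagree on go to an adjudication queue.
disagreements = [t for t, a, b in zip(tweet_ids, coder_a, coder_b) if a != b]
print("Needs adjudication:", disagreements)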
https://discovertext.com/start-a-free-trial/
~Stu
On Fri, May 25, 2018 at 11:52 AM, Bell, Valarie <Valarie.Bell at unt.edu>
wrote:
> Hi Fran:
>
>
> I did an identical type of study (looking at a different extreme natural
> phenomenon) with historical tweets a couple of years ago and encountered
> the same problem. The answer: find someone at your university who uses
> Python; they can teach you in a few hours how to collect those tweets,
> code them, and analyze them. It's worth it!
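>
> To give a flavor of it, here is a minimal pandas sketch of that workflow;
> the file name, column names, and keyword list are placeholders rather than
> the actual CrisisNLP layout:
>
> import pandas as pd
>
> # Placeholder file and column names; adjust to the actual CSV layout.
> tweets = pd.read_csv("california_earthquake.csv",
>                      names=["day", "time", "tweet_id", "text"])
>
> # One illustrative code: flag tweets whose text mentions damage.
> keywords = ["damage", "collapsed", "destroyed"]
> tweets["mentions_damage"] = tweets["text"].str.lower().str.contains(
>     "|".join(keywords), na=False)
>
> print(tweets["mentions_damage"].value_counts())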
>
>
> Valarie J. Bell, M.A., Ph.D.
>
> Computational Social Scientist,
> Instructor of Digital Communication Analytics
> University of North Texas
>
> Mayborn Graduate Institute of Journalism
>
> ________________________________
> From: Air-L <air-l-bounces at listserv.aoir.org> on behalf of Kevin Driscoll
> <driscollkevin at gmail.com>
> Sent: Thursday, May 24, 2018 10:56:20 AM
> To: air-l at listserv.aoir.org
> Subject: [EXT] Re: [Air-L] Content Analysis of Historical Tweet Set
>
> Hi Fran,
>
> NVivo might still do the trick if you can shrink the size of your
> population. Depending on the questions you are trying to answer with the
> content analysis, perhaps you can find a way to collapse some of these
> 250,000 tweets into a smaller set and then sample from the resulting
> subpopulation(s). (My hunch is that you may not need to load up every
> single message to get a robust account of the discourse.)
>
> Without knowing the details of your sample, here are some questions that
> have helped me to reduce my sample in the past:
> - Are all of the tweets "original" or are some of them "retweets"? Can you
> analyze retweets separately from original messages?
> - Are there other redundant or duplicate messages in your corpus?
> Automatically generated messages sometimes follow patterns that are
> (relatively) easy to spot. For example, a weather bot or a news aggregator
> might spit out an update every 60 minutes. Is it necessary to code every
> one? (See the sketch after this list.)
> - Can you group the tweets around certain URLs or meaningful phrases or
> @-mentions and then treat these groups as single cases?
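>
> If you end up scripting these reductions, a rough pandas sketch might look
> like the following; the file name, the "text" column, and the "RT @"
> heuristic are assumptions about your data rather than anything I know
> about it:
>
> import pandas as pd
>
> # Assumed file and column names; adjust to your actual CSV layout.
> tweets = pd.read_csv("tweets.csv")
>
> # Separate retweets from original messages (simple "RT @" heuristic).
> is_retweet = tweets["text"].str.startswith("RT @", na=False)
> retweets = tweets[is_retweet].copy()
> originals = tweets[~is_retweet].copy()
>
> # Drop exact duplicates; bots and aggregators often repeat themselves.
> originals = originals.drop_duplicates(subset="text")
>
> # Group automated updates sharing a URL and treat each group as one case.
> originals["url"] = originals["text"].str.extract(r"(https?://\S+)",
>                                                  expand=False)
> print(originals.groupby("url").size().sort_values(ascending=False).head(10))
>
> # Then sample from the reduced set rather than coding every message.
> subsample = originals.sample(n=min(len(originals), 5000), random_state=42)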
>
> Ideally, these preliminary/exploratory analyses will reduce the size of the
> corpus without sacrificing the validity of your sample. I hope this is
> helpful. Please do let us know what you find!
>
> Best of luck!
>
> Kevin
>
>
> Date: Tue, 22 May 2018 21:22:52 -0500
> > From: f hodgkins <frances.hodgkins at gmail.com>
> > To: air-l at listserv.aoir.org
> > Subject: [Air-L] Content Analysis of Historical Tweet Set
> > Message-ID: <CAGoEHPopBGpopncbYBEnTpUw-LC7QLhF72uuHUQyaZq=wxNW1Q@mail.gmail.com>
> > Content-Type: text/plain; charset="UTF-8"
> >
> > All-
> > I am working on a qualitative content analysis of a historical tweet set
> > from CrisisNLP, from Imran et al. (2016).
> > http://crisisnlp.qcri.org/lrec2016/lrec2016.html
> > I am using the California Earthquake dataset. The tweets have been
> > stripped down to the day, time, tweet ID, and the content of the tweet.
> > The rest of the Twitter information is discarded.
> >
> > I am using NVivo, known for its power for content analysis.
> >
> > However, I am finding NVivo unwieldy for a dataset of this size
> > (~250,000 tweets). I wanted each unique tweet to function as its own
> > case, but NVivo would crash every time. I have 18 GB of RAM and a RAID
> > array. I do not have a server, although I could get one.
> >
> > I am working and coding side by side in Excel and in NVivo, with my data
> > split into 10 large .csv files instead of individual cases, and this is
> > working (but laborious).
> >
> > QUESTION: Do you have any suggestions for software for large-scale
> > content analysis of tweets? I do not need SNA capabilities.
> >
> > Thank you very much,
> > Fran Hodgkins
> > Doctoral Candidate (currently suffering through Chapter 4)
> > Grand Canyon University
> > USA
> >
> _______________________________________________
> The Air-L at listserv.aoir.org mailing list
> is provided by the Association of Internet Researchers http://aoir.org
> Subscribe, change options or unsubscribe at:
> http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
>
> Join the Association of Internet Researchers:
> http://www.aoir.org/
>
--
Dr. Stuart W. Shulman
Founder and CEO, Texifter
Cell: 413-992-8513
LinkedIn: http://www.linkedin.com/in/stuartwshulman