[Air-L] Twitter data collection tools

Libby Hemphill libbyh at gmail.com
Wed Nov 11 07:56:42 PST 2015


As others have pointed out, no matter what tool you use, Twitter data is
dynamic and will change almost as soon as you collect it. The API also has
limitations that are sometimes clear and sometimes not. Not being able to
pull a complete timeline for users with thousands of tweets is a technical
limit for sure. Some of the variables you mention, retweets for instance,
can be roughly calculated from the user and tweet data the API does make
available. I just added issues to my own repo about retweets and plain text
parsing. Those will likely get attention next week or later. As with any
data collection, what you actually need and how to calculate it depend on
what questions you want to answer.
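
A minimal sketch of that kind of rough calculation, in Python, assuming the
tweets have already been collected and parsed into dicts shaped like the
API's tweet objects (user.screen_name, retweet_count, favorite_count,
retweeted_status, entities.user_mentions). The function name and the choice
of output columns are illustrative assumptions, not taken from any tool
mentioned in this thread.

```python
from collections import defaultdict


def summarize_accounts(tweets):
    """Aggregate rough per-account statistics from already-collected tweets.

    Assumption: each tweet is a dict shaped like a Twitter API tweet object.
    The totals only cover the tweets passed in, so they are estimates.
    """
    stats = defaultdict(lambda: {
        "tweets": 0,
        "retweets_made": 0,
        "retweets_received": 0,
        "favorites_received": 0,
        "mentions_made": 0,
    })
    for tweet in tweets:
        # Handles are case-insensitive, so normalize before grouping.
        handle = tweet["user"]["screen_name"].lower()
        row = stats[handle]
        row["tweets"] += 1
        if "retweeted_status" in tweet:
            # This account retweeted someone else. Skip the engagement
            # counts so they are not credited to the retweeting account.
            row["retweets_made"] += 1
            continue
        row["retweets_received"] += tweet.get("retweet_count", 0)
        row["favorites_received"] += tweet.get("favorite_count", 0)
        mentions = tweet.get("entities", {}).get("user_mentions", [])
        row["mentions_made"] += len(mentions)
    return dict(stats)
```

Because the input is whatever was collected, the totals are only as complete
as the collection itself and will drift as tweets gain retweets or are
deleted.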

On Wed, Nov 11, 2015 at 7:49 AM, Cory Salveson <corysalveson at gmail.com>
wrote:

> In addition to looking at tools, you might want to consider looking at the
> Twitter API documentation to get a sense for what data and metadata are
> made available by Twitter vs. what data you think you need. Here, for
> example, is the list of all fields related to tweets:
> https://dev.twitter.com/overview/api/tweets; users:
> https://dev.twitter.com/overview/api/users; and "entities in objects,"
> such as hashtags:
> https://dev.twitter.com/overview/api/entities-in-twitter-objects. I
> believe there may also be some limits placed on how far back in time you
> can request data via the API. So, any tool that provides Twitter data,
> unless it's doing something really unique, is going to be limited by
> what's available in these field lists and by the constraint of when it
> can or did start pulling data for the accounts you want to analyze.
>
> As others have mentioned, then, there's no way to use the API to say, e.g.,
> "give me all tweets ever by user X". Instead, you (or your tool/service)
> would have to make repeated requests until you felt you had obtained all
> tweets, possibly on a go-forward basis from whenever the time cutoff is. To
> actually do this, I can second the suggestion for twitteR; using e.g. this
> tutorial, you can pretty easily pull anything the API provides, in batch,
> and accumulate what you need over time in a native R data structure (or
> just write it to CSV, etc.):
> http://www.r-bloggers.com/getting-started-with-twitter-in-r/. With your
> input file of IDs, you would just have to loop through them in R.
>
> You may also want to check out the Twitter support in such generic data
> mining/processing tools as RapidMiner (
> http://docs.rapidminer.com/studio/how-to/cloud-connectivity/twitter.html),
> Talend (http://www.datalytyx.com/twitter-sentiment-analysis-using-talend/),
> KNIME (https://www.knime.org/blog/knime-twitter-nodes), or Pentaho (
> http://www.patlaf.com/query-twitter-api-with-pentaho-pdi-kettle/). These
> are generic data integration/extraction/manipulation/analysis tools
> designed to help build data flows visually and in batch. R wins, as it
> often does, for simplicity and control reasons, but because these other
> tools are more visual and are designed specifically for batch processing,
> they may also be worth exploring.
>
> Cheers,
>
> Cory Salveson
> http://corysalveson.com | @argotechnica <https://twitter.com/argotechnica>
>
> On Wed, Nov 11, 2015 at 12:35 AM, Maurice Vergeer <m.vergeer at maw.ru.nl>
> wrote:
>
> > You could look at R's twitteR package. See
> > https://cran.r-project.org/web/packages/twitteR/twitteR.pdf
> > But then you would need to know how to use R. Not that difficult, but
> > still a new program. One benefit: you have the data directly ready for
> > analysis.
> > Hope that helps
> > Maurice
> >
> > On Wed, Nov 11, 2015 at 3:59 AM, Gohar F. Khan <gohar.feroz at gmail.com>
> > wrote:
> >
> > > Hello list members:
> > >
> > > I am looking for tools which can help extract all possible Twitter
> > > statistics (such as number of tweets, followers, followings, mentions,
> > > re-tweets, favorites) for a list of Twitter handles (around 120
> > > accounts). *In particular, I am looking for a tool that can take the
> > > IDs as a single file and provide the desired statistics for each ID.*
> > >
> > > The Webometrics Analyst has this functionality, but unfortunately it
> > > only provides followers and followings data. I am also familiar with
> > > several other tools, including the ones mentioned in Dean Freelon's
> > > curated list
> > > <https://docs.google.com/document/d/1UaERzROI986HqcwrBDLaqGG8X_lYwctj6ek6ryqDOiQ/edit>,
> > > but none of these can extract all the information I need. Some tools
> > > provide more statistics, but they work with one ID at a time.
> > >
> > > I will greatly appreciate any suggestions.
> > >
> > >
> > > Thank you,
> > >
> > > --
> > >
> > > Gohar Feroz Khan, PhD
> > >
> > > Adjunct Faculty & Research Adviser
> > > Korea Advanced Institute of Science & Technology (KAIST)
> > > Global Information and Telecommunication Technology Program (ITTP)
> > > 291 Daehak-ro, Yuseong-gu, Daejeon, South Korea.
> > >
> > >
> > >
> > > ------------------------------------------------------------------------------
> > > Check out my new book on social media analytics
> > > <http://7layersanalytics.com/>!
> > > ------------------------------------------------------------------------------
> > > Please consider submitting your work to the social media analytics
> > > track at PACIS2016 <http://www.pacis2016.org/Page/Index/71>.
> > > ------------------------------------------------------------------------------
> > > Social Identities: || Blog <http://gfkhan.wordpress.com> || Twitter
> > > <https://twitter.com/gfkhan> || LinkedIn
> > > <https://www.linkedin.com/pub/gohar-feroz-khan/7/62b/42> || Research
> > > Centre <http://centreforsocialtech.com/> ||
> > > _______________________________________________
> > > The Air-L at listserv.aoir.org mailing list
> > > is provided by the Association of Internet Researchers http://aoir.org
> > > Subscribe, change options or unsubscribe at:
> > > http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
> > >
> > > Join the Association of Internet Researchers:
> > > http://www.aoir.org/
> > >
> >
> >
> >
> > --
> > ________________________________________________
> > Maurice Vergeer
> > To contact me, see http://mauricevergeer.nl/node/5
> > To see my publications, see http://mauricevergeer.nl/node/1
> > ________________________________________________


