[Air-L] Twitter data collection tools

Cory Salveson corysalveson at gmail.com
Wed Nov 11 05:49:14 PST 2015


In addition to looking at tools, you might want to look at the Twitter API
documentation to get a sense of what data and metadata Twitter makes
available vs. what data you think you need. Here, for example, are the
field lists for tweets:
https://dev.twitter.com/overview/api/tweets; users:
https://dev.twitter.com/overview/api/users; and "entities in objects," such
as hashtags:
https://dev.twitter.com/overview/api/entities-in-twitter-objects. There are
also limits on how far back in time you can request data via the API: as
far as I know, a user timeline request only reaches back through roughly
the 3,200 most recent tweets for an account, and the search API only covers
about the last week. So any tool that provides Twitter data, unless it's
doing something unusual, is going to be limited by what's available in
these field lists and by when it can or did start pulling data for the
accounts you want to analyze.
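
To make the "what is available vs. what you need" comparison concrete, here
is a minimal Python sketch that pulls account-level statistics out of a user
object. The field names are the ones listed in the users documentation
linked above; the helper name account_stats and the sample values are made
up for illustration and are not real data.

```python
# Account-level fields from the Twitter REST API v1.1 user object.
# Mentions and retweets are not user-object fields; they would have to be
# derived from individual tweets (entities.user_mentions, retweet_count).
USER_FIELDS = [
    "screen_name",
    "statuses_count",    # number of tweets
    "followers_count",   # followers
    "friends_count",     # accounts followed ("followings")
    "favourites_count",  # tweets this account has favorited
]


def account_stats(user_obj):
    """Keep only the fields of interest; anything missing becomes None."""
    return {field: user_obj.get(field) for field in USER_FIELDS}


# Illustrative, made-up user object standing in for an API response.
sample_user = {
    "screen_name": "example_account",
    "statuses_count": 1200,
    "followers_count": 250,
    "friends_count": 180,
    "description": "extra fields are simply ignored",
}

print(account_stats(sample_user))
```

Anything that comes back as None here, or that has no user-object field at
all, is a statistic you would need to compute yourself from the tweets.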

As others have mentioned, then, there's no way to use the API to say, e.g.,
"give me all tweets ever by user X". Instead, you (or your tool/service)
would have to make repeated requests, paging back through each account's
timeline until you have everything the API will return, and then keep
collecting on a go-forward basis from that cutoff. To actually do this, I
can second the suggestion of twitteR; following e.g. this tutorial, you can
pretty easily pull anything the API provides, in batch, and accumulate what
you need over time in a native R data structure (or just write it to CSV,
etc.): http://www.r-bloggers.com/getting-started-with-twitter-in-r/. Given
your input file of IDs, you would just have to loop over them in R.
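
The loop has the same shape in any language. Here is a rough Python sketch
of it, under some loud assumptions: fetch_page is a hypothetical stand-in
for whatever client call you end up using (it is not a twitteR or Twitter
function), and the FAKE data exists only so the sketch runs without
credentials. The one real API detail it relies on is that max_id means
"tweets with an ID less than or equal to this", which is why it subtracts
one before asking for the next page.

```python
import csv
import io


def collect_timeline(fetch_page, screen_name, max_pages=20):
    """Page backwards through one account's timeline.

    fetch_page(screen_name, max_id) is a hypothetical callable that returns
    a list of tweet dicts, newest first, each with an integer "id", and an
    empty list once nothing older is available.
    """
    tweets, max_id = [], None
    for _ in range(max_pages):
        page = fetch_page(screen_name, max_id)
        if not page:
            break
        tweets.extend(page)
        # Next request: only tweets strictly older than the oldest seen.
        max_id = min(t["id"] for t in page) - 1
    return tweets


def collect_all(fetch_page, screen_names, out):
    """Loop over the input handles and write one CSV row per tweet."""
    writer = csv.writer(out)
    writer.writerow(["screen_name", "id", "retweet_count", "favorite_count"])
    for name in screen_names:
        for t in collect_timeline(fetch_page, name):
            writer.writerow(
                [name, t["id"], t.get("retweet_count"), t.get("favorite_count")]
            )


# Made-up data so the sketch runs offline (not real tweets).
FAKE = {
    "alice": [{"id": i, "retweet_count": i % 3, "favorite_count": i % 2}
              for i in range(5, 0, -1)],
    "bob": [{"id": i, "retweet_count": 0, "favorite_count": 1}
            for i in range(3, 0, -1)],
}


def fake_fetch(screen_name, max_id, page_size=2):
    """Stand-in for a real API call: returns at most page_size tweets."""
    rows = [t for t in FAKE.get(screen_name, [])
            if max_id is None or t["id"] <= max_id]
    return rows[:page_size]


buf = io.StringIO()
collect_all(fake_fetch, ["alice", "bob"], buf)
print(buf.getvalue())
```

A real version would also need to pause between requests to stay within the
API's rate limits, and would write to a file rather than an in-memory
buffer.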

You may also want to check out the Twitter support in such generic data
mining/processing tools as RapidMiner (
http://docs.rapidminer.com/studio/how-to/cloud-connectivity/twitter.html),
Talend (http://www.datalytyx.com/twitter-sentiment-analysis-using-talend/),
KNIME (https://www.knime.org/blog/knime-twitter-nodes), or Pentaho (
http://www.patlaf.com/query-twitter-api-with-pentaho-pdi-kettle/). These
are generic data integration/extraction/manipulation/analysis tools
designed to help you build data flows visually and run them in batch. I
would still pick R, as I often do, for its simplicity and control, but if
you prefer a visual, batch-oriented workflow, these tools may also be worth
exploring.

Cheers,

Cory Salveson
http://corysalveson.com | @argotechnica <https://twitter.com/argotechnica>

On Wed, Nov 11, 2015 at 12:35 AM, Maurice Vergeer <m.vergeer at maw.ru.nl>
wrote:

> You could look at R's twitteR package. See
> https://cran.r-project.org/web/packages/twitteR/twitteR.pdf
> But then you would need to know how to use R. It is not that difficult,
> but it is yet another program to learn. One benefit: the data are
> immediately ready for analysis in R.
> Hope that helps
> Maurice
>
> On Wed, Nov 11, 2015 at 3:59 AM, Gohar F. Khan <gohar.feroz at gmail.com>
> wrote:
>
> > Hello list members:
> >
> > I am looking for tools that can help extract all possible Twitter
> > statistics (such as number of tweets, followers, followings, mentions,
> > retweets, favorites) for a list of Twitter handles (around 120
> > accounts). *In particular, I am looking for a tool that can take the IDs
> > as a single file and provide the desired statistics for each ID.*
> >
> > The Webometrics Analyst has this functionality, but unfortunately it only
> > provides followers and followings data. I am also familiar with several
> > other tools, including the ones mentioned in Dean Freelon's curated list
> > <https://docs.google.com/document/d/1UaERzROI986HqcwrBDLaqGG8X_lYwctj6ek6ryqDOiQ/edit>,
> > but none of these can extract all the information I need. Some tools
> > provide more statistics, but they work with one ID at a time.
> >
> > I will greatly appreciate any suggestions.
> >
> >
> > Thank you,
> >
> > --
> >
> > Gohar Feroz Khan, PhD
> >
> > Adjunct Faculty & Research Adviser
> > Korea Advanced Institute of Science & Technology (KAIST)
> > Global Information and Telecommunication Technology Program (ITTP)
> > 291 Daehak-ro, Yuseong-gu, Daejeon, South Korea.
> >
> >
> >
> > ------------------------------------------------------------------------------
> > Check out my new book on social media analytics
> > <http://7layersanalytics.com/>!
> > ------------------------------------------------------------------------------
> > Please consider submitting your work to the social media analytics track
> > at PACIS2016 <http://www.pacis2016.org/Page/Index/71>.
> > ------------------------------------------------------------------------------
> > Social Identities: || Blog <http://gfkhan.wordpress.com> || Twitter
> > <https://twitter.com/gfkhan> || LinkedIn
> > <https://www.linkedin.com/pub/gohar-feroz-khan/7/62b/42> || Research
> > Centre <http://centreforsocialtech.com/> ||
> > _______________________________________________
> > The Air-L at listserv.aoir.org mailing list
> > is provided by the Association of Internet Researchers http://aoir.org
> > Subscribe, change options or unsubscribe at:
> > http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
> >
> > Join the Association of Internet Researchers:
> > http://www.aoir.org/
> >
>
>
>
> --
> ________________________________________________
> Maurice Vergeer
> To contact me, see http://mauricevergeer.nl/node/5
> To see my publications, see http://mauricevergeer.nl/node/1
> ________________________________________________


