[Air-L] twitter archived data

Stuart Shulman stuart.shulman at gmail.com
Fri Oct 28 16:37:33 PDT 2011

If and when the Library of Congress opens up a full-service Twitter sampling
portal, that will be the place for the most complete population of archival
Tweets. There are reasons to wonder if that day will ever arrive. The
technical difficulty of that task is enormous.

We have archived about 150 million tweets so far, and the computational
challenges of delivering sampling and analytic tools are significant.
Until such time as we get a Wayback Machine for all tweets, the best thing
to do is to archive them day-forward as history unfolds.

We first experimented with this during the Arab Spring and have since
created numerous collections using the public API and now also with the
Power Track for Twitter (the so-called full firehose) from GNIP. The minute
news breaks, or a hunch emerges about an interesting stream of Tweets (e.g.,
#occupy or #ghaddafi), we start a new collector running and let it run until
enough tweets are archived for almost any conceivable study purpose. When a
stream is high volume, this does not take long. With the GNIP Power Track,
you also get interesting metadata, depending on the user who created the
Tweet. Here is a sample of some of the fields you might see associated with
a Tweet.

country code:
klout score:
location coord type:
location coords:
location displayname:
location type:
posted time:
real name:
rule match:
tweet url:
user twitter page:
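
To make the collector idea concrete, here is a minimal sketch in Python. It is
not our production code and not the GNIP API. It assumes each Tweet arrives as
"label: value" text using the field labels listed above (the real Power Track
payload format is not shown here), and the names parse_record and collect, as
well as the sample values in the usage note, are hypothetical.

```python
from typing import Dict, Iterable, List, Optional

# Field labels copied from the list above. Treating them as plain
# "label: value" text lines is an assumption made for illustration.
FIELD_LABELS = [
    "country code", "klout score", "location coord type", "location coords",
    "location displayname", "location type", "posted time", "real name",
    "rule match", "tweet url", "user twitter page",
]

Record = Dict[str, Optional[str]]


def parse_record(text: str) -> Record:
    """Parse 'label: value' lines into a dict; missing or blank values are None."""
    record: Record = {label: None for label in FIELD_LABELS}
    for line in text.splitlines():
        # Split on the first colon only, so URLs and timestamps stay intact.
        label, sep, value = line.partition(":")
        label = label.strip().lower()
        if sep and label in record:
            record[label] = value.strip() or None
    return record


def collect(stream: Iterable[Record], rule: str, target: int) -> List[Record]:
    """Archive records whose 'rule match' equals `rule` until `target` are kept."""
    archived: List[Record] = []
    for record in stream:
        if record.get("rule match") == rule:
            archived.append(record)
            if len(archived) >= target:
                break  # enough Tweets archived for the study at hand
    return archived
```

For example, parse_record("rule match: #occupy\ncountry code: US") returns a
record with those two fields set and the other nine set to None, and
collect(records, "#occupy", 1000) stops reading once 1000 matching records
have been kept.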

The trick is to be ready as historic news breaks or a movement unfolds.
Whereas we used to teach our students that the newspaper was the first draft
of history, now we must certainly say the newspaper is the third or fourth. The
first draft of history is Twitter and Facebook. The digital artifacts are in
many ways more democratic when user-created, not to mention super amenable
to analysis that might test that hunch.

We are just finishing a very fruitful first round of beta testing the GNIP
Power Track.

Round two will begin in a week or two. The beta is free.
To sign up, please visit:


On Fri, Oct 28, 2011 at 3:33 PM, Pablo Garaizar Sagarminaga <
garaizar at deusto.es> wrote:

> Hi Robert,
> on Fri, 28 Oct 2011 13:49:05 -0400 nativebuddha
> <nativebuddha at gmail.com> wrote:
> > Are there any good places to find one-to-two month-old twitter data?
> > (other than http://twapperkeeper.com/)
> Ulf-Dietrich Reips and I created a service that may be of interest to you,
> iScience Maps: http://maps.iscience.deusto.es
> As we are not allowed to provide Twitter data itself (Twitter's current
> Terms of Service explicitly forbid it), iScience Maps Global Search
> allows researchers to run quantitative searches against our local database
> (not very big: we store only geo-tagged tweets from the Twitter
> Streaming API, around 6 million tweets from 15 September 2010 to
> the present day).
> I know this is not what you asked, but I hope it helps O:-)
> Best,
> --
>  Pablo Garaizar Sagarminaga
>  Universidad de Deusto
>  Avda. de las Universidades 24
>  48007 Bilbao - Spain
>  Phone:       +34-94-4139000 Ext 2512
>  Fax:                  +34-94-4139101


Stuart Shulman
President & CEO
Texifter, LLC <http://www.texifter.com/>

Have you tried DiscoverText?
*Featuring the Facebook Graph & Twitter APIs*
