[Air-L] Tweet ID Datasets

Ed Summers ehs at pobox.com
Thu Apr 27 10:38:39 PDT 2017

The Documenting the Now [1] project would like to announce a small contribution that may be of interest to the AoIR community:


This website is a clearinghouse of tweet ID datasets that are available elsewhere on the web. As you probably know, Twitter's terms of service do not allow datasets of tweets to be distributed to third parties, but they do allow lists of tweet identifiers to be shared.

Twitter provides an API to turn the identifiers back into data, as long as the tweet hasn't been protected or deleted. This process is often referred to as hydrating. To make it easier, we've added hydrate functionality to the twarc [2] command line utility and introduced a desktop application called Hydrator [3].
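
For anyone curious what hydrating involves mechanically, here is a minimal sketch in Python. It is not twarc's or Hydrator's actual implementation. It assumes tweet IDs arrive one per line and that the lookup endpoint accepts up to 100 IDs per request. The `lookup` argument is a hypothetical stand-in for the real API call, and `read_ids`, `batches` and `hydrate` are names made up for this illustration.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List

# Assumption: the lookup endpoint accepts at most 100 IDs per request.
BATCH_SIZE = 100

Tweet = Dict[str, object]


def read_ids(lines: Iterable[str]) -> Iterator[str]:
    """Yield tweet IDs from an iterable of lines, skipping blank lines."""
    for line in lines:
        tweet_id = line.strip()
        if tweet_id:
            yield tweet_id


def batches(ids: Iterable[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Group IDs into lists of at most `size` items."""
    iterator = iter(ids)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def hydrate(ids: Iterable[str],
            lookup: Callable[[List[str]], List[Tweet]]) -> Iterator[Tweet]:
    """Yield tweet data for each ID that `lookup` can still resolve.

    `lookup` is hypothetical: it takes a list of IDs and returns data only
    for tweets that are still available, so deleted or protected tweets
    simply drop out of the result.
    """
    for chunk in batches(ids):
        for tweet in lookup(chunk):
            yield tweet


if __name__ == "__main__":
    # Fake lookup backed by a dict, standing in for the real API.
    available = {"1": {"id_str": "1", "text": "first"},
                 "3": {"id_str": "3", "text": "third"}}

    def fake_lookup(chunk: List[str]) -> List[Tweet]:
        return [available[i] for i in chunk if i in available]

    for tweet in hydrate(read_ids(["1\n", "2\n", "\n", "3\n"]), fake_lookup):
        print(tweet["id_str"], tweet["text"])
```

In the demonstration, ID 2 plays the part of a deleted or protected tweet: it is requested but nothing comes back for it, which is why a hydrated dataset can be smaller than the ID list it came from.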

The website has instructions for adding a dataset to the catalog, if you would like to. Unfortunately, this editing process is currently a bit cumbersome, since the web application is very lightweight at the moment. If you would like help, please let us know. We would also be very interested in hearing any thoughts you might have about the ethics and mechanics of providing this type of information.



[1] http://www.docnow.io/
[2] https://github.com/docnow/twarc
[3] https://github.com/docnow/hydrator#readme
