[Air-L] Academic replacements for TwapperKeeper.com?

Wed Feb 23 15:56:02 PST 2011

I recall a rough estimate of a year or so from the time of the announcement, but I can't quite remember and don't have the spare time to pin it down. Either way, IIRC the LoC dataset will have a time buffer: you'll have access to Tweets only 6 months after they were created, so that the archive can't serve as a competitive resource for businesses. It means that when this happens we will most likely be swimming in seas of data, but it's not clear when it will happen, and if this time buffer exists, it would be a problem for publishing any data at an internet-speed clip.

Devin

On Feb 23, 2011, at 12:39 PM, Nick LaLone wrote:

> I have been watching the Library of Congress for news on their Twitter
> archiving. Have they stated when the Twitter archives will become
> available?
> 
> On Wed, Feb 23, 2011 at 1:37 PM, Deen Freelon <dfreelon at u.washington.edu> wrote:
> 
>> I would also be curious to know what others have been using or plan to use
>> for harvesting Twitter data. I've used both TwapperKeeper and 140kit, and
>> found that the latter is quite good for hashtag archiving, but not as good
>> at keyword archiving. Further, 140kit has a max scrape time of one week,
>> although that is manually renewable I believe. Finally, both TK and 140kit
>> can be quite slow and even unavailable at times, and as we've just seen they
>> may shut down at any time.
>> 
>> All of this has made me quite wary of relying on externally managed
>> "clouds" for data collection. That is why I intend to set up my own Twitter
>> harvesting operation for use within my own department, as many CS
>> researchers do, and would encourage others with the necessary means and
>> knowledge to do the same. Much valuable data can be collected even within
>> the default API query limits, though I'll certainly ask Twitter to put me on
>> the whitelist. Running one's own archiving operation is fairly cheap, and
>> since you're only archiving your own data, you aren't hamstrung by hundreds
>> of other jobs running simultaneously.
>> 
>> If there's any interest in learning how to set up small-scale Twitter
>> scrapes, let me know and I'll write something up when I have the time. Best,
>> ~DEEN
>> 
>> 
>> On 2/23/11 11:18 AM, Matt Munley wrote:
>> 
>>> Cornelius,
>>> How well would something like 140kit (http://140kit.com/) meet your
>>> needs?
>>> Here's their description from their site:
>>> 
>>> "We use our cluster of machines to collect your data using our access to
>>> the Twitter API. If you search for tweets with a term, we employ the
>>> streaming API to collect data in a distributed fashion. When your data
>>> collection is finished, depending on your access level, we conduct an
>>> array of analytics on the data set, ranging from the ordinary dump of
>>> data in MySQL/CSV to Network graph visualizations, gender breakdowns,
>>> and more."
>>> 
>>> It seems to hit most of your bullet points, though I can't speak to their
>>> stability or long-term viability.
>>> 
>>> 
>>> Matt
>>> 
>>> On Wed, Feb 23, 2011 at 12:04 PM, Cornelius Puschmann
>>> <cornelius.puschmann at uni-duesseldorf.de> wrote:
>>>
>>>> *Note:* I've also blogged this (in case links in the post don't work) and
>>>> will list all alternatives suggested to me in that blog post:
>>>> http://blog.ynada.com/616
>>>> 
>>>> Dear all,
>>>> 
>>>> A few days ago, the people behind Twitter archival site
>>>> TwapperKeeper.com <http://twapperkeeper.com/> announced
>>>> that they will be discontinuing the export feature of the service on
>>>> March 20, 2011
>>>> <http://twapperkeeper.wordpress.com/2011/02/22/removal-of-export-and-download-api-capabilities/>.
>>>>
>>>> Apparently the feature is in violation of Twitter’s terms of service, at
>>>> least in the form it’s currently implemented in TwapperKeeper.
>>>> 
>>>> Unfortunately this cuts off a number of academics who are investigating
>>>> communication on Twitter for scientific purposes from a convenient data
>>>> source. While it’s fairly easy to get data directly via the Twitter
>>>> API <http://apiwiki.twitter.com/> (which is what TwapperKeeper was
>>>> doing), I know many people who want to concentrate on the data itself,
>>>> rather than running their own servers to scrape Twitter on a regular
>>>> basis. What’s more, Twitter’s attitude is worrisome: many of us have
>>>> tried to get an exemption from API rate limits in the past, to no
>>>> avail. Twitter doesn’t give researchers privileged access to their
>>>> data, and now they’re crippling TwapperKeeper on top of that.
>>>> 
>>>> Bottom line: what will we use after March 20? Ideally, a replacement
>>>> would provide the following:
>>>> 
>>>>  - the hashtag/search query functionality of TwapperKeeper,
>>>>  - the export functionality of TwapperKeeper,
>>>>  - exclusive use for academic purposes (on the grounds that this might
>>>>  keep Twitter from shutting it down),
>>>>  - stability and reliability,
>>>>  - long-term viability.
>>>> 
>>>> The last point is important, because I don’t think it will be difficult
>>>> to set up a server somewhere to suit the needs of a few people, but a
>>>> larger-scale solution seems more sensible in the long run. Maybe
>>>> JISC <http://www.jisc.ac.uk/> can do something like that, based on
>>>> yourTwapperKeeper <http://code.google.com/p/yourtwapperkeeper/> (which
>>>> they supported
>>>> <http://twapperkeeper.wordpress.com/2010/04/16/jisc-funded-developments-to-twapper-keeper/>)?
>>>> Or one of the big institutes (OII, Berkman)? Either way it would be nice
>>>> to find an alternative that doesn’t give those of us with devs and major
>>>> IT support behind them a huge edge over the rest…
>>>> 
>>>> Thanks in advance for your comments,
>>>> 
>>>> Cornelius
>>>> 
>>>> ---
>>>> 
>>>> Dr. Cornelius Puschmann
>>>> Department of English Language and Linguistics
>>>> Heinrich-Heine-University Düsseldorf, Germany
>>>> Junior Researchers Group "Science and the Internet"
>>>> http://nfgwin.uni-duesseldorf.de
>>>> _______________________________________________
>>>> The Air-L at listserv.aoir.org mailing list
>>>> is provided by the Association of Internet Researchers http://aoir.org
>>>> Subscribe, change options or unsubscribe at:
>>>> http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
>>>> 
>>>> Join the Association of Internet Researchers:
>>>> http://www.aoir.org/
>>>> 
>>> 
>> 
>> 
>> --
>> Deen Freelon
>> Ph.D. Candidate, Dept. of Communication
>> University of Washington
>> dfreelon at uw.edu
>> http://dfreelon.org/
>> 
>> 
>> 
>> 
> 
> 
> 
> -- 
> Nick LaLone
> Texas State University-San Marcos
> Systems Support / Master's Student
> www.nicklalone.com