[Air-L] Academic replacements for TwapperKeeper.com?

Nick Lalone nick.lalone at gmail.com
Wed Feb 23 12:39:22 PST 2011


I have been watching the Library of Congress for news on their Twitter
archiving. Have they stated when the Twitter archives will become
available?

On Wed, Feb 23, 2011 at 1:37 PM, Deen Freelon <dfreelon at u.washington.edu> wrote:

> I would also be curious to know what others have been using or plan to use
> for harvesting Twitter data. I've used both TwapperKeeper and 140kit, and
> found that the latter is quite good for hashtag archiving, but not as good
> at keyword archiving. Further, 140kit has a max scrape time of one week,
> although I believe that limit can be renewed manually. Finally, both TK and 140kit
> can be quite slow and even unavailable at times, and as we've just seen they
> may shut down at any time.
>
> All of this has made me quite wary of relying on externally managed
> "clouds" for data collection. That is why I intend to set up my own Twitter
> harvesting operation for use within my own department, as many CS
> researchers do, and would encourage others with the necessary means and
> knowledge to do the same. Much valuable data can be collected even within
> the default API query limits, though I'll certainly ask Twitter to put me on
> the whitelist. Running one's own archiving operation is fairly cheap, and
> since you're only archiving your own data, you aren't hamstrung by hundreds
> of other jobs running simultaneously.
>
> If there's any interest in learning how to set up small-scale Twitter
> scrapes, let me know and I'll write something up when I have the time. Best,
> ~DEEN
>
>
> On 2/23/11 11:18 AM, Matt Munley wrote:
>
>> Cornelius,
>> How well would something like 140kit (http://140kit.com/) meet your needs?
>> Here's their description from their site:
>>
>> "We use our cluster of machines to collect your data using our access to
>> the Twitter API. If you search for tweets with a term, we employ the
>> streaming API to collect data in a distributed fashion. When your data
>> collection is finished, depending on your access level, we conduct an array
>> of analytics on the data set, ranging from the ordinary dump of data in
>> MySQL/CSV to Network graph visualizations, gender breakdowns, and more."
>>
>> It seems to hit most of your bullet points, though I can't speak to their
>> stability or long-term viability.
>>
>>
>> Matt
>>
>> On Wed, Feb 23, 2011 at 12:04 PM, Cornelius Puschmann
>> <cornelius.puschmann at uni-duesseldorf.de> wrote:
>>
>>> *Note:* I've also blogged this (in case links in the post don't work) and
>>> will list all alternatives suggested to me in that blog post:
>>> http://blog.ynada.com/616
>>>
>>> Dear all,
>>>
>>> A few days ago, the people behind the Twitter archival site
>>> TwapperKeeper.com <http://twapperkeeper.com/> announced that they will be
>>> discontinuing the export feature of the service on March 20, 2011
>>> <http://twapperkeeper.wordpress.com/2011/02/22/removal-of-export-and-download-api-capabilities/>.
>>> Apparently the feature is in violation of Twitter’s terms of service, at
>>> least in the form in which it is currently implemented in TwapperKeeper.
>>>
>>> Unfortunately, this cuts off a number of academics who are investigating
>>> communication on Twitter for scientific purposes from a convenient data
>>> source. While it’s fairly easy to get data directly via the Twitter API
>>> <http://apiwiki.twitter.com/> (which is what TwapperKeeper was doing), I
>>> know many people who want to concentrate on the data itself rather than
>>> run their own servers to scrape Twitter on a regular basis. What’s more,
>>> Twitter’s attitude is worrisome: many of us have tried to get an exemption
>>> from API rate limits in the past, to no avail. Twitter doesn’t give
>>> researchers privileged access to their data, and now they’re crippling
>>> TwapperKeeper on top of that.
>>>
>>> Bottom line: what will we use after March 20? Ideally, a replacement would
>>> provide the following:
>>>
>>>   - the hashtag/search query functionality of TwapperKeeper,
>>>   - the export functionality of TwapperKeeper,
>>>   - exclusive use for academic purposes (on the grounds that this might
>>>   keep Twitter from shutting it down),
>>>   - stability and reliability,
>>>   - long-term viability.
>>>
>>> The last point is important, because I don’t think it will be difficult to
>>> set up a server somewhere to suit the needs of a few people, but a
>>> larger-scale solution seems more sensible in the long run. Maybe JISC
>>> <http://www.jisc.ac.uk/> can do something like that, based on
>>> yourTwapperKeeper <http://code.google.com/p/yourtwapperkeeper/> (which they
>>> supported
>>> <http://twapperkeeper.wordpress.com/2010/04/16/jisc-funded-developments-to-twapper-keeper/>)?
>>> Or one of the big institutes (OII, Berkman)? Either way, it would be nice to
>>> find an alternative that doesn’t give those of us with devs and major IT
>>> support behind us a huge edge over the rest…
>>>
>>> Thanks in advance for your comments,
>>>
>>> Cornelius
>>>
>>> ---
>>>
>>> Dr. Cornelius Puschmann
>>> Department of English Language and Linguistics
>>> Heinrich-Heine-University Düsseldorf, Germany
>>> Junior Researchers Group "Science and the Internet"
>>> http://nfgwin.uni-duesseldorf.de
>>> _______________________________________________
>>> The Air-L at listserv.aoir.org mailing list
>>> is provided by the Association of Internet Researchers http://aoir.org
>>> Subscribe, change options or unsubscribe at:
>>> http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
>>>
>>> Join the Association of Internet Researchers:
>>> http://www.aoir.org/
>>>
>>
>
>
> --
> Deen Freelon
> Ph.D. Candidate, Dept. of Communication
> University of Washington
> dfreelon at uw.edu
> http://dfreelon.org/
>
>
>
>



-- 
Nick LaLone
Texas State University-San Marcos
Systems Support / Master's Student
www.nicklalone.com


