[Air-L] Academic replacements for TwapperKeeper.com?

Thu Feb 24 00:40:19 PST 2011

Hi Deen,

I've used both services and they are very good, but you're better off
setting up a server and running a few apps on your own. I'm running a
modified version of yourTwapperKeeper (yTK) on a MAMP server at the
University of São Paulo (99.9% uptime). We've changed yTK considerably. There are a
few supplementary scripts that map users' networks after hashtag archiving. It
takes a lot of requests to get it done, but it's been working fine. If you
can get your IP address whitelisted, then you're good to map the network.
For the time being I'd tell you to work on yTK until it does what you need.
Let me know if you hear of any other way to work this out.

[]s
Marco
___________________
Marco Toledo Bastos
Postdoctoral Fellow
FiloCom - ECA / USP
(0055) 11 7102-4756
FAMe - UniFrankfurt
(0049) 151-58768326

> Date: Wed, 23 Feb 2011 11:37:08 -0800
> From: Deen Freelon <dfreelon at u.washington.edu>
> To: air-l at listserv.aoir.org
> Subject: Re: [Air-L] Academic replacements for TwapperKeeper.com?
> Message-ID: <4D6561E3.6040505 at u.washington.edu>
> Content-Type: text/plain; charset=windows-1252; format=flowed
> 
> I would also be curious to know what others have been using or plan to
> use for harvesting Twitter data. I've used both TwapperKeeper and
> 140kit, and found that the latter is quite good for hashtag archiving,
> but not as good at keyword archiving. Further, 140kit has a max scrape
> time of one week, although that is manually renewable I believe.
> Finally, both TK and 140kit can be quite slow and even unavailable at
> times, and as we've just seen they may shut down at any time.
> 
> All of this has made me quite wary of relying on externally managed
> "clouds" for data collection. That is why I intend to set up my own
> Twitter harvesting operation for use within my own department, as many
> CS researchers do, and would encourage others with the necessary means
> and knowledge to do the same. Much valuable data can be collected even
> within the default API query limits, though I'll certainly ask Twitter
> to put me on the whitelist. Running one's own archiving operation is
> fairly cheap, and since you're only archiving your own data, you aren't
> hamstrung by hundreds of other jobs running simultaneously.
> 
> If there's any interest in learning how to set up small-scale Twitter
> scrapes, let me know and I'll write something up when I have the time.
> Best, ~DEEN