[Air-L] Q: Twitter text logs?

Cornelius Puschmann cornelius.puschmann at uni-duesseldorf.de
Thu Mar 18 06:34:04 PDT 2010


I'm using a methodology similar to Reynol's for a project I am doing (see
http://blog.ynada.com/179).

Here's a very basic step-by-step description. Sorry, I realize that some of
this stuff may be difficult to implement without the right technical
expertise. Also, you need to be on Linux. :-(

1. Set up a special account to follow a group of users you want to study.
This has several advantages (see below). You can seed the account by using
search, the public timeline, lists etc. I've seeded
http://twitter.com/scientwists mainly via lists.

2. Use a script (e.g. in Bash, Python or Perl) to retrieve the tweets of the
people you are following via the Twitter API and cURL. Run this script at
regular intervals via a cron job. I've set up an old laptop running 24/7 to
do this for me and fetch new material once per hour, since Twitter doesn't
reliably return older data.

3. Use XSLT (e.g. xsltproc) to convert the XML into a nice concordance, for
example CSV.

4. Use something like NLTK or R to extract hashtags, frequent terms, popular
URLs, the most retweeted users, and any other cool stuff you can think of.
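
Steps 2 and 3 could look roughly like the sketch below. It is only an
illustration under several assumptions: it uses Python's standard library
(urllib and xml.etree) in place of cURL and XSLT, the URL is a placeholder
rather than a real endpoint, authentication is left out entirely, and the XML
is assumed to consist of status elements with id, created_at, text and
user/screen_name children. All function names are made up for this example.

```python
import csv
import io
import time
import urllib.request
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical placeholder: substitute the timeline URL of your own account.
TIMELINE_URL = "https://example.org/friends_timeline.xml"


def fetch_snapshot(url, out_dir):
    """Download one XML snapshot and save it under a timestamped file name."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / time.strftime("timeline-%Y%m%d-%H%M%S.xml")
    with urllib.request.urlopen(url, timeout=30) as response:
        target.write_bytes(response.read())
    return target


def statuses_to_rows(xml_text):
    """Turn each <status> element into (id, created_at, screen_name, text)."""
    root = ET.fromstring(xml_text)
    rows = []
    for status in root.iter("status"):
        rows.append((
            status.findtext("id", default=""),
            status.findtext("created_at", default=""),
            status.findtext("user/screen_name", default=""),
            status.findtext("text", default=""),
        ))
    return rows


def merge_rows(*batches):
    """Combine batches, keeping the first row seen for each tweet id.

    Hourly snapshots overlap, so the same tweet usually shows up repeatedly.
    """
    seen = {}
    for batch in batches:
        for row in batch:
            seen.setdefault(row[0], row)
    return list(seen.values())


def rows_to_csv(rows):
    """Serialize rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "created_at", "screen_name", "text"])
    writer.writerows(rows)
    return buffer.getvalue()
```

To run something like this once per hour, a crontab entry along the lines of
"0 * * * * python3 /path/to/fetch.py" would do (the path is made up). Keeping
every raw snapshot and deduplicating later means nothing is lost if the
conversion step turns out to need changes.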

If you can, put everything on Dropbox. That way you get both a backup and
live stats from anywhere you work.
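
The stats from step 4 could be computed along these lines. This is a rough
sketch using plain Python (re and collections.Counter) rather than NLTK or R.
The regular expressions are simplifications, and retweets are assumed to
follow the "RT @user" convention; the function name is made up for this
example.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")
URL = re.compile(r"https?://\S+")
RETWEET = re.compile(r"\bRT @(\w+)")


def corpus_stats(texts):
    """Count hashtags, URLs and retweeted users over an iterable of tweets."""
    hashtags, urls, retweeted = Counter(), Counter(), Counter()
    for text in texts:
        # Strip URLs first so that "#fragment" parts are not counted as tags.
        without_urls = URL.sub(" ", text)
        hashtags.update(tag.lower() for tag in HASHTAG.findall(without_urls))
        urls.update(URL.findall(text))
        retweeted.update(name.lower() for name in RETWEET.findall(text))
    return hashtags, urls, retweeted
```

Calling most_common(10) on any of the three counters gives a quick top-ten
list. Note that the URL pattern will also pick up trailing punctuation, so
some cleanup may be needed before counting links seriously.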

Why use a special account rather than just pulling data via the public
timeline?
a) you can let people know that you plan on using their tweets
b) you can allow them to block you, effectively giving them the chance to
opt out. Just because their Twitter is public doesn't mean they want to be
included in your study.
c) they can get in touch with you, give feedback, ask questions, etc.
d) makes longitudinal research easier since you have a live database via the
account
e) really easy to expand/modify the corpus -- student assistants can easily
help you with that without having to do any programming

HTH,

Cornelius Puschmann

On Wed, Mar 17, 2010 at 12:56 AM, Reynol Junco <rey.junco at gmail.com> wrote:

> Doron,
>
> I would recommend using the Twitter API directly. Here is the page in the
> Twitter API Wiki that explains how to correctly call on the API for a
> search
> : http://apiwiki.twitter.com/Twitter-Search-API-Method:-search.
>
> When you make such a call into the API, it will return an XML file that can
> be imported into MS Excel (Windows only-- sorry, fellow Mac users). I wrote
> a step-by-step blog post on how to do this using the "user timeline" method
> when we were in the middle of our Twitter research project last semester:
>
> http://reyjunco.tumblr.com/post/219287195/how-to-export-twitter-updates-to-excel
>
> If you want to download a lot of tweets, make sure you pay particular
> attention to the *rpp* (returns per page) and the *page* parameters.
>
> I have a lot of experience with this as we downloaded and archived almost
> 3,500 tweets during our 15-week-long study. So, please let me know if you
> have any questions.
>
> Best,
>
> Rey Junco
>
>
> --
> Dr. Reynol Junco
> Associate Professor
> Department of Academic Development and Counseling
> Director, Disability Services
> Lock Haven University
> http://www.reyjunco.com
>
>
> ------------------------------
>
> Message: 5
> Date: Tue, 16 Mar 2010 22:02:22 +0200
> From: "Friedman Doron" <doronf at idc.ac.il>
> To: <air-l at listserv.aoir.org>
> Subject: [Air-L] Q: Twitter text logs?
> Message-ID: <EAC062C903BCD049AE7C3602A2F373A5AFC1EB at JAMES.idc.ac.il>
> Content-Type: text/plain;       charset="us-ascii"
>
> Hi,
>
>
>
> Can anyone point us to authentic data from twitter that can be used for
> research purposes? We are looking for sub-networks of followers, and a
> set of texts generated by users that belong to the network.
> Alternatively, status messages from Facebook with the corresponding
> friendship graphs can be useful as well. Of course, the actual usernames
> can be encrypted for privacy. Of course we will give full credit to
> whoever has been able to harvest such data and make it available to the
> research community.
>
>
>
> Thanks!
>
>
>
> - doron
>
>
>
> ====================
>
> Dr. Doron Friedman
>
> Lecturer, The Interdisciplinary Center, Herzliya (Israel) &
>
> Honorary Lecturer, University College London
>
> Mobile: +972-54-4461807
>
> Office: +972-9-9527654
>
> http://www.idc.ac.il/communications/avl
>



-- 
Dr. des. Cornelius Puschmann, M.A.

Dept for English Language and Linguistics
University of Düsseldorf, Germany
-and-
University Library Center (hbz), Cologne, Germany

http://google.com/profiles/puschmann
http://ynada.com
http://elanguage.net


