[Air-L] Twitter no longer allowing use for scholarship - Update

Amanda Lenhart alenhart at pewinternet.org
Tue Mar 22 12:34:00 PDT 2011


Hi List,

I heard back from my friend at Twitter, and I've pasted her response to my message below. In it she notes which information is still available through the API, which is available but limited, and which won't be trackable through the API at all. She also indicates places where we may need to establish a special link to a Twitter staffer who can work with academics on particular types of requests. She has annotated the bulleted list I provided, and then added a narrative below the numbered list of subsample types that researchers need.



"I'm figuring out who the stakeholders are around here for issues like this, but in the meantime did want to pass along some information about possible workarounds that exist currently.



-geotagging - you can glean geo data from a given tweet through API
-location - yes, same as above, searching by location is limited (see below)
-institutional tweets - you could track all tweets from a given account through API
-@replies - also trackable via API
-hashtags and post keywords - you can search/track by keyword but it's limited (see below)
-collocating a single user's tweets - same as institutional
-RT counts and tracking - yes. RT counts is an iffy API but it is/will be generally available
-words within tweets - same as keyword
-follower/following data - yes, for the instant when you query the API. this data over time isn't returned by us
-number of tweets - that a user has posted? yes
-links - limited to search restrictions (below)
-images/avatars - yes
-lists - yes





Sometimes researchers need subsamples of the twitter stream based around:

1. A date and time (e.g. all tweets two days before, during and two days after the Superbowl)
2. Hashtags or keywords
3. Individuals (e.g. top ten most active people in a particular community of practice)
4. Random subsample of twitter stream



1 & 2 here are usually impossible for researchers to get, the way they want it, through our API, and this would be the main area we'd have to work to provide resources for, I think.



"Historical" searches for keywords or over past date ranges have to be done through the search function or search API; this is pretty limited to a moving 5-7 day window. Nothing's available once it becomes older than that.



Streaming API is much better at returning this kind of targeted data (there's even a "hose" that will just return tweets with links in them, as mentioned in the first list), like tweets with keywords, tweets from this location, tweets from this list of users -- but it's all in real time. You'd have to set up a connection way in advance to make sure you get everything as it happens, and then go back and perform your analysis on the accumulated corpus.



#3 - If you determine yourself who the relevant individuals are, you can totally get their timelines through the API



#4 - available in real time through Streaming API, but not through historical lookup



Let me know if this helps at all in the meantime!"
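
A minimal Python sketch of the collect-first, analyze-later approach described for the Streaming API in the message above, covering only the storage and filtering side. It makes no Twitter API calls: `incoming` stands in for whatever stream client is used, and the field names `text` and `created_at` (an ISO 8601 string), the function names, and the JSON Lines file layout are all assumptions made for illustration, not Twitter's actual format.

```python
import json
from datetime import datetime


def append_to_corpus(incoming, path):
    """Append each tweet-like dict to a JSON Lines file as it arrives.

    `incoming` is any iterable of dicts (a hypothetical stand-in for a
    real-time stream). Returns the number of tweets written.
    """
    count = 0
    with open(path, "a", encoding="utf-8") as f:
        for tweet in incoming:
            f.write(json.dumps(tweet) + "\n")
            count += 1
    return count


def load_subsample(path, start, end, keywords=()):
    """Return stored tweets with start <= created_at < end.

    If `keywords` is non-empty, keep only tweets whose text contains at
    least one of them (case-insensitive substring match).
    """
    wanted = [k.lower() for k in keywords]
    matches = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            tweet = json.loads(line)
            # Assumed field: ISO 8601 timestamp without a timezone offset.
            created = datetime.fromisoformat(tweet["created_at"])
            if not (start <= created < end):
                continue
            text = tweet["text"].lower()
            if wanted and not any(k in text for k in wanted):
                continue
            matches.append(tweet)
    return matches
```

With a corpus accumulated this way, subsample types 1 and 2 become a local query, for example `load_subsample("corpus.jsonl", datetime(2011, 2, 4), datetime(2011, 2, 9), ["#superbowl"])`. The catch is the one noted in the message: the collection has to be running before the event of interest.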





So, if you still can't gather the data you need even with what she says remains available, please let me know. The key issue seems to be historic access to tweets, which I'm guessing Twitter can't afford to store given the volume of material passing through their servers. It's not clear from the website of Gnip (Twitter's data reseller) whether they offer historic access, but it doesn't look like they do either. I suspect that the US Library of Congress, which is apparently working with Twitter and Gnip to archive Twitter feeds, may end up being the lone keeper of Twitter data older than a week. However, they are still figuring out the technical issues and haven't begun making this information publicly available, and it may be some time before they do.
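
To make the "older than a week" limit concrete, here is a small Python sketch that checks whether a requested date range would still be reachable through search. The 5-day default is an assumption (the conservative end of the 5-7 day range in her message), and the function name and return labels are hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: use the short end of the quoted 5-7 day moving window.
SEARCH_WINDOW = timedelta(days=5)


def search_coverage(start, end, now, window=SEARCH_WINDOW):
    """Classify a requested [start, end) range against a rolling window ending at `now`."""
    oldest = now - window
    if end <= oldest:
        return "expired"     # the whole range is older than the window
    if start < oldest:
        return "partial"     # the early part of the range has already aged out
    if start >= now:
        return "future"      # nothing posted yet; collect in real time
    return "searchable"      # everything posted so far is inside the window
```

For example, checked on the date of this message, a range around the Superbowl in early February comes back as "expired", which is the historic-access gap described above.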



Hopefully the above is helpful to some of you. But if you will be unable to collect your data after the end of this month, please do let me know (offlist), and I'll take that back to Twitter and we'll go from there.



{Update: Since I wrote this and tried unsuccessfully to post it to the list yesterday, it seems that TwapperKeeper will enable some downloading (into Excel) of historic tweets: http://twapperkeeper.wordpress.com/2011/03/22/save-as-excel-feature-has-been-brought-back-online/ Luca Rosa also posted a possible workaround to the list earlier today.}



Thanks,



Amanda



Amanda Lenhart
Senior Research Specialist
Pew Research Center's Internet & American Life Project
alenhart at pewinternet.org
twitter: amanda_lenhart




