[Air-L] Weibo Data Analysis

Gilad Lotan giladlotan at gmail.com
Fri Aug 10 13:36:06 PDT 2012


Over the summer I've been working with an intern to sample and analyze
data from Weibo. We've been collecting Weibo posts for the past couple of
months, sampling the public stream every few seconds using multiple API
keys from various IP addresses (yup, they heavily rate-limit API access).

Some interesting facts about Weibo's API:

- With our current sampling setup, we're seeing around 4,000 posts per
minute.
- Unlike Twitter, Weibo doesn't readily expose the friendship/follower
graph. It only reveals the last 5k followers for any public account.
- Weibo has an explicit sentiment/emoticon mechanism which is very
popular. Each emoticon is linked to an emotion (spelled out). When a user
chooses an icon, the word that the icon represents is embedded in the
user's post, wrapped in square brackets. It's possible to start mining
sentiment by looking for these bracketed words in Weibo posts.
- The API has an upcoming feature (according to their docs) which will
give us the ability to know how many of an account's followers are online
at any given time (VERY COOL).
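
To make the bracket-mining idea concrete, here is a rough sketch in Python.
It assumes each post is available as a plain Unicode string and that an
emoticon shows up as a short word between ASCII square brackets. The
function names, the 8-character length cap and the sample posts are all
made up for illustration; this is not the pipeline we actually run.

```python
import re
from collections import Counter

# Matches a short run of non-bracket, non-space characters between square
# brackets, e.g. "[泪]". The 1-8 character cap is an assumption meant to
# skip longer bracketed text that is probably not an emoticon.
EMOTICON_RE = re.compile(r"\[([^\[\]\s]{1,8})\]")


def extract_emotions(post):
    """Return the bracketed emotion words found in one post, in order."""
    return EMOTICON_RE.findall(post)


def count_emotions(posts):
    """Tally emotion words across an iterable of post texts."""
    counts = Counter()
    for post in posts:
        counts.update(extract_emotions(post))
    return counts


# Made-up sample posts, for illustration only (not from the real corpus).
sample_posts = [
    "刘翔加油 [泪][泪]",
    "So sad to watch [泪] [蜡烛]",
    "no emoticon here",
]
emotion_counts = count_emotions(sample_posts)
```

Calling emotion_counts.most_common() then gives a crude ranking of the
emotions expressed in a batch of posts. A real analysis would still need a
vetted list of Weibo's emoticon words, since users can also type arbitrary
text in brackets.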

We just published a first analysis of this data, looking at people's
reactions to Olympic hurdler Liu Xiang's epic fail.
http://blog.socialflow.com/post/7120245585/weibo-chinas-twitter-equivalent-abuzz-with-sentimentover-liu-xiangs-olympic-fail

The post shows some of the ways in which we can use Weibo data. We have
this growing corpus of Weibo data. I'd love to collaborate with other folks
(or their students!) who want to explore the data and help us figure out
how we can use it to learn about public sentiment in China. I'm not ready
to make a public call yet, as I don't want Weibo to ban us (or our IP
addresses) from hitting their servers.

Let me know if this sounds interesting, or if there's someone I should talk
to!

-- 
Gilad | @gilgul

thoughts: http://giladlotan.com/blog
activism: http://www.globalvoicesonline.org/author/gilad-lotan/


