[Air-L] SOTU Data

Mon Feb 7 06:49:09 PST 2011

We have made raw Facebook and Twitter data from the 2011 State of the Union
(SOTU) available:

http://discovertext.com/sotu.aspx

The tweets cover the two days leading up to the SOTU, and the Facebook content
goes back about six months.

    * Huffington Post (Twitter, de-duplicated) (14,476 documents, 11 MB)
    * Redstate (Twitter, de-duplicated) (1,571 documents, 1 MB)
    * Sean Hannity (Twitter) (1,170 documents, 930 KB)
    * #SOTU (Twitter, de-duplicated) (53,369 documents, 40 MB)
    * #SOTU (Twitter, full dataset) (109,601 documents, 82 MB)
    * Sarah Palin (mentions on Twitter in #SOTU feeds) (1,271 documents, 927 KB)
    * Whitehouse Official Facebook Page (Facebook) (423,358 documents, 315 MB)
    * Obama (Twitter, de-duplicated) (116,776 documents, 87 MB)
    * Obama (Twitter, full dataset) (222,441 documents, 166 MB)

To what end? The automated classifiers really don't tell us anything useful,
though they do make pretty pictures:

http://blog.texifter.com/index.php/2011/02/03/text-analysis-during-the-2011-state-of-the-union-address/

I guess the hope is that other users will dive into particular slices of the
data and do content-analytic or interpretive qualitative work that lets us
know whether all the chatter really does matter. Meanwhile, we are working on
better classifiers in the lab.

~Stu

-- 

Stuart Shulman
President & CEO
Texifter, LLC <http://www.texifter.com/>

Have you tried DiscoverText?
http://discovertext.com
*Featuring the Facebook Graph & Twitter APIs*