[Air-L] "Big Data" Tools

VJ Um Amel laila at vjumamel.com
Sun Apr 19 13:15:09 PDT 2015


Thanks for bringing up this issue. I have mentioned this several times in my research regarding the Arab uprisings. When eighty to ninety-nine percent of all social media content on social movements in the Middle East is in Arabic, it is clear that we must conduct our research in that language. However, as you mentioned, there is a lack of tools, access, and overall research.

My doctoral work included building the R-Shief media system (http://r-shief.org), which has archived and analyzed 18 billion posts over five years in over seventy languages, with a specific emphasis on Arabic (http://kal3a.r-shief.org/search). We started collecting tweets by hashtags in Arabic as soon as Twitter made that functional in March 2012 (http://r-shief.org/historical-archive/). We have also built an open source Arabic Text Analyzer (http://r-shief.org/tools/arabic-entity-extraction/) and conducted semantic and sentiment analysis in Arabic. Our work and tools have only scratched the surface (http://r-shief.org/tools/). There is a lot more to be done in open source software localization in non-Western, non-English languages.


---
Laila Shereen Sakr </VJ Um Amel>
PhD in Media Arts and Practice
USC School of Cinematic Arts
http://vjumamel.com
http://r-shief.org
+1-202-462-6242



On Apr 15, 2015, at 2:06 PM, kalev leetaru <kalev.leetaru5 at gmail.com> wrote:

> One of the biggest issues that I see on a daily basis in the policy world
> is that the vast majority of "big data" work (and even "little data" work)
> is based primarily or exclusively on English-language and/or Western data
> sources and attempts to use such sources to make arguments about current
> events, narratives, and emotions in the non-English non-Western world.
> There are simply far more tools available for performing analysis of
> English material than there are for Swahili, for example, or even Arabic,
> and bilingualism is not as prevalent in many areas of study, so I end up
> seeing an incredible number of studies based on English-language content
> about non-English speaking areas of the world.  Similarly, Twitter has
> become the go-to dataset for social media studies even as Facebook, Weibo,
> VK, Viber, WhatsApp, etc, offer better access to certain communities or
> modalities, but don't offer the same easy firehose API and tool ecosystem,
> so researchers go with the easier path rather than focusing on which
> platform might offer the best access to the community or phenomena they
> are trying to measure.
> 
> This is something that needs a great deal more attention in the
> quantitative and "big data" spaces.  Two of my Foreign Policy columns on
> this topic may be of interest re just how much our understanding of the
> world is skewed through this fixation on English Western sources.  My most
> recent one, out this afternoon, explores how our understanding of global
> terrorism trends is based almost exclusively on English-language news
> coverage and how that has influenced our understanding of trends:
> 
> http://foreignpolicy.com/2015/04/15/why-we-cant-just-read-english-newspapers-to-understand-terrorism-big-data/
> 
> http://www.foreignpolicy.com/articles/2014/09/26/why_big_data_missed_the_early_warning_signs_of_ebola
> 
> 
> ~K
> 
> 
> 
>> From: Air-L [mailto:air-l-bounces at listserv.aoir.org] On Behalf Of Matthew Weber
>> Sent: Thursday, April 09, 2015 11:08 PM
>> To: air-l at listserv.aoir.org
>> Subject: [Air-L] "Big Data" Tools
>> 
>> AIR’ers:
>> 
>> I’m working on compiling a rough list of tools and training modules that
>> are useful for working with large-scale datasets (“Big Data”).
>> Essentially, I’m trying to build *something* that I can point newbies /
>> graduate students to when they say “I want to do Big Data”. I’ve got a
>> rough list of Coursera / edX / blog modules, but would welcome suggestions.
>> I’m happy to share back the results.
>> 
>> (I did try to check the AIR archive, but was unable to access it).
>> 
>> Thanks!
>> Matt
>> 
>> 
>> 
>> 
>> Matthew S. Weber
>> Assistant Professor
>> School of Communication and Information
>> Rutgers University
>> 
>> (ph): 848-932-8718
>> 
>> 
>> 
>> 
>> 
>> 
>> _______________________________________________
>> The Air-L at listserv.aoir.org mailing list is provided by the Association
>> of Internet Researchers http://aoir.org Subscribe, change options or
>> unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
>> 
>> Join the Association of Internet Researchers:
>> http://www.aoir.org/


