[Air-L] "Big Data" Tools

Shulman, Stu stu at texifter.com
Mon Apr 20 16:03:34 PDT 2015


DiscoverText handles non-English well. We have tested our search,
classifiers, and automated clustering extensively in Chinese and Arabic.
Our users have studied a wide variety of languages. So long as the data is
in a Unicode character set you can search, filter, code, cluster, and
machine classify it. Asmina's group had as many as 10 languages in play
during her project.

http://discovertext.com/customer-testimonials/

In terms of running queries against Twitter, we have had good luck in many
languages.

On Wed, Apr 15, 2015 at 5:06 PM, kalev leetaru <kalev.leetaru5 at gmail.com>
wrote:

> One of the biggest issues that I see on a daily basis in the policy world
> is that the vast majority of "big data" work (and even "little data" work)
> is based primarily or exclusively on English-language and/or Western data
> sources and attempts to use such sources to make arguments about current
> events, narratives, and emotions in the non-English non-Western world.
> There are simply far more tools available for performing analysis of
> English material than there are for Swahili, for example, or even Arabic,
> and bilingualism is not as prevalent in many areas of study, so I end up
> seeing an incredible number of studies based on English-language content
> about non-English speaking areas of the world.  Similarly, Twitter has
> become the go-to dataset for social media studies even as Facebook, Weibo,
> VK, Viber, WhatsApp, etc., offer better access to certain communities or
> modalities, but don't offer the same easy firehose API and tool ecosystem,
> so researchers go with the easier path rather than focusing on which
> platform might offer the best access to the community or phenomena they
> are trying to measure.
>
> This is something that needs a great deal more attention in the
> quantitative and "big data" spaces.  Two of my Foreign Policy columns on
> this topic may be of interest re just how much our understanding of the
> world is skewed through this fixation on English Western sources.  My most
> recent one, out this afternoon, explores how our understanding of global
> terrorism trends is based almost exclusively on English-language news
> coverage and how that has influenced our understanding of trends:
>
>
> http://foreignpolicy.com/2015/04/15/why-we-cant-just-read-english-newspapers-to-understand-terrorism-big-data/
>
>
> http://www.foreignpolicy.com/articles/2014/09/26/why_big_data_missed_the_early_warning_signs_of_ebola
>
>
> ~K
>
>
>
> L [mailto:air-l-bounces at listserv.aoir.org] On Behalf Of Matthew Weber
> > Sent: Thursday, April 09, 2015 11:08 PM
> > To: air-l at listserv.aoir.org
> > Subject: [Air-L] "Big Data" Tools
> >
> > AIR’ers:
> >
> > I’m working on compiling a rough list of tools and training modules that
> > are useful for working with large-scale datasets (“Big Data”) and training.
> > Essentially, I’m trying to build *something* that I can point newbies /
> > graduate students to when they say “I want to do Big Data”. I’ve got a
> > rough list of Coursera / edX / blog modules, but would welcome suggestions.
> > I’m happy to share back the results.
> >
> > (I did try to check the AIR archive, but was unable to access it).
> >
> > Thanks!
> > Matt
> >
> >
> >
> >
> > Matthew S. Weber
> > Assistant Professor
> > School of Communication and Information
> > Rutgers University
> >
> > (ph): 848-932-8718
> >
> >
> >
> >
> >
> >
> > _______________________________________________
> > The Air-L at listserv.aoir.org mailing list is provided by the Association
> > of Internet Researchers http://aoir.org Subscribe, change options or
> > unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
> >
> > Join the Association of Internet Researchers:
> > http://www.aoir.org/
>



-- 
Dr. Stuart W. Shulman
http://people.umass.edu/stu

Founder and CEO, Texifter
http://texifter.com

LinkedIn
http://www.linkedin.com/in/stuartwshulman

Twitter
https://twitter.com/StuartWShulman

