[Air-L] "Big Data" Tools

יוחנן ועקנין yohanan.ouaknine at ois.co.il
Fri May 1 08:15:42 PDT 2015


Hello All,

I'm looking for anyone with previous experience using DiscoverText with
Hebrew texts.


Thank you,

Yohanan Ouaknine
PhD Candidate

Bar Ilan University

On Tue, Apr 21, 2015 at 1:03 AM, Shulman, Stu <stu at texifter.com> wrote:

> DiscoverText handles non-English text well. We have tested our search,
> classifiers, and automated clustering extensively in Chinese and Arabic.
> Our users have studied a wide variety of languages. So long as the data is
> in a Unicode character set you can search, filter, code, cluster, and
> machine classify it. Asmina's group had as many as 10 languages in play
> during her project.
>
> http://discovertext.com/customer-testimonials/
>
> In terms of running queries against Twitter, we have had good luck in many
> languages.
>
> On Wed, Apr 15, 2015 at 5:06 PM, kalev leetaru <kalev.leetaru5 at gmail.com>
> wrote:
>
> > One of the biggest issues that I see on a daily basis in the policy world
> > is that the vast majority of "big data" work (and even "little data" work)
> > is based primarily or exclusively on English-language and/or Western data
> > sources and attempts to use such sources to make arguments about current
> > events, narratives, and emotions in the non-English, non-Western world.
> > There are simply far more tools available for analyzing English material
> > than there are for Swahili, for example, or even Arabic, and bilingualism
> > is not as prevalent in many areas of study, so I end up seeing an
> > incredible number of studies based on English-language content about
> > non-English-speaking areas of the world. Similarly, Twitter has become the
> > go-to dataset for social media studies even though Facebook, Weibo, VK,
> > Viber, WhatsApp, etc., offer better access to certain communities or
> > modalities; they just don't offer the same easy firehose API and tool
> > ecosystem, so researchers take the easier path rather than focusing on
> > which platform might offer the best access to the community or phenomena
> > they are trying to measure.
> >
> > This is something that needs a great deal more attention in the
> > quantitative and "big data" spaces. Two of my Foreign Policy columns on
> > this topic may be of interest regarding just how much our understanding of
> > the world is skewed by this fixation on English-language, Western sources.
> > My most recent one, out this afternoon, explores how our understanding of
> > global terrorism trends is based almost exclusively on English-language
> > news coverage and how that has influenced our understanding of those
> > trends:
> >
> > http://foreignpolicy.com/2015/04/15/why-we-cant-just-read-english-newspapers-to-understand-terrorism-big-data/
> >
> > http://www.foreignpolicy.com/articles/2014/09/26/why_big_data_missed_the_early_warning_signs_of_ebola
> >
> > ~K
> >
> > > From: Air-L [mailto:air-l-bounces at listserv.aoir.org] On Behalf Of Matthew Weber
> > > Sent: Thursday, April 09, 2015 11:08 PM
> > > To: air-l at listserv.aoir.org
> > > Subject: [Air-L] "Big Data" Tools
> > >
> > > AIR’ers:
> > >
> > > I’m working on compiling a rough list of tools and training modules that
> > > are useful for working with large-scale datasets (“Big Data”) and for
> > > training. Essentially, I’m trying to build *something* that I can point
> > > newbies / graduate students to when they say “I want to do Big Data.”
> > > I’ve got a rough list of Coursera / edX / blog modules, but would welcome
> > > suggestions. I’m happy to share back the results.
> > >
> > > (I did try to check the AIR archive, but was unable to access it.)
> > >
> > > Thanks!
> > > Matt
> > >
> > > Matthew S. Weber
> > > Assistant Professor
> > > School of Communication and Information
> > > Rutgers University
> > >
> > > (ph): 848-932-8718
> > >
> > > _______________________________________________
> > > The Air-L at listserv.aoir.org mailing list
> > > is provided by the Association of Internet Researchers http://aoir.org
> > > Subscribe, change options or unsubscribe at:
> > > http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
> > >
> > > Join the Association of Internet Researchers:
> > > http://www.aoir.org/
> >
>
>
>
> --
> Dr. Stuart W. Shulman
> http://people.umass.edu/stu
>
> Founder and CEO, Texifter
> http://texifter.com
>
> LinkedIn
> http://www.linkedin.com/in/stuartwshulman
>
> Twitter
> https://twitter.com/StuartWShulman

-- 
Yohanan Ouaknine 

<http://www.twitter.com/yohananouaknine>   
<http://www.linkedin.com/in/yohananouaknine>  

 

This e-mail is for the sole use of the intended recipient and contains 
information that may be privileged and/or confidential. If you are not an 
intended recipient, please notify the sender by return e-mail and delete 
this e-mail and any attachments.

 

*Think of the environmental impact before printing this page.*

