[Air-L] "Big Data" Tools

Bobo the.bobo at gmail.com
Sun Apr 19 13:37:23 PDT 2015


R-Shief looks very cool, not just technologically but also in its
commitment to open and free access with a non-profit funding model. That
must be so much work! To rip off Gandhi, perhaps digital humanists should
follow suit more generally and "code the change [we] want to see"?

A couple of questions:

1) Why are non-Western languages harder to scrape from the Twitter API?
Does it just not handle double-byte characters (e.g., Chinese characters)
well?

2) Curiosity question: localization is hard to do. Has any work been done on
automating the translation of archival material through something like
Duolingo <http://www.duolingo.com>?

Best,
Bobo

On Sun, Apr 19, 2015 at 4:15 PM, VJ Um Amel <laila at vjumamel.com> wrote:

> Thanks for bringing up this issue. I have mentioned this several times in
> my research regarding the Arab uprisings. When eighty to ninety-nine
> percent of all social media content on social movements in the Middle East
> is in Arabic, it is clear that we must conduct our research in that
> language. However, as you mentioned, there is a lack of tools, access, and
> overall research.
>
> My doctoral work included building the R-Shief media system
> (http://r-shief.org), which has archived and analyzed 18 billion posts over
> five years in over seventy languages, with a specific emphasis on Arabic
> (http://kal3a.r-shief.org/search). We started collecting tweets by Arabic
> hashtags as soon as Twitter made that functional in March 2012
> (http://r-shief.org/historical-archive/). We have also built an open-source
> Arabic Text Analyzer (http://r-shief.org/tools/arabic-entity-extraction/)
> and conducted semantic and sentiment analysis in Arabic. Our work and tools
> have only scratched the surface (http://r-shief.org/tools/). There is a lot
> more to be done in open-source software localization for non-Western,
> non-English languages.
>
>
> ---
> Laila Shereen Sakr </VJ Um Amel>
> PhD in Media Arts and Practice
> USC School of Cinematic Arts
> http://vjumamel.com
> http://r-shief.org
> +1-202-462-6242
>
>
>
> On Apr 15, 2015, at 2:06 PM, kalev leetaru <kalev.leetaru5 at gmail.com> wrote:
>
> > One of the biggest issues that I see on a daily basis in the policy world
> > is that the vast majority of "big data" work (and even "little data" work)
> > is based primarily or exclusively on English-language and/or Western data
> > sources, which are then used to make arguments about current events,
> > narratives, and emotions in the non-English, non-Western world.
> > There are simply far more tools available for analyzing English material
> > than there are for Swahili, for example, or even Arabic, and bilingualism
> > is not as prevalent in many areas of study, so I end up seeing an
> > incredible number of studies based on English-language content about
> > non-English-speaking areas of the world. Similarly, Twitter has become the
> > go-to dataset for social media studies even though Facebook, Weibo, VK,
> > Viber, WhatsApp, etc., offer better access to certain communities or
> > modalities. Because those platforms don't offer the same easy firehose API
> > and tool ecosystem, researchers take the easier path rather than focusing
> > on which platform might offer the best access to the community or
> > phenomena they are trying to measure.
> >
> > This is something that needs a great deal more attention in the
> > quantitative and "big data" spaces. Two of my Foreign Policy columns on
> > this topic may be of interest regarding just how much our understanding of
> > the world is skewed by this fixation on English-language Western sources.
> > My most recent one, out this afternoon, explores how our understanding of
> > global terrorism trends is based almost exclusively on English-language
> > news coverage and how that has influenced our understanding of those
> > trends:
> >
> >
> http://foreignpolicy.com/2015/04/15/why-we-cant-just-read-english-newspapers-to-understand-terrorism-big-data/
> >
> >
> http://www.foreignpolicy.com/articles/2014/09/26/why_big_data_missed_the_early_warning_signs_of_ebola
> >
> >
> > ~K
> >
> >
> >
> >> From: Air-L [mailto:air-l-bounces at listserv.aoir.org] On Behalf Of Matthew Weber
> >> Sent: Thursday, April 09, 2015 11:08 PM
> >> To: air-l at listserv.aoir.org
> >> Subject: [Air-L] "Big Data" Tools
> >>
> >> AIR’ers:
> >>
> >> I’m working on compiling a rough list of tools and training modules that
> >> are useful for working with large-scale datasets (“Big Data”).
> >> Essentially, I’m trying to build *something* that I can point newbies /
> >> graduate students to when they say “I want to do Big Data”. I’ve got a
> >> rough list of Coursera / edX / blog modules, but would welcome suggestions.
> >> I’m happy to share back the results.
> >>
> >> (I did try to check the AIR archive, but was unable to access it.)
> >>
> >> Thanks!
> >> Matt
> >>
> >> Matthew S. Weber
> >> Assistant Professor
> >> School of Communication and Information
> >> Rutgers University
> >>
> >> (ph): 848-932-8718
> >>
> >> _______________________________________________
> >> The Air-L at listserv.aoir.org mailing list is provided by the Association
> >> of Internet Researchers http://aoir.org Subscribe, change options or
> >> unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
> >>
> >> Join the Association of Internet Researchers:
> >> http://www.aoir.org/