[Air-L] four visualizations (and open data) from data mining online news at massive scale

Michael Herlihy mherlihy at nla.gov.au
Mon Feb 27 16:50:20 PST 2017


Brilliant work - thank you Kalev

-----Original Message-----
From: Air-L [mailto:air-l-bounces at listserv.aoir.org] On Behalf Of Gohar F. Khan
Sent: Tuesday, 28 February 2017 6:30 AM
To: air-l at listserv.aoir.org; kalev leetaru <kalev.leetaru5 at gmail.com>
Subject: Re: [Air-L] four visualizations (and open data) from data mining online news at massive scale

Interesting stuff Kalev, thanks for sharing it.

Cheers,

GFK

On Tue, 28 Feb 2017 at 6:48 AM kalev leetaru <kalev.leetaru5 at gmail.com> wrote:

> Apologies for cross-posting - I thought many of you would find of
> interest four of my latest pieces on what we can learn about the
> global structure of the news media landscape through massive mining of
> online news, and given that the underlying datasets are all open, that
> these might offer great starting points for many other research
> questions.
>
> In particular, two of the analyses rely on applying deep learning
> image cataloging to more than a quarter billion global news
> photographs from last year, one examining visual geocoding and the
> other looking at semantic visual clustering using the assigned labels.
>
> One explores what it looks like to combine multilingual textual
> geocoding and sentiment analysis (both covering 65 languages) to
> process a quarter billion news articles and 2.2 billion location
> mentions to map "global happiness" as seen through the eyes of the
> world's online news media.
>
> The final piece leverages visual document extraction to compile three
> quarters of a billion outlinks from 121 million articles over the last
> 10 months and uses that link graph to explore how global media outlets
> link to each other. What makes this particular analysis distinct is
> both the global scope (crossing all countries and 65 languages) and
> the use of the article link graph rather than the page link graph as
> is traditionally done (i.e., looking at only the links in the article
> text itself, rather than the myriad links found in the rest of the
> surrounding page, such as headers/footers/advertisements/etc).
>
> Thought these might be of interest re what it looks like to apply
> these techniques at scale and with a globalized scope and the open
> availability of the underlying computed datasets to enable all kinds
> of other research on online news.
>
> http://www.forbes.com/sites/kalevleetaru/2017/02/27/creating-a-massive-network-visualization-of-the-global-news-landscape-who-links-to-whom/
>
> http://www.forbes.com/sites/kalevleetaru/2017/02/25/what-does-artificial-intelligence-see-in-a-quarter-billion-global-news-photographs/
>
> http://www.forbes.com/sites/kalevleetaru/2017/02/21/visual-geocoding-a-quarter-billion-global-news-photographs-using-googles-deep-learning-api/
>
> http://www.forbes.com/sites/kalevleetaru/2017/02/22/mapping-global-happiness-in-2016-through-a-quarter-billion-news-articles/
>
> ~K
> http://kalevleetaru.com/
> http://blog.gdeltproject.org/
>
> _______________________________________________
> The Air-L at listserv.aoir.org mailing list
> is provided by the Association of Internet Researchers http://aoir.org
> Subscribe, change options or unsubscribe at:
> http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
>
> Join the Association of Internet Researchers:
> http://www.aoir.org/

