[Air-L] four visualizations (and open data) from data mining online news at massive scale

Gohar F. Khan gohar.feroz at gmail.com
Mon Feb 27 11:29:36 PST 2017


Interesting stuff, Kalev. Thanks for sharing it.

Cheers,

GFK

On Tue, 28 Feb 2017 at 6:48 AM kalev leetaru <kalev.leetaru5 at gmail.com>
wrote:

> Apologies for cross-posting. I thought many of you would be interested in
> four of my latest pieces on what we can learn about the global structure of
> the news media landscape through massive mining of online news. Given
> that the underlying datasets are all open, these might also offer great
> starting points for many other research questions.
>
> In particular, two of the analyses rely on applying deep learning image
> cataloging to more than a quarter billion global news photographs from last
> year, one examining visual geocoding and the other looking at semantic
> visual clustering using the assigned labels.
>
> One explores what it looks like to combine multilingual textual geocoding
> and sentiment analysis (both covering 65 languages) to process a quarter
> billion news articles and 2.2 billion location mentions to map "global
> happiness" as seen through the eyes of the world's online news media.
>
> The final piece uses visual document extraction to compile three quarters of
> a billion outlinks from 121 million articles over the last 10 months and
> uses that link graph to explore how global media outlets link to each
> other. What makes this particular analysis distinct is both the global
> scope (crossing all countries and 65 languages) and the use of the article
> link graph rather than the page link graph as is traditionally done (i.e.,
> looking at only the links in the article text itself, rather than the
> myriad links found in the rest of the surrounding page, such as
> headers, footers, advertisements, etc.).
>
> I thought these might be of interest both as examples of what it looks like
> to apply these techniques at scale and with a global scope, and because the
> underlying computed datasets are openly available to enable all kinds of
> other research on online news.
>
> http://www.forbes.com/sites/kalevleetaru/2017/02/27/creating-a-massive-network-visualization-of-the-global-news-landscape-who-links-to-whom/
>
> http://www.forbes.com/sites/kalevleetaru/2017/02/25/what-does-artificial-intelligence-see-in-a-quarter-billion-global-news-photographs/
>
> http://www.forbes.com/sites/kalevleetaru/2017/02/21/visual-geocoding-a-quarter-billion-global-news-photographs-using-googles-deep-learning-api/
>
> http://www.forbes.com/sites/kalevleetaru/2017/02/22/mapping-global-happiness-in-2016-through-a-quarter-billion-news-articles/
>
> ~K
> http://kalevleetaru.com/
> http://blog.gdeltproject.org/
> _______________________________________________
> The Air-L at listserv.aoir.org mailing list
> is provided by the Association of Internet Researchers http://aoir.org
> Subscribe, change options or unsubscribe at:
> http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
>
> Join the Association of Internet Researchers:
> http://www.aoir.org/


