[Air-L] four visualizations (and open data) from data mining online news at massive scale

kalev leetaru kalev.leetaru5 at gmail.com
Mon Feb 27 09:47:38 PST 2017


Apologies for cross-posting - I thought many of you might be interested in
four of my latest pieces on what we can learn about the global structure of
the news media landscape through massive mining of online news. Because the
underlying datasets are all open, they might also offer good starting
points for many other research questions.

In particular, two of the analyses rely on applying deep learning image
cataloging to more than a quarter billion global news photographs from last
year, one examining visual geocoding and the other looking at semantic
visual clustering using the assigned labels.

A third explores what it looks like to combine multilingual textual geocoding
and sentiment analysis (both covering 65 languages) to process a quarter
billion news articles and 2.2 billion location mentions to map "global
happiness" as seen through the eyes of the world's online news media.

The final piece uses visual document extraction to compile three quarters of
a billion outlinks from 121 million articles over the last 10 months and
uses that link graph to explore how global media outlets link to each
other. What makes this particular analysis distinct is both the global
scope (crossing all countries and 65 languages) and the use of the article
link graph rather than the page link graph as is traditionally done (i.e.,
looking only at the links in the article text itself, rather than the
myriad links found in the rest of the surrounding page, such as headers,
footers, advertisements, etc.).
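
As a rough illustration of the article link graph idea (a minimal sketch,
not the actual pipeline behind the piece), the Python below rolls
article-level outlinks up into outlet-to-outlet edge counts. It assumes the
links have already been restricted to those found in the article text; the
record layout, the sample URLs, and the helper names outlet() and
outlet_link_graph() are all hypothetical, and treating the hostname as the
outlet is a simplifying assumption.

```python
from collections import Counter
from urllib.parse import urlsplit


def outlet(url):
    """Return a crude outlet key: the lowercased hostname without a leading 'www.'."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def outlet_link_graph(articles):
    """Count outlet-to-outlet edges from (article_url, body_outlinks) pairs.

    body_outlinks is assumed to hold only links from the article text itself,
    not from headers, footers, or advertisements on the surrounding page.
    """
    edges = Counter()
    for article_url, body_outlinks in articles:
        source = outlet(article_url)
        # Count each target outlet at most once per article.
        targets = {outlet(link) for link in body_outlinks}
        for target in targets:
            # Skip unparseable links and links back to the same outlet.
            if target and target != source:
                edges[(source, target)] += 1
    return edges


# Hypothetical sample records, for illustration only.
sample = [
    ("http://www.example-news.com/a1",
     ["http://other-paper.org/x", "http://www.example-news.com/a0"]),
    ("http://example-news.com/a2",
     ["http://other-paper.org/y", "http://wire.example.net/z"]),
]
graph = outlet_link_graph(sample)
```

Counting each target once per article keeps a single link-heavy story from
dominating an edge weight; counting every link instead is an equally
defensible choice.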

I thought these might be of interest both as examples of what it looks like
to apply these techniques at scale and with a global scope, and because the
underlying computed datasets are openly available to enable all kinds of
other research on online news.


http://www.forbes.com/sites/kalevleetaru/2017/02/27/creating-a-massive-network-visualization-of-the-global-news-landscape-who-links-to-whom/

http://www.forbes.com/sites/kalevleetaru/2017/02/25/what-does-artificial-intelligence-see-in-a-quarter-billion-global-news-photographs/

http://www.forbes.com/sites/kalevleetaru/2017/02/21/visual-geocoding-a-quarter-billion-global-news-photographs-using-googles-deep-learning-api/

http://www.forbes.com/sites/kalevleetaru/2017/02/22/mapping-global-happiness-in-2016-through-a-quarter-billion-news-articles/


~K
http://kalevleetaru.com/
http://blog.gdeltproject.org/


