[Air-L] deep learning global news imagery annotations dataset

Sun Feb 21 12:45:00 PST 2016

Apologies for cross-posting. Given the tremendous interest we've been
receiving in our work applying Google's Cloud Vision API deep learning
algorithms to an ever-growing fraction of global news imagery over the last
two months, we've decided to package up the computed tags for all 14.6
million images we've processed to date and make them available as a single
download totaling 7.1GB compressed / 31GB uncompressed. For the most recent
3.9 million of those images, the download also includes the raw JSON output
of the Cloud Vision API, which adds a wealth of additional characteristics
about each image, including facial landmarks and the image's color profile.

This dataset does not contain the images themselves, only the output of
Google's deep learning algorithms for each image, along with the URL of the
image and the URL of the article in which it was found. All of the Cloud
Vision API features
(http://googlecloudplatform.blogspot.com/2016/02/Google-Cloud-Vision-API-enters-beta-open-to-all-to-try.html)
are requested for each image, including object/activity descriptive labels,
logo detection, content-based georeferencing, face detection/landmarks and
facial sentiment, SafeSearch, and OCR text recognition. For the most recent
3.9 million images we also pass language hints to the OCR engine for maximum
accuracy; so far we've even seen some handwriting recognized.
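For anyone planning to work with the raw JSON, a minimal sketch of pulling
a few headline fields out of one image's record might look like the
following. It assumes each record is a single per-image response using the
public Cloud Vision API v1 field names (labelAnnotations, faceAnnotations,
safeSearchAnnotation, and so on). How the snapshot actually wraps or
delimits those records is not described here, so treat the input shape and
the helper name summarize_vision_response as assumptions.

```python
import json


def summarize_vision_response(raw_json: str) -> dict:
    # Assumption: raw_json holds ONE per-image Cloud Vision v1 response object.
    resp = json.loads(raw_json)
    # Descriptive labels as (description, confidence score) pairs.
    labels = [(a["description"], a.get("score", 0.0))
              for a in resp.get("labelAnnotations", [])]
    logos = [a["description"] for a in resp.get("logoAnnotations", [])]
    # Landmark annotations carry place names (and coordinates in "locations").
    landmarks = [a["description"] for a in resp.get("landmarkAnnotations", [])]
    # By API convention, the first textAnnotations entry is the full OCR text.
    texts = resp.get("textAnnotations", [])
    ocr_text = texts[0].get("description", "") if texts else ""
    faces = resp.get("faceAnnotations", [])
    safe = resp.get("safeSearchAnnotation", {})
    return {
        "labels": labels,
        "logos": logos,
        "landmarks": landmarks,
        "ocr_text": ocr_text,
        "face_count": len(faces),
        # Facial sentiment is reported as likelihood strings, e.g. "VERY_LIKELY".
        "joy": [f.get("joyLikelihood") for f in faces],
        "adult": safe.get("adult"),
        "violence": safe.get("violence"),
    }


if __name__ == "__main__":
    # Made-up example record, for illustration only.
    sample = json.dumps({
        "labelAnnotations": [{"description": "crowd", "score": 0.93}],
        "faceAnnotations": [{"joyLikelihood": "VERY_LIKELY"}],
        "safeSearchAnnotation": {"adult": "VERY_UNLIKELY", "violence": "UNLIKELY"},
    })
    print(summarize_vision_response(sample)["labels"])  # [('crowd', 0.93)]
```

Every lookup uses .get() with an empty default because a feature's key may
simply be absent when nothing was detected (for example, an image with no
faces).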

This is a snapshot containing all metadata for images processed through
noon EST today. The GDELT Visual Global Knowledge Graph itself is a live
feed that updates every 15 minutes, so you can continue to access the same
data for global news imagery on an ongoing basis.
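Picking up where the snapshot leaves off means requesting every 15-minute
update after the noon EST cutoff. The sketch below only enumerates those
update slots. It assumes, hypothetically, that updates fall on
:00/:15/:30/:45 UTC boundaries and are labeled with a YYYYMMDDHHMMSS
timestamp; it does not build download URLs, so check the live-feed
announcement for the actual file naming. The helper names update_slot and
slots_between are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# The feed's stated update cadence.
UPDATE_INTERVAL = timedelta(minutes=15)


def update_slot(t: datetime) -> datetime:
    """Floor an aware datetime to its 15-minute slot in UTC (assumed alignment)."""
    t = t.astimezone(timezone.utc)
    return t.replace(minute=t.minute - t.minute % 15, second=0, microsecond=0)


def slots_between(start: datetime, end: datetime) -> list[str]:
    """Labels for every slot from start's slot through end's slot, inclusive."""
    slot, last = update_slot(start), update_slot(end)
    labels = []
    while slot <= last:
        # Hypothetical label format: YYYYMMDDHHMMSS in UTC.
        labels.append(slot.strftime("%Y%m%d%H%M%S"))
        slot += UPDATE_INTERVAL
    return labels


if __name__ == "__main__":
    est = timezone(timedelta(hours=-5))
    cutoff = datetime(2016, 2, 21, 12, 0, tzinfo=est)  # snapshot cutoff, noon EST
    now = datetime(2016, 2, 21, 17, 40, tzinfo=timezone.utc)
    print(slots_between(cutoff, now))
    # ['20160221170000', '20160221171500', '20160221173000']
```

Starting from the slot that contains the cutoff, rather than the next one,
errs toward re-fetching a little overlap instead of risking a gap; any
duplicate records can then be dropped by image URL.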

The full snapshot dataset is available below:

http://blog.gdeltproject.org/visual-global-knowledge-graph-vgkg-february-2016-snapshot-dataset/

The live feed this snapshot was taken from, which updates every 15 minutes,
is available at:

http://blog.gdeltproject.org/gdelt-visual-knowledge-graph-vgkg-v1-0-available/

Remember that you can combine this with the main Global Knowledge Graph
feed to study patterns in images versus the text of the articles containing
them:

http://blog.gdeltproject.org/gdelt-2-0-our-global-world-in-realtime/
http://blog.gdeltproject.org/gcam-reaches-40-dictionaries/
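Because each image record carries the URL of the article it was found in,
one natural way to combine the two feeds is to join on that article URL.
The sketch below assumes both feeds have already been parsed into Python
dicts; the key names article_url, labels and themes, and the helper name
join_on_article_url, are hypothetical placeholders rather than the feeds'
actual column names.

```python
from collections import defaultdict


def join_on_article_url(image_records, article_records):
    """Pair each article's text-derived fields with the tags of its images.

    image_records:   iterable of dicts with hypothetical keys
                     "article_url" and "labels" (image tags).
    article_records: iterable of dicts with hypothetical keys
                     "article_url" and "themes" (text-derived codes).
    Returns {article_url: {"themes": [...], "image_labels": [...]}} for
    articles that appear in both inputs.
    """
    # Gather all image tags per article; one article can contain many images.
    images_by_article = defaultdict(list)
    for rec in image_records:
        images_by_article[rec["article_url"]].extend(rec["labels"])

    joined = {}
    for art in article_records:
        url = art["article_url"]
        # Keep only articles for which at least one image record exists.
        if url in images_by_article:
            joined[url] = {
                "themes": art["themes"],
                "image_labels": images_by_article[url],
            }
    return joined
```

In practice the two feeds may not spell the same article URL identically
(trailing slashes, tracking parameters), so some URL normalization before
the join would likely be needed.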

Email me directly if you have any questions!

~Kalev
http://kalevleetaru.com/
http://blog.gdeltproject.org/