[Air-L] two new datasets - knowledge graph over academic literature and over human rights reports

kalev leetaru kalev.leetaru5 at gmail.com
Wed Sep 24 10:29:27 PDT 2014


I thought many of you would find great use in two new GDELT Global
Knowledge Graph (GKG) datasets released late yesterday.  The first is the
set of underlying GKG datasets behind our paper that data-mined more than
21 billion words of academic literature from JSTOR, DTIC, CORE, CiteSeerX,
CIA, and the Internet Archive
(http://dlib.org/dlib/september14/leetaru/09leetaru.html).  In the hope of
seeding new kinds of research that incorporate the cultural knowledge of
the world's academic literature, we are making the GKG datasets behind that
paper available for open research.  NOTE that these do NOT contain the text
of the articles themselves, only the metadata computed from each article.
This includes computed metadata for the references cited in each paper,
enabling applications such as identifying the most-cited authors and
institutions relating to specific geographies, topics, and socio-political
groups.  The full GKG dataset collection, around 40GB, is available at:

http://blog.gdeltproject.org/announcing-the-africa-and-middle-east-global-academic-literature-knowledge-graph-ame-gkg/

We have also released a new Human Rights GKG, which encodes in quantitative
form a cross-section of the world's public knowledge of human rights issues,
drawn from the hundreds of thousands of textual reports, calls to action,
alerts, field interviews, and other material published by organizations
around the globe.  This initial GKG covers over 110,000 documents from a
number of the major international human rights report archives, offering a
computable overview of global human rights issues over the decades:

http://blog.gdeltproject.org/announcing-the-new-human-rights-global-knowledge-graph-hr-gkg/

The GDELT GKG format encodes lists of social groups, organizations,
locations, major themes, emotions, and a range of other metadata computed
from each document, making it possible to conduct a wide array of studies
that blend spatial, semantic, citation, and network analyses
(http://blog.gdeltproject.org/introducing-gkg-2-0-the-next-generation-of-the-gdelt-global-knowledge-graph/).
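As one illustration of the network analyses mentioned above, here is a
minimal sketch that counts how often pairs of themes co-occur in the same
document, giving weighted edges for a theme network.  It rests on stated
assumptions: each file row is taken to be tab-delimited with one column
holding a semicolon-separated list of themes.  The column index THEMES_COL,
the function name theme_cooccurrence, and the sample rows are all
hypothetical, so please check the actual field layout against the GKG
documentation for whichever collection you download.

```python
from collections import Counter
from itertools import combinations

# ASSUMPTION: rows are tab-delimited and the field at THEMES_COL holds a
# semicolon-separated list of theme codes. This index is hypothetical;
# verify it against the documentation for the file you are using.
THEMES_COL = 3


def theme_cooccurrence(lines, themes_col=THEMES_COL):
    """Count how often each unordered pair of themes appears in one document."""
    pairs = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= themes_col:
            continue  # skip short or malformed rows
        # Drop empty entries (e.g. from a trailing ';'), deduplicate, and sort
        # so each unordered pair is counted at most once per document.
        themes = sorted({t for t in fields[themes_col].split(";") if t})
        for a, b in combinations(themes, 2):
            pairs[(a, b)] += 1
    return pairs


if __name__ == "__main__":
    # Made-up rows for illustration only; not taken from any GKG release.
    sample = [
        "20140923\t1\t\tHUMAN_RIGHTS;REFUGEES;\t",
        "20140923\t1\t\tREFUGEES;HUMAN_RIGHTS;TORTURE\t",
    ]
    for (a, b), n in theme_cooccurrence(sample).most_common():
        print(a, b, n)
```

The resulting pair counts can be loaded into any graph library as edge
weights; the same pattern would apply to other list-valued fields such as
organizations or locations, again assuming they are delimited this way.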

We're very much looking forward to seeing what you all are able to do with
these new GKG collections!  For more information on the GDELT Project more
broadly, see the main site (http://www.gdeltproject.org/) or the blog
(http://blog.gdeltproject.org/).



~Kalev


