[Air-L] new hourly global compilation of news homepage links

kalev leetaru kalev.leetaru5 at gmail.com
Fri Mar 2 06:16:00 PST 2018


Thought many of you would find this useful: we've just announced a new
hourly compilation of all the hyperlinks found on the homepages of around
50,000 news websites from across the world, including the order in which
they appear on each page, published each hour as a TSV file. This can be
used for all kinds of analyses of how news outlets prioritize stories, the
kinds of outlinks they include on their homepages, etc. We combine static
HTML scanning with Headless Chrome and behavioral scrolling to handle more
complex sites that are constructed dynamically in the browser.
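As a rough illustration of the prioritization analysis mentioned above, here
is a minimal Python sketch that lists the first few links on each homepage.
It assumes, purely hypothetically, that each TSV row holds four
tab-separated columns (homepage URL, link position, link URL, link text).
The actual column layout is described in the announcement post and may well
differ, so the COLUMNS list and the top_links_per_homepage function are
illustrative names only.

```python
import csv
import io
from collections import defaultdict

# HYPOTHETICAL column layout, for illustration only; check the dataset's
# documentation for the real columns and their order.
COLUMNS = ["homepage_url", "link_position", "link_url", "link_text"]


def top_links_per_homepage(tsv_text, n=3):
    """Return the first n links (by on-page position) for each homepage."""
    reader = csv.DictReader(
        io.StringIO(tsv_text),
        fieldnames=COLUMNS,
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # treat quote characters in link text literally
    )
    links = defaultdict(list)
    for row in reader:
        # Collect (position, url) pairs so they can be sorted by page order.
        links[row["homepage_url"]].append(
            (int(row["link_position"]), row["link_url"])
        )
    # Lower position = nearer the top of the page, used here as a proxy
    # for how prominently the outlet placed the link.
    return {
        home: [url for _, url in sorted(items)[:n]]
        for home, items in links.items()
    }
```

Sorting by on-page position is only a crude proxy for prominence; visual
placement, font size, and images are not captured by link order alone.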

This is an alpha release, so we're very interested in feedback,
recommendations of news outlets to add, filters to apply, format changes,
etc., so please reach out directly to me with any thoughts as we evolve this
dataset. We're particularly interested in expanding the set of local,
specialty, topical, citizen, partisan, satirical, "fake news" (in the full,
overused meaning of the phrase) and other kinds of non-traditional media, to
capture in ever-increasing detail how narratives spread and evolve within
those ecosystems and especially how they move between them.


https://blog.gdeltproject.org/announcing-gdelt-global-frontpage-graph-gfg/


Kalev
