[Air-L] analysis of how much of the web wayback machine is really archiving

kalev leetaru kalev.leetaru5 at gmail.com
Mon Nov 16 08:58:36 PST 2015


Apologies for cross-posting. I thought many of you would find some of the
statistics from my new analysis, published this morning, of considerable
interest. It looks at what's really in the Internet Archive's Wayback Machine
and at the oddities and skew in how its crawlers ingest the web:

http://www.forbes.com/sites/kalevleetaru/2015/11/16/how-much-of-the-internet-does-the-wayback-machine-really-archive/

One of the biggest themes to emerge is the need for greater transparency
about, and understanding of, the algorithms and collection processes of large
web archives. It also points to the need for a dialog with the scholarly
research community about what these archives collect and how those decisions
affect the ways the archives can be used to research the evolution of the web.


~Kalev
http://kalevleetaru.com/
http://blog.gdeltproject.org/


