[Air-L] on the Wayback Machine (was public/private [part 1 of 2])

Tue Aug 14 07:25:55 PDT 2007

> Say I place content on a "publicly accessible" webpage without
> creating any incoming links or notifying anyone. Web crawlers won't
> find it. A search engine won't index it. While on the open and public
> Internet, unless a random URL-generator happens to guess the precise
> address of the page, no one will ever read it. Is this content "fair
> game for researchers"?

I have not read the full thread, so please forgive me if I am repeating
something that has already been said.

A web crawler will find you; that's the point. There is a finite number of
IPv4 addresses, 4,294,967,296 (2^32), and these are what the host name in a
URL gets resolved to.
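As a quick check on that figure, here is a minimal Python sketch. It assumes
we are talking about IPv4, where an address is 32 bits long, and it uses the
standard-library ipaddress module to count the addresses in the whole space.
The variable names are only illustrative.

```python
import ipaddress

# The entire IPv4 address space written as a single network (assumption: IPv4 only).
ipv4_space = ipaddress.ip_network("0.0.0.0/0")

# An IPv4 address is 32 bits, so there are 2**32 possible values.
total = 2 ** 32

print(f"{total:,}")                       # 4,294,967,296
print(ipv4_space.num_addresses == total)  # True
```

This counts addresses only. It says nothing about how many sites or pages
sit behind any one of them.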

If you don't want to be crawled, create a robots.txt file on your web server
and search engines will skip you.

http://www.robotstxt.org/wc/norobots.html
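To make that concrete, here is a rough sketch of how a crawler that honours
robots.txt might read such a file. It uses Python's standard
urllib.robotparser module. The file contents, the "ExampleBot" user agent and
the example.com URL are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every crawler to stay away from the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
# Parse the rules from memory instead of fetching them over the network.
parser.parse(ROBOTS_TXT.splitlines())

# "ExampleBot" and the URL are placeholders, not a real crawler or page.
allowed = parser.can_fetch("ExampleBot", "http://example.com/unlinked-page.html")
print(allowed)  # False
```

The file would normally live at the top level of the site, as /robots.txt.
It is a request rather than an access control, so it only keeps out crawlers
that choose to obey it.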

Martin.