[Air-L] The Spiders will find you (was wayback machine was public/private)

Charlie Balch charlie at balch.org
Tue Aug 14 09:33:29 PDT 2007


Interesting point about what is accessible on the Internet. I'd not judge
the number of possibilities by the use of IP addresses. It is common
practice to have many websites attached to one IP address and many IP
addresses are used to connect to the internet but do not provide web
content. Even when web content is available at an address, a complete path
is necessary to get to the content. I've often placed content that I'd
prefer the world not see using a web address that has no referring links and
would not easily be guessed.

Search engines follow links that they find on pages. The big engines don't
follow random possible content locations. Yes, there are programs that would
allow a researcher (cracker) to explore all link possibilities on a site.
Such an attempt without permission would be unethical. On the other hand, if
you've announced your content to the world, the world has a right to explore
your content.
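
To make the link-following point concrete, here is a minimal sketch of how a crawler discovers pages. It is not how any particular search engine works. The site contents, the paths, and the names SITE, LinkParser and crawl are all made up for illustration. The crawler starts at the home page and only ever visits addresses it finds in a link, so the unlinked page is never reached.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory site (path -> HTML body); all names are assumptions.
SITE = {
    "/": '<a href="/about">About</a> <a href="/posts">Posts</a>',
    "/about": '<a href="/">Home</a>',
    "/posts": '<a href="/posts/1">First post</a>',
    "/posts/1": '<a href="/posts">Back</a>',
    "/private-x7q2": "Unlinked page that nothing points to.",
}


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(site, start="/"):
    """Breadth-first crawl that only follows links found on visited pages."""
    seen = {start}
    queue = deque([start])
    while queue:
        path = queue.popleft()
        parser = LinkParser()
        parser.feed(site.get(path, ""))
        for link in parser.links:
            if link in site and link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


found = crawl(SITE)
print(sorted(found))
```

The page at /private-x7q2 exists on the "server" but never shows up in found, because no visited page links to it. That is obscurity, not protection: anyone who learns or guesses the address can still fetch the page.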

I believe that we would all agree that information that a poster has made
some effort to make private through the use of a password or even simple
obscurity requires informed consent before a researcher should be allowed to
use it. On the other hand, publicly presented information should be fair
game. This does bring up an interesting question though. At what point can a
researcher use hidden information? Historians routinely use the content of
diaries and letters that the authors would probably prefer never become
public.

The net is providing a fifth estate. Current USA laws are moving towards
giving bloggers the same protections and responsibilities that are enjoyed
by commercial reporters. Publicly posted content that is clearly intended to be read
is fair game and should not require review any more than using a reference
from a journal or popular magazine.

Charlie Balch

-----Original Message-----
From:  elw at stderr.org
Sent: Tuesday, August 14, 2007 8:10 AM



> A web crawler will find you, that's the point. There are a finite 
> number of IP addresses, 4,294,967,296 (2^32), these are what get 
> resolved from a URL.

Web crawlers don't typically have much luck crawling by IP address.

Name-based virtual hosting at the level of the web server tends to make
crawling by IP address less than adequate.

Best practice for virtual hosting is to make a hit directly to an IP address
(rather than a name) return... nothing.
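
A rough sketch of that behaviour, assuming a server that picks the site from the HTTP Host header. This is not any real server's configuration. The host names, the VHOSTS table and the respond function are invented for illustration, and returning 404 with an empty body is just one way to return "nothing".

```python
# Hypothetical table of name-based virtual hosts sharing one IP address.
VHOSTS = {
    "www.example.org": "<h1>Example site</h1>",
    "blog.example.org": "<h1>Blog</h1>",
}


def respond(host_header):
    """Return (status, body) chosen by the Host header of the request."""
    # Drop any ":port" suffix and compare names case-insensitively.
    host = (host_header or "").split(":")[0].lower()
    if host in VHOSTS:
        return 200, VHOSTS[host]
    # A bare IP address, an unknown name, or a missing header gets nothing.
    return 404, ""


print(respond("www.example.org"))
print(respond("192.0.2.10"))
```

A crawler that walks the address space sends the IP as the host, so it falls through to the empty default and never sees either site.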

--e