[Air-L] on the Wayback Machine (was public/private [part 1 of 2])

Mon Aug 13 09:29:56 PDT 2007

On Aug 13, 2007, at 11:06 AM, Michael Zimmer wrote:

> This has been an interesting discussion, and mention of IA's Wayback
> Machine prompts interesting questions which I'm sure others on the
> list can help answer:
>
> (a) Are there other media forms (current or historical) where
> publishing content means that it is automatically scanned and
> archived by external aggregators (search spiders, Internet Archive,
> etc)? [If I posted a note on "The Wall" at Yale Law School, no one
> routinely takes a snapshot of the wall to keep a permanent record of
> it, right?]

Journals, newspapers, and magazines come to mind as archived both
externally and internally.
>
> (b) If examples for (a) exist, are typical publishers of said content
> aware that their works are being aggregated and archived in such a
> way?

Yes, and I think they try to get as much profit out of the arrangement
as they can, but alas, it isn't always that kind of arrangement.

> Would a new user know this? Are they notified?

In the case of newspapers, there were a few court cases a few years
ago dealing with the NYT archiving and distributing its own content
online, but I don't recall anyone complaining about third-party
distribution, such as through FirstSearch or similar tools. I think
there is now a standard contract in place for much of this in the
publishing industry.

> [My concern here
> is that while many realize that search engines might crawl their
> content, few realize they keep a cached copy, and even fewer realize
> that even deleted content is archived by Wayback Machine]

>
> (c) Also, if examples of (a) exist, what means are provided to
> prevent such automatic archiving? Is it opt-in or opt-out? How
> technically proficient must one be? [Concern here is that even if you
> know about Internet Archive, you have to be proficient with
> robots.txt standards in order to keep them out]

Dunno. Most organizations seem to want to participate, but only under
the best terms they can get.
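
For what it's worth, the robots.txt opt-out mentioned above can be
sketched roughly as follows. This is only an illustration, assuming the
Internet Archive's crawler identifies itself as ia_archiver; the
example.org URL and the is_allowed helper are made up for the example,
and Python's standard urllib.robotparser stands in for whatever a real
crawler actually does.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that asks one crawler (assumed user agent: ia_archiver)
# to stay out of the whole site, while saying nothing about other crawlers.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Hypothetical helper: may `user_agent` fetch `url` under these rules?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Placeholder URL, for illustration only.
page = "http://example.org/1999/post.html"
archive_allowed = is_allowed(ROBOTS_TXT, "ia_archiver", page)
other_allowed = is_allowed(ROBOTS_TXT, "SomeOtherBot", page)
```

Note that this is opt-out: a crawler with no matching rule is allowed
by default, so the site owner has to know the crawler's name and how to
write the file.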
>
> (d) Given (a), how can someone remove past items from such archives?
> [Wayback Machine will remove all domain-specific content already in
> its archive if you place a robots.txt file to block it going forward]
>
> I guess what I'm wondering is why there seems to be a presumption
> that just because I posted something on a website in 1999 I want it
> to always be accessible. Just because bits don't degrade like paper
> doesn't mean they -must- persist, does it?

No, but shouldn't we preserve as much as we can? I appreciate the
will to destroy; that's fine. But for the people who do not care,
the content they have contributed constitutes evidence of many
things.
>
> Keep up the good discussion,
> michael
>