[Air-L] on the Wayback Machine (was public/private [part 1 of 2])
Jeremy Hunsinger
jhuns at vt.edu
Mon Aug 13 09:29:56 PDT 2007
On Aug 13, 2007, at 11:06 AM, Michael Zimmer wrote:
> This has been an interesting discussion, and mention of IA's Wayback
> Machine prompts interesting questions which I'm sure others on the
> list can help answer:
>
> (a) Are there other media forms (current or historical) where
> publishing content means that it is automatically scanned and
> archived by external aggregators (search spiders, Internet Archive,
> etc)? [If I posted a note on "The Wall" at Yale Law School, no one
> routinely takes a snapshot of the wall to keep a permanent record of
> it, right?]
Journals, newspapers, and magazines come to mind as archived both
externally and internally.
>
> (b) If examples for (a) exist, are typical publishers of said content
> aware that their works are being aggregated and archived in such a
> way?
Yes, and I think they try to get as much profit out of the arrangement
as they can, but alas, it isn't always such an arrangement.
> Would a new user know this? Are they notified?
In the case of newspapers, there were a few court cases a few years
ago dealing with the NYT archiving and distributing itself online,
but I don't recall anyone complaining about third-party distribution
such as through FirstSearch or similar tools. I think that there is
now a standard contract in place for much of this in the publishing
industry.
> [My concern here
> is that while many realize that search engines might crawl their
> content, few realize they keep a cached copy, and even fewer realize
> that even deleted content is archived by Wayback Machine]
>
> (c) Also, if examples of (a) exist, what means are provided to
> prevent such automatic archiving? Is it opt-in or opt-out? How
> technically proficient must one be? [Concern here is that even if you
> know about Internet Archive, you have to be proficient with
> robots.txt standards in order to keep them out]
Dunno; most organizations seem to want to participate, but only under
the best terms they can get.
>
> (d) Given (a), how can someone remove past items from such archives?
> [Wayback Machine will remove all domain-specific content already in
> its archive if you place a robots.txt file to block it going forward]
>
> I guess what I'm wondering is why there seems to be a presumption
> that just because I posted something on a website in 1999 I want it
> to always be accessible. Just because bits don't degrade like paper
> doesn't mean they -must- persist, does it?
No, but shouldn't we preserve as much as we can? I appreciate the
will to destroy; that's fine. But for the people who do not care,
the content that they have contributed constitutes evidence of many
things.
>
> Keep up the good discussion,
> michael
>