[Air-L] on the Wayback Machine (was public/private [part 1 of 2])
Michael Zimmer
michael.zimmer at nyu.edu
Mon Aug 13 09:06:24 PDT 2007
This has been an interesting discussion, and the mention of IA's Wayback
Machine prompts some questions which I'm sure others on the
list can help answer:
(a) Are there other media forms (current or historical) where
publishing content means that it is automatically scanned and
archived by external aggregators (search spiders, Internet Archive,
etc)? [If I posted a note on "The Wall" at Yale Law School, no one
routinely takes a snapshot of the wall to keep a permanent record of
it, right?]
(b) If examples for (a) exist, are typical publishers of said content
aware that their works are being aggregated and archived in such a
way? Would a new user know this? Are they notified? [My concern here
is that while many realize that search engines might crawl their
content, few realize the engines keep a cached copy, and even fewer
realize that even deleted content is archived by the Wayback Machine.]
(c) Also, if examples of (a) exist, what means are provided to
prevent such automatic archiving? Is it opt-in or opt-out? How
technically proficient must one be? [My concern here is that even if you
know about the Internet Archive, you have to be proficient with
robots.txt standards in order to keep it out.]
(d) Given (a), how can someone remove past items from such archives?
[The Wayback Machine will remove all domain-specific content already in
its archive if you place a robots.txt file to block it going forward.]
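For anyone unfamiliar with the mechanism mentioned in (c) and (d), here
is a minimal sketch of what such a file might look like and how a
well-behaved crawler would interpret it. It assumes the Internet
Archive's crawler identifies itself as "ia_archiver"; the helper name
is_allowed and the example.org URL are hypothetical, chosen only for
illustration. The parsing uses Python's standard urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser

# Contents of a robots.txt file placed at the root of the site.
# Assumption: the archive's crawler announces itself as "ia_archiver".
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given crawler may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.org/1999/post.html"  # hypothetical page
    print("ia_archiver:", is_allowed(ROBOTS_TXT, "ia_archiver", page))
    print("Googlebot:  ", is_allowed(ROBOTS_TXT, "Googlebot", page))
```

Note that this file only excludes the one named crawler; every other
crawler is still permitted, and compliance is voluntary on the
crawler's part.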
I guess what I'm wondering is why there seems to be a presumption
that just because I posted something on a website in 1999 I want it
to always be accessible. Just because bits don't degrade like paper
doesn't mean they -must- persist, does it?
Keep up the good discussion,
michael
-----
Michael Zimmer, PhD
Microsoft Fellow, Information Society Project, Yale Law School
e: michael.zimmer at nyu.edu
w: http://michaelzimmer.org
On Aug 13, 2007, at 11:36 AM, Lois Ann Scheidt wrote:
> And don't forget archiving, that a publicly accessible webpage is
> likely to be archived in the Internet Archive
> (http://www.archive.org/index.php) or as some of us old Saturday
> Morning Cartoon watchers like to call it...The Wayback Machine.
>
> Lois Ann Scheidt
>
> Doctoral Student - School of Library and Information Science, Indiana
> University, Bloomington IN USA
>
> Adjunct Instructor - School of Informatics, IUPUI, Indianapolis IN
> USA and
> IUPUC, Columbus IN USA
>
> Webpage: http://www.loisscheidt.com
> Blog: http://www.professional-lurker.com
>
>
> Quoting Jeremy Hunsinger <jhuns at vt.edu>:
>
>> I would advise you to remove your blogs then because it is very
>> likely that if it is linked to anywhere or hosted on a major blogging
>> platform that it is in one of the research compendiums of blogs. if
>> we can find it through google blogsearch or technorati, then it is
>> likely it is in one or more research collections.
>>
>> it is not that you are putting up a window... it is that you are
>> sending out broadsheets and posters on the fence, on the side of your
>> house, probably into public mailboxes, etc. etc.. i don't have to
>> look into the window to see what you've done, i can take photos from
>> the street, comment on the architecture, etc. If i
>>
>> a disclaimer won't really solve your issue either, it might be
>> respected, but only if you do it in a machine readable way. a
>> robots.txt file excluding all search engines will go much farther than
>> a disclaimer.
>
>
> _______________________________________________
> The Air-L at listserv.aoir.org mailing list
> is provided by the Association of Internet Researchers http://aoir.org
> Subscribe, change options or unsubscribe at:
> http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
>
> Join the Association of Internet Researchers:
> http://www.aoir.org/