[Air-L] on the Wayback Machine (was public/private [part 1 of 2])

Michael Zimmer michael.zimmer at nyu.edu
Mon Aug 13 09:06:24 PDT 2007


This has been an interesting discussion, and mention of IA's Wayback  
Machine prompts interesting questions which I'm sure others on the  
list can help answer:

(a) Are there other media forms (current or historical) where  
publishing content means that it is automatically scanned and  
archived by external aggregators (search spiders, Internet Archive,  
etc)? [If I posted a note on "The Wall" at Yale Law School, no one  
routinely takes a snapshot of the wall to keep a permanent record of  
it, right?]

(b) If examples for (a) exist, are typical publishers of said content  
aware that their works are being aggregated and archived in such a  
way? Would a new user know this? Are they notified? [My concern here  
is that while many realize that search engines might crawl their  
content, few realize the engines keep a cached copy, and even fewer  
realize that even deleted content is archived by the Wayback Machine]

(c) Also, if examples of (a) exist, what means are provided to  
prevent such automatic archiving? Is it opt-in or opt-out? How  
technically proficient must one be? [Concern here is that even if you  
know about Internet Archive, you have to be proficient with  
robots.txt standards in order to keep them out]

(d) Given (a), how can someone remove past items from such archives?  
[Wayback Machine will remove all domain-specific content already in  
its archive if you place a robots.txt file to block it going forward]
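
[To make the robots.txt points in (c) and (d) concrete, here is a rough sketch of what an opt-out file looks like and how a crawler would read it. It assumes the archive's crawler identifies itself as "ia_archiver"; that name, the example.org URL, and the is_allowed helper are illustrative assumptions, not something confirmed in this thread. The check uses Python's standard urllib.robotparser module.]

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that opts out of a single crawler.
# "ia_archiver" is ASSUMED to be the user-agent the Internet Archive honors.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.org and the path are placeholders for an old page on one's own site.
page = "http://example.org/1999/post.html"
archive_ok = is_allowed(ROBOTS_TXT, "ia_archiver", page)
other_ok = is_allowed(ROBOTS_TXT, "SomeOtherBot", page)
print(archive_ok, other_ok)
```

[Note that this is opt-out: a crawler not named in the file is still allowed, and the file only works if the crawler chooses to honor it.]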

I guess what I'm wondering is why there seems to be a presumption  
that just because I posted something on a website in 1999 I want it  
to always be accessible. Just because bits don't degrade like paper  
doesn't mean they -must- persist, does it?

Keep up the good discussion,
michael


-----
Michael Zimmer, PhD
Microsoft Fellow, Information Society Project, Yale Law School
e: michael.zimmer at nyu.edu
w: http://michaelzimmer.org



On Aug 13, 2007, at 11:36 AM, Lois Ann Scheidt wrote:

> And don't forget archiving, that a publicly accessible webpage is
> likely to be archived in the Internet Archive
> (http://www.archive.org/index.php) or as some of us old Saturday
> Morning Cartoon watchers like to call it...The Wayback Machine.
>
> Lois Ann Scheidt
>
> Doctoral Student - School of Library and Information Science, Indiana
> University, Bloomington IN USA
>
> Adjunct Instructor - School of Informatics, IUPUI, Indianapolis IN USA
> and IUPUC, Columbus IN USA
>
> Webpage:  http://www.loisscheidt.com
> Blog:  http://www.professional-lurker.com
>
>
> Quoting Jeremy Hunsinger <jhuns at vt.edu>:
>
>> I would advise you to remove your blogs then because it is very
>> likely that if it is linked to anywhere or hosted on a major blogging
>> platform that it is in one of the research compendiums of blogs.  if
>> we can find it through google blogsearch or technorati, then it is
>> likely it is in one or more research collections.
>>
>> it is not that you are putting up a window...   it is that you are
>> sending out broadsheets and posters on the fence, on the side of your
>> house, probably into public mailboxes, etc. etc..   i don't have to
>> look into the window to see what you've done, i can take photos from
>> the street, comment on the architecture, etc.  If i
>>
>> a disclaimer won't really solve your issue either, it might be
>> respected, but only if you do it in a machine readable way.  a
>> robots.txt file excluding all search engines will go much farther than
>> a disclaimer.
>
>
> _______________________________________________
> The Air-L at listserv.aoir.org mailing list
> is provided by the Association of Internet Researchers http://aoir.org
> Subscribe, change options or unsubscribe at:
> http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
>
> Join the Association of Internet Researchers:
> http://www.aoir.org/



