[Air-l] SW to store webpages

Wed Jun 8 09:24:22 PDT 2005

On Mon, 6 Jun 2005 Gail Tailor wrote:

> I have been following this discussion and have not yet seen anyone raise a
> question relating to the legality of using this approach to capture web pages in
> support of academic research, in particular those web pages that are clearly
> identified as being copyrighted. Are you contacting the owner of the site to ask
> permission to create a copy of their documents for current and future use? If
> not, how are you justifying using this approach to duplicating the data? Fair
> use laws?

Gail--
Here's my $.02 in response to your questions about fair use and copyright
regarding Web pages... My understanding of the U.S. Digital Millennium Copyright
Act is that all Web pages are copyrighted upon posting to the Web (which means
that every instance of use of a Web browser can be argued to be illegal, since
browsers "copy" Web pages as they display them). Clearly some interpretations of
the DMCA are technically untenable. That said, copying Web pages (and storing
the copies) for the purpose of academic research is clearly within the U.S. fair
use doctrine. However, the applicability of the fair use doctrine in
re-presenting copied Web pages on the Web or in print in the context of academic
research has been interpreted in a range of ways by various U.S. universities,
libraries, and academic publishers over the last five+ years-- resulting in a
range of protocols regarding the means (e.g. opt-in or opt-out) and timing (e.g.
before/during collection or before/during display) of notification of site
producers. The Internet Archive (a non-profit organization) is an example of a
liberal interpretation of the DMCA: it has taken the stance that previously
produced Web resources should be preserved and available in the public domain,
and generally operates on a post-display opt-out mechanism (see
http://web.archive.org). In each of the Web collections in which our
WebArchivist.org research group has participated, a different protocol has been
employed in response to the nature of the collection and the policies of
participating institutions. I highly commend the precedent that AOIR member
Laura Gurak and Yale University Press set in Laura's book *Cyberliteracy*. A
2-page appendix explains the U.S. fair use doctrine and the rationale underlying
Laura's (and the Press's) decision *not* to seek permissions from site producers
for the screenshots included in this book.

> How are you using the data after it has been captured? Are you extracting data
> to support point in time studies? Longitudinal studies?

My collaborators and I have employed Web-based data (e.g. archived Web pages)
and metadata (codes and other kinds of annotations associated with Web pages) in
several kinds of analyses, both point-in-time and longitudinal. Some of our
publications about Web-based research are available at
http://webarchivist.org/resources.htm. You might be particularly interested in
our presentation on the "Ethics of Web Archiving" from the 2003 Internet
Research conference, which is also available via that page.

> I am also using this as an opportunity to advance research practices in a
> manner that calls attention to the dynamic nature of Internet web sites and
> some of the inherent issues in taking this approach to collecting data in
> comparison to some of the more traditional methods that might be used when
> working with paper documents.

I share your interest in these issues, and would be happy to correspond further
with you about this offlist.

-Kirsten

***************************************
Kirsten A. Foot, PhD
Assistant Professor, Communication
Co-Director, WebArchivist.org
University of Washington
Box 353740
Seattle, WA 98195-3740
206-543-4837
kfoot at u.washington.edu