[Air-l] Website/weblog word counts
Jeremy Hunsinger
jhuns at vt.edu
Wed May 16 04:58:00 PDT 2007
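on unix-like systems,
cat * | stripHTML | wc -w
will give you a word count of the pages in the directory, minus the
html. (stripHTML stands for whatever tag-stripping step you use; it is
not a standard command. plain wc prints lines, words and bytes, while
wc -w prints just the word count.) there are many ways to do stripHTML
depending on your coding preference. one of the simplest is a regex,
though that generally assumes well-formed markup, so you might want to
run the pages through tidy to fix the code first, then strip it... heh.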
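a minimal sketch of the regex approach in plain sh. stripHTML is the
post's placeholder, defined here as a hypothetical shell function
around sed. it assumes every tag opens and closes on the same line, and
it does not handle script/style bodies or entities like &amp;, which is
why cleaning messy pages with tidy first is worth it.

```shell
#!/bin/sh
# stripHTML: hypothetical helper (the name comes from the post; it is
# not an installed command). Regex-based, so it assumes each <...> tag
# sits on a single line; multi-line tags, <script>/<style> contents and
# entities such as &amp; are NOT handled.
stripHTML() {
  # Replace each tag with a space so words on either side of a tag
  # (e.g. "</p><p>") are not glued together into one "word".
  sed -e 's/<[^>]*>/ /g'
}

# Demo on an inline sample: prints 4 (Hello, weblog, world, again).
printf '<html><body><p>Hello <b>weblog</b> world</p><p>again</p></body></html>\n' \
  | stripHTML | wc -w

# Usage on a directory of pages:
#   cat ./*.html | stripHTML | wc -w
```

replacing tags with a space rather than deleting them costs nothing in
the count (wc -w ignores extra whitespace) but avoids merging adjacent
words.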
jeremy hunsinger
Information Ethics Fellow, Center for Information Policy Research,
School of Information Studies, University of Wisconsin-Milwaukee
(www.cipr.uwm.edu)
wiki.tmttlt.com
www.tmttlt.com
() ascii ribbon campaign - against html mail
/\ - against microsoft attachments
http://www.stswiki.org/ sts wiki
http://cfp.learning-inquiry.info/ Learning Inquiry-the journal
http://transdisciplinarystudies.tmttlt.com/ Transdisciplinary
Studies: the book series