[Air-l] Website/weblog word counts

Jeremy Hunsinger jhuns at vt.edu
Wed May 16 04:58:00 PDT 2007


On Unix-like systems,

cat * | stripHTML | wc

will give you a word count of the pages in the directory with the
HTML removed (wc prints line, word, and byte counts; use wc -w for the
word count alone). stripHTML here stands for whatever tag stripper you
prefer. There are many ways to write one, depending on your coding
preference. One of the simplest is a regex, though that generally
assumes well-formed markup, so you might want to run the pages through
tidy first to fix the code, and then strip it.
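
As a rough sketch of the regex approach, stripHTML could be a one-line shell function around sed. The function body is an assumption for illustration, not a standard command, and it only handles tags that open and close on the same line, which is why cleaning the markup first can matter.

```shell
# stripHTML: hypothetical helper, not a standard tool.
# Reads HTML on stdin and writes the text with tags removed to stdout.
# Regex-based: assumes every tag opens and closes on the same line,
# and does not decode entities such as &amp;.
stripHTML() {
    sed -e 's/<[^>]*>//g'
}

# Intended use, from the directory holding the pages:
#   cat * | stripHTML | wc -w
```

Script and style blocks are left in place by this pattern, so their contents would still be counted as words.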


jeremy hunsinger
Information Ethics Fellow, Center for Information Policy Research,  
School of Information Studies, University of Wisconsin-Milwaukee  
(www.cipr.uwm.edu)

wiki.tmttlt.com
www.tmttlt.com

()  ascii ribbon campaign - against html mail
/\                        - against microsoft attachments
http://www.stswiki.org/  sts wiki
http://cfp.learning-inquiry.info/  Learning Inquiry-the journal
http://transdisciplinarystudies.tmttlt.com/  Transdisciplinary  
Studies:the book series





