[Air-L] Seeking solutions to small text search mystery

Charlie Balch charlie at balch.org
Mon Jan 10 20:49:21 PST 2011


The free PDF-XChange viewer might be your answer.  I use this application for grading as it has extensive markup tools. It also has superior find and search capabilities including the ability to create concordances. 

Charlie

Charles V. Balch PhD
Business Faculty
Northern Arizona University - Yuma

-----Original Message-----
From: air-l-bounces at listserv.aoir.org [mailto:air-l-bounces at listserv.aoir.org] On Behalf Of Craig Scott
Sent: Monday, January 10, 2011 1:25 PM
To: air-l at listserv.aoir.org
Subject: [Air-L] Seeking solutions to small text search mystery

 

Colleagues, a graduate student and I could use your help solving a mystery related to computerized text searching/coding of online documents.  We are examining documents (all saved as .pdf files) using the advanced search tool in Adobe Reader. While that tool generally works fine, it does not seem to recognize certain fairly standard statistical/mathematical symbols (such as the p used in statistical significance testing and symbols such as <, >, or =) in numerous documents.  This is true even when we directly cut and paste the symbol in question into the search tool (surprisingly, it still does not recognize that symbol in the document). The problem occurs only with certain sources (such as all articles from certain journals), even when the rest of the article is fully searchable. This is happening with very recent documents published after 2000 (we are not searching older ones). We suspect these symbols might be part of some equation editor or specially formatted text, but we don't know.
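One way to test the "specially formatted text" suspicion is to copy a passage out of an affected PDF and look at which characters are actually stored behind the symbols. The following is only a minimal sketch under stated assumptions: the function names and the sample strings are hypothetical, the text is assumed to have already been copied or exported from the PDF as a plain string, and nothing here claims to describe how Adobe Reader searches internally.

```python
import unicodedata


def inspect_non_ascii(text):
    """List every non-ASCII character in text with its code point and category."""
    report = []
    for ch in text:
        if ord(ch) > 127:
            report.append((
                ch,
                f"U+{ord(ch):04X}",
                unicodedata.category(ch),           # "Co" means private use
                unicodedata.name(ch, "<no name>"),  # private-use characters have no name
            ))
    return report


def normalize_for_search(text):
    """Fold compatibility variants (fullwidth, math-italic, ...) to plain forms."""
    return unicodedata.normalize("NFKC", text)


# Hypothetical snippets, standing in for text copied out of a PDF.
styled = "\U0001D45D \uFF1C .05"   # math-italic p, fullwidth less-than
private = "p \uF03C .05"           # a private-use code point where "<" should be

print(normalize_for_search(styled))
for row in inspect_non_ascii(private):
    print(row[1], row[2], row[3])
```

If the report shows look-alike characters (for example a fullwidth or math-italic variant), normalizing the extracted text before searching may be enough. If it shows private-use code points, the symbol probably comes from a special font with no standard character behind it, so typing or pasting a plain "<" would not be expected to match, and a different export or OCR may be the only route.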

Has anyone else encountered and solved a similar problem? Do you have any other suggestions on a search tool for .pdf documents that might be superior? We would also welcome any suggestions on other ways to save these documents and search them that would address this (I think we could do optical character recognition, but fear that may create other accuracy problems). Thanks for any suggestions/thoughts you have related to helping us solve this frustrating little mystery.

Craig 

Craig R. Scott, Ph.D., 

   Associate Professor, Department of Communication &

   Director, Ph.D. Program

School of Communication & Information

Rutgers University

4 Huntington Street, New Brunswick, NJ 08901

Voice: 732-932-7500 x8142; Fax: 732-932-3756

Office in 201 DeWitt (185 College Avenue)

Web:  <http://comminfo.rutgers.edu/directory/crscott/index.html>
<https://www.scils.rutgers.edu/directory/crscott/index.html>

Linked in:  <http://www.linkedin.com/pub/11/b83/241>

 

_______________________________________________
The Air-L at listserv.aoir.org mailing list is provided by the Association of Internet Researchers http://aoir.org Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org

Join the Association of Internet Researchers:
http://www.aoir.org/




