[Air-L] Screen Scraping of URLS and WHOIS subject/category mining

Nathan Stolero stolero at gmail.com
Thu Feb 8 23:44:49 PST 2018

Dear AoIR members,

I'm studying the information-seeking behavior of adolescents, young adults
and adults. One of the subjects I'm investigating is the difference between
the URLs/links users choose to use (navigate/browse to, click on, etc.) and
the URLs/links users tend to avoid (they look at them but decide not to
navigate/browse/click, as recorded with eye-tracking).

As a result, I have a list of all the URLs the user visited during the
experiment and a set of screenshots in which the avoided links are marked
(I don't have those URLs because the user did not click on them, so the
software did not save them). I have one question about each of the two:

1) Regarding the list of URLs -
What would be the best way to mine a large list of URLs for their
categories? For example, http://www.cnn.com would map to
news/broadcasting/content. I tried WHOIS lookups on the domains, hoping to
find this information and then write code that would extract it for each
link, but could not find anything useful.
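
As a rough illustration of the kind of mining meant here, the following
Python sketch reduces each visited URL to its domain and looks that domain
up in a category table. The table, the function names and the sample URLs
are assumptions made for the example; only the cnn.com mapping comes from
the question above, and where a real table would come from is exactly what
is being asked.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical lookup table: in practice it would have to come from a
# hand-coded codebook or some external categorization source.
CATEGORY_BY_DOMAIN = {
    "cnn.com": "news/broadcasting/content",
}


def domain_of(url):
    """Return the lowercased host of a URL, without a leading 'www.'.

    URLs without a scheme (e.g. 'www.cnn.com') have no parsed host and
    yield an empty string.
    """
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def categorize(urls, table=CATEGORY_BY_DOMAIN):
    """Pair each URL with its category, or 'uncategorized' if unknown."""
    return [(url, table.get(domain_of(url), "uncategorized")) for url in urls]


# Made-up sample of visited URLs for one participant.
visited = [
    "http://www.cnn.com",
    "http://www.cnn.com/world",
    "http://example.org/page",
]
labelled = categorize(visited)
counts = Counter(category for _, category in labelled)
```

Counting categories per participant in this way would only be as good as
the lookup table behind it.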

2) Regarding the screenshots -
Is there a way, maybe using screen scraping, to automatically translate
textual links (clickable headlines, for example) to their URLs? Maybe using
a simple protocol of: a) scrape the text in a marked area, b) search for
this text on Google, c) use the first URL?
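
The a/b/c protocol above could be sketched in Python roughly as follows.
This is only an outline under assumptions: read_text and search are
hypothetical placeholders for whatever OCR tool and search service would
actually be used, and the stand-in functions, file name and coordinates
exist only to make the example run.

```python
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) of a marked area


def link_text_to_url(
    screenshot_path: str,
    box: Box,
    read_text: Callable[[str, Box], str],
    search: Callable[[str], List[str]],
) -> Optional[str]:
    """Apply steps a-c to one marked area; return None if nothing is found."""
    # a) Scrape the text in the marked area and collapse stray whitespace.
    text = " ".join(read_text(screenshot_path, box).split())
    if not text:
        return None
    # b) Search for the text as an exact phrase.
    results = search(f'"{text}"')
    # c) Use the first URL, if there is one.
    return results[0] if results else None


# Stand-ins so the sketch runs without any OCR or network access.
def fake_read_text(path, box):
    return "Example  headline\n"


def fake_search(query):
    return ["http://www.cnn.com"] if query == '"Example headline"' else []


url = link_text_to_url(
    "shot-001.png", (10, 20, 300, 40), fake_read_text, fake_search
)
```

Whether the first search result really is the link that appeared on the
screen is not guaranteed, so a sample of the matches would need to be
checked by hand.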

I hope I've made my intentions clear, and I'm looking forward to the wisdom
of the virtual crowd.


Nathan Stolero
Doctoral Student
The Communication Department, The Faculty of Social Science
Tel Aviv University
