How the dragons work: searching in a web
Witten, I. H. (2013). How the dragons work: searching in a web. In Proceedings of the 2006 International Workshop on Research Issues in Digital Libraries, 12th-15th December, 2006, Kolkata. New York, USA: ACM.
Permanent Research Commons link: http://hdl.handle.net/10289/8011
Search engines -- "web dragons" -- are the portals through which we access society's treasure trove of information. They do not publish the algorithms they use to sort and filter information, yet how they work is one of the most important questions of our time. Google's PageRank is a way of measuring the prestige of each web page in terms of who links to it: it reflects the experience of a surfer condemned to click randomly around the web forever. The HITS technique distinguishes "hubs" that point to reputable sources from "authorities," the sources themselves. This helps differentiate communities on the web, which in turn can tease out alternative interpretations of ambiguous query terms. RankNet uses machine learning techniques to rank documents by predicting relevance judgments based on training data. This article explains in non-technical terms how the dragons work.
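The random-surfer reading of PageRank described above can be illustrated with a small sketch. This is only a toy power iteration under stated assumptions, not the algorithm any search engine actually runs: the function name `pagerank`, the four-page graph `toy_web`, the damping value 0.85, and the fixed iteration count are all illustrative choices that do not come from the article.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate page prestige by simulating a random surfer.

    `links` maps each page to the list of pages it links to.
    `damping` (an assumed value) is the probability that the surfer
    follows a link rather than jumping to a random page.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal prestige everywhere

    for _ in range(iterations):
        # Pages with no outgoing links hand their rank to all pages equally.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        # Every other page splits its rank evenly among the pages it links to.
        for page in pages:
            targets = links.get(page, [])
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to C, and nobody links to D.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(toy_web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy graph, page C collects links from three other pages and ends up with the highest score, while page D, which nothing links to, ends up with the lowest. The scores always sum to one, so each can be read as the fraction of time the random surfer spends on that page.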