Tuesday 19 December 2006 8:10:00 pm
The widely used web search engines enable users to search for content on billions of different websites. To do this, they index the content of those websites. But what kinds of content do web search engines actually index? In other words: what kind of dataset is the whole internet?
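At the heart of this indexing sits an inverted index: a map from each term to the documents that contain it. The following is a minimal sketch of that data structure; the document names and contents are invented for illustration.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query term (AND search)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = index.get(terms[0], set()).copy()
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Two hypothetical pages standing in for a crawled collection.
docs = {
    "a.html": "local search engines index local content",
    "b.html": "web search engines index billions of pages",
}
index = build_index(docs)
print(sorted(search(index, "search engines")))  # both pages match
```

Real engines add much more on top (tokenization, ranking, compression), but the lookup idea is the same: a query is answered by intersecting term lists rather than scanning every page.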
These are some of the major constraints:
In addition to these constraints, there is another problem: a top position in search results is very important for many websites, especially commercial ones. Much effort therefore goes into tweaking sites to optimize their ranking in search engines. But, especially for popular search queries, some people use dirty tricks to try to reach a top-ranking position. (You may have seen a search engine query where one or more of the top results were unrelated to your search terms.) Using dirty tricks to improve ranking position is called search engine spamming.
The main task of web search engines is to separate good websites from bad ones, and then to provide relevant query results that point to the good ones. In a normal web search, for example, the links between websites are an important measure of a site's quality. A link is interpreted as a popularity vote: if many websites link to one site, that site probably has valuable content.
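The popularity-vote idea is the basis of link-analysis algorithms such as PageRank. Below is a simplified sketch of the iteration on a tiny invented link graph; the damping factor and the graph itself are illustrative assumptions, not the source's data.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal rank everywhere
    for _ in range(iterations):
        # every page keeps a small baseline rank (the "teleport" term)
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:
                continue
            # a page distributes its rank evenly across its outgoing links
            share = damping * rank[page] / len(outlinks)
            for target in outlinks:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical graph: pages a and b both link to c, so c collects
# the most "votes" and ends up with the highest rank.
links = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # prints "c"
```

The key property for search is exactly the one described above: a page linked to by many others accumulates rank, which the engine can use as one signal of quality when ordering results.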
Searching in a local context poses different challenges than searching the web. Spam and widely varying content quality are not a problem. However, there are other challenges concerning the available data and the users' information needs. Fortunately, as we will see later, the available data fits the users' information needs perfectly.
In the next section we will look at the special circumstances of searching in a local context.