Creating a Search Engine

Tuesday 19 December 2006 8:10:00 pm

Searching the web

The widely used web search engines enable users to search for content across billions of web pages. To do this, web search engines index the content of websites. What kinds of content do web search engines index? In other words: what kind of dataset is the whole web?

These are some of the major constraints:

  • Websites deal with a wide range of content topics.
  • For most topics there are many websites, often with similar content.
  • The quality of sites ranges from sites that are well-organized and contain deep information to sites that are not useful at all (for example due to outdated, incomplete or inaccurate information).
  • There is no common semantic markup of website content. For the most part, search engines see (almost) plain text. The only useful meta information comes from logical markup (such as headings) or visual markup (such as bold or italic text).
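To illustrate the last point, here is a minimal sketch of how an indexer might use logical and visual markup as a relevance hint. It assumes that words inside headings or bold/italic tags count more than plain text. The tag weights, the class name WeightedTermParser and the function score_terms are hypothetical choices for this illustration, not how any particular search engine works.

```python
from collections import Counter
from html.parser import HTMLParser

# Assumed, purely illustrative weights for marked-up text.
TAG_WEIGHTS = {"h1": 5, "h2": 3, "b": 2, "strong": 2, "i": 2, "em": 2}


class WeightedTermParser(HTMLParser):
    """Collects words from HTML, weighting them by the enclosing markup."""

    def __init__(self):
        super().__init__()
        self.open_tags = []      # weighted tags that are currently open
        self.scores = Counter()  # word -> accumulated weight

    def handle_starttag(self, tag, attrs):
        if tag in TAG_WEIGHTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Close the most recently opened tag of this name.
            idx = len(self.open_tags) - 1 - self.open_tags[::-1].index(tag)
            del self.open_tags[idx]

    def handle_data(self, data):
        # Plain text gets weight 1; otherwise use the strongest open tag.
        weight = max((TAG_WEIGHTS[t] for t in self.open_tags), default=1)
        for word in data.lower().split():
            term = word.strip(".,;:!?")
            if term:
                self.scores[term] += weight


def score_terms(html):
    """Return a Counter mapping each word to its markup-weighted score."""
    parser = WeightedTermParser()
    parser.feed(html)
    parser.close()
    return parser.scores


page = "<h1>Search engines</h1><p>A search engine indexes <b>content</b>.</p>"
scores = score_terms(page)
```

In this sketch, "search" scores higher than "indexes" because it also appears in the heading. A real indexer would combine such hints with many other signals.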

In addition to these constraints, there is another problem: having the top position in search results is very important for many websites (especially commercial sites). Therefore, much effort is put into tweaking sites to optimize their ranking position in search engines. However, especially for popular search queries, some people use dirty tricks to try to achieve a top-ranking position. (You may have seen a search engine query where one or more of the top results were not related to your search terms.) Using dirty tricks to improve ranking position is called search engine spamming.

The main task of web search engines is to separate good and bad websites, and then to provide relevant query results that point to the good websites. For example, in a normal web search, the existence of links between websites is an important measure of the quality of a site. The linking relationship is interpreted as a popularity vote: if many websites link to one site, the site probably has valuable content.
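The "popularity vote" idea can be sketched in a few lines. The example below simply counts how many distinct sites link to each site. This is a deliberately simplified illustration: real web search engines use far more elaborate link analysis, and the link graph, the site names and the function inbound_link_counts are made up for this example.

```python
from collections import Counter


def inbound_link_counts(links):
    """Count how many distinct other sites link to each site.

    `links` maps a site to the list of sites it links to.
    """
    votes = Counter()
    for source, targets in links.items():
        # Each linking site gets one vote per target, however often it links.
        for target in set(targets):
            if target != source:  # a site cannot vote for itself
                votes[target] += 1
    return votes


# Hypothetical link graph: site -> sites it links to.
links = {
    "a.example": ["c.example", "b.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example", "c.example"],
}

# Sites ordered by number of inbound links, most linked first.
ranking = inbound_link_counts(links).most_common()
```

Here c.example ends up at the top of the ranking because three different sites link to it. Counting each linking site only once is a small defense against the simplest form of vote inflation.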

Searching in a local context

Searching in a local context poses different challenges from searching the web. You do not have the problems of spam or widely varying content quality. However, there are other challenges regarding the existing data and the users' information needs. Fortunately, as we will see later, the nature of the data in a local context fits the users' information needs perfectly.

In the next section we will look at the special situations when searching in a local context.


Author(s)