
Creating a Search Engine

Tuesday 19 December 2006 8:10:00 pm


The hardest part of implementing a search engine is that you do not know in advance how good the results will be. This is mainly because:

  • Computers and algorithms are not intelligent. They do not understand what a document is about. They just have some statistical data about occurrences of terms.
  • It is not possible to define objectively which documents are relevant for a user. Even with the same query, different documents might be relevant as users have different background knowledge.

Because of this, each evaluation depends on user feedback about the relevance of the results. In general, we ask users to try the search and rate whether the results are relevant or not. Based on this relevance feedback we can evaluate the quality of different search algorithms and configurations. (Instructions for helping us with this evaluation are at the end of the article.)

Comparing algorithms

A classical approach is based on measuring precision and recall:

  • Precision measures the quality of the returned results. It is the fraction of retrieved documents that are relevant. For example, if you get 60 results but only 15 of them are relevant, you have a precision of 15 / 60 = 0.25.
  • Recall measures how many of all relevant documents in the whole dataset are found. It is the fraction of relevant documents that are retrieved. If there are 25 relevant documents in total but only 15 of them are returned (out of 60 results), the recall is 15 / 25 = 0.6. (A small computational sketch of both measures follows this list.)
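
Both measures are easy to compute once human relevance judgments are available. The following is a minimal sketch in Python, assuming documents are identified by ids; the function name and inputs are hypothetical:

    def precision_recall(retrieved, relevant):
        """Compute precision and recall for a single query.

        retrieved -- ids returned by the search engine
        relevant  -- ids judged relevant in the whole dataset
        """
        retrieved = set(retrieved)
        relevant = set(relevant)
        hits = retrieved & relevant  # retrieved AND relevant
        precision = len(hits) / len(retrieved) if retrieved else 0.0
        recall = len(hits) / len(relevant) if relevant else 0.0
        return precision, recall

With the numbers from the example above (60 retrieved documents, 15 of them relevant, 25 relevant documents overall) this returns (0.25, 0.6).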

In general, precision is better when only a few documents are returned, but recall suffers in that situation. The reverse is also true: if your search engine returns many results, recall improves at the cost of precision.

Assessing these values requires substantial human effort to decide which documents are relevant. Especially for recall, you must examine the whole dataset - not only the retrieved documents - to find all relevant documents.

A more pragmatic approach is called pairwise accuracy. Here, users only rate the top results and decide how relevant each document is (for example "relevant", "partially relevant" or "not relevant"). You then look at each pair of documents where the user rated the first document better than the second one. (Pairs of documents rated in the same category are not counted.) The pairwise accuracy is the fraction of these pairs for which the algorithm also ranks the first document higher than the second one.
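
A minimal sketch of this calculation, assuming the ranking is a list of document ids and the user judgments are numeric grades (2 = relevant, 1 = partially relevant, 0 = not relevant); all names and inputs here are hypothetical:

    from itertools import combinations

    def pairwise_accuracy(ranking, judgment):
        """ranking  -- document ids in the order the algorithm returned them
        judgment -- maps a rated document id to its relevance grade
        """
        position = {doc: i for i, doc in enumerate(ranking)}
        agreed = total = 0
        for a, b in combinations(judgment, 2):
            if judgment[a] == judgment[b]:
                continue  # pairs rated in the same category are not counted
            total += 1
            user_prefers_a = judgment[a] > judgment[b]
            algo_prefers_a = position[a] < position[b]  # earlier rank = better
            if user_prefers_a == algo_prefers_a:
                agreed += 1
        return agreed / total if total else 1.0

For example, pairwise_accuracy(["d1", "d2", "d3", "d4"], {"d1": 2, "d2": 0, "d3": 1, "d4": 2}) yields 2 / 5 = 0.4: of the five pairs with different grades, the algorithm agrees with the user's preference on two.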

Of course, this approach assumes that the most relevant documents appear among the top results - a highly relevant document at a bottom-ranking position will probably never be rated by the user. But since we mix the results of many algorithms, we can assume that the most relevant documents show up in the top results of at least one algorithm.

