
Creating a Search Engine

Tuesday 19 December 2006 8:10:00 pm


On a large website or intranet, finding relevant content is key to usability. In addition to a navigation structure based on the document hierarchy, most large sites offer an internal search engine that queries indexed content. In this article we look at the challenges involved in creating an internal search engine. The article also gives an overview of current research at eZ Systems in this area. As part of a research project, I have evaluated several relevance-based ranking concepts and written a basic test implementation that tries out different approaches.

The existing search engine in eZ Publish has some limitations. The biggest is the lack of relevance-based ranking: results can only be sorted by attributes such as class name, publication time or URL. The sorting options are limited to those available when fetching content (as described in the documentation). Defining search conditions is also limited. For example, it is not possible to define a condition that matches only in the headings of an XML field, or one that checks whether a specific custom XML tag exists in an XML field.

First, I'll discuss the main differences between searching in a local context (such as a single website) and a general web search. These differences stem mainly from the different datasets and from the users' different information needs.

The second part of the article introduces a general model of how search engines work. We then look in some detail at the popular and easy-to-understand vector space model, which is the basis of many commercial search engines.

The last part describes the query language, our test implementation and the evaluation process. We encourage you to evaluate our new ranking approach. Because relevance cannot be defined by a mathematical formula, your feedback is very important to us: it lets us test whether our approaches are successful. (Feel free to jump straight to the end of the article for information about trying out our test implementation.)

