
Creating a Search Engine

Tuesday 19 December 2006 8:10:00 pm


General conditions

In contrast to the web, content in a local context is usually focused on one topic. Often the content is of a higher quality than that on an average website.

For example, let's look at ez.no. The website contains content about eZ Systems and its products. While there is a range of content subtopics, all the information is related to eZ. Information generated by a company generally adheres to an underlying quality assurance process where each page has a specific purpose and content is not duplicated. The ez.no site also has content created by users, mainly in the forums. But in contrast to the global web, there is no serious spam problem in the forums (as there is no incentive).

Structured content

Besides the general differences between indexing all the sites on the web and indexing a single site, there are some special considerations if you run a site on eZ Publish. For one thing, valuable metadata like language, author, publishing date, etc. is available. But there is something much more valuable: content objects based on content classes that consist of attributes, which organize the site content into a well-defined schema.

With eZ Publish, we know two things about content object attributes:

  • Attribute names provide a semantic meaning. For example, a text line attribute called "street" contains location information.
  • The datatypes provide special information about the stored data, which makes search operators for specific datatypes possible. For example, if you have an attribute of type "integer", you could search for a range from "72" to "83". Alternatively, imagine an operator based on location information that searches for towns around "Oslo".
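As a rough illustration of these two points, the following Python sketch models content objects as lists of named, typed attributes and implements a range operator that only applies to attributes of datatype "integer". The class names, the `house_number` attribute and the `search_integer_range` function are hypothetical and chosen for this example; they are not part of the eZ Publish API.

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    name: str        # semantic meaning, e.g. "street"
    datatype: str    # e.g. "text_line" or "integer"
    value: object


@dataclass
class ContentObject:
    object_id: int
    attributes: list


def search_integer_range(objects, attribute_name, low, high):
    """Return the ids of objects whose named integer attribute lies in [low, high]."""
    hits = []
    for obj in objects:
        for attr in obj.attributes:
            # The datatype check is what makes a range operator meaningful.
            if (attr.name == attribute_name
                    and attr.datatype == "integer"
                    and low <= attr.value <= high):
                hits.append(obj.object_id)
                break
    return hits


# Hypothetical sample data, invented for this sketch.
objects = [
    ContentObject(1, [Attribute("street", "text_line", "Main Street"),
                      Attribute("house_number", "integer", 75)]),
    ContentObject(2, [Attribute("street", "text_line", "Side Street"),
                      Attribute("house_number", "integer", 90)]),
    ContentObject(3, [Attribute("street", "text_line", "Ring Road"),
                      Attribute("house_number", "integer", 83)]),
]

matching_ids = search_integer_range(objects, "house_number", 72, 83)
```

Because the operator checks the datatype before comparing, a text line attribute that happens to share the name is never compared numerically.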

Of course, datatype information is not restricted to the attribute level. For example, if you want to search all email addresses, you can find relevant information in attributes of datatype "email". But email addresses are stored in other attributes as well, for example in attributes of type "authors" (because each author can have an email address).
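One way to picture this is a lookup table that maps each datatype to a function that knows where email addresses live inside values of that type. The sketch below is a minimal illustration in Python; the value layouts (a plain string for "email", a list of dictionaries for "authors") and all function names are assumptions made for the example, not the way eZ Publish stores these datatypes.

```python
def emails_from_email(value):
    # Assumed layout: an "email" attribute holds a single address string.
    return [value]


def emails_from_authors(value):
    # Assumed layout: an "authors" attribute holds a list of dicts,
    # each with a name and an optional email address.
    return [author["email"] for author in value if author.get("email")]


# Datatypes that can contain email addresses, with their extractors.
EMAIL_EXTRACTORS = {
    "email": emails_from_email,
    "authors": emails_from_authors,
}


def find_emails(attributes):
    """Collect email addresses from (datatype, value) pairs."""
    found = []
    for datatype, value in attributes:
        extractor = EMAIL_EXTRACTORS.get(datatype)
        if extractor is not None:
            found.extend(extractor(value))
    return found


# Hypothetical sample data, invented for this sketch.
attributes = [
    ("text_line", "Creating a Search Engine"),
    ("email", "info@example.com"),
    ("authors", [
        {"name": "Jane Doe", "email": "jane@example.com"},
        {"name": "John Doe"},
    ]),
]

emails = find_emails(attributes)
```

Supporting another datatype that embeds email addresses would then only require registering one more extractor.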

Additionally, custom XML tags can have a semantic meaning. eZ Publish could be extended so that XML elements are mapped to datatypes as well. In that case, the search for email addresses would also search in relevant XML elements.

The ability to combine semantic and datatype information is very powerful. For example, perhaps you want to search for the maximum temperature values of some test series. In this case you would search in all elements (attributes and XML elements) called "maximum temperature" and, because you know they are all defined as floats, you could use the "range" or "less than" operator. In some cases it could even be useful to create your own "temperature" datatype. This datatype would be a specialization of the float datatype and could perform the conversion between Celsius, Fahrenheit and Kelvin when searching.
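The temperature idea can be sketched as follows: every stored value is normalized to Kelvin before the "less than" operator compares it with the query value. This is only an illustration in Python under assumed data structures; the element dictionaries, the unit codes "C", "F" and "K", and the function names are hypothetical and not taken from eZ Publish.

```python
def to_kelvin(value, unit):
    """Convert a temperature to Kelvin. Units: "C", "F" or "K"."""
    if unit == "K":
        return value
    if unit == "C":
        return value + 273.15
    if unit == "F":
        return (value - 32.0) * 5.0 / 9.0 + 273.15
    raise ValueError(f"unknown temperature unit: {unit!r}")


def search_less_than(elements, name, limit, limit_unit):
    """Return ids of elements called `name` whose temperature is below the limit."""
    limit_kelvin = to_kelvin(limit, limit_unit)
    return [
        element["id"]
        for element in elements
        if element["name"] == name
        and to_kelvin(element["value"], element["unit"]) < limit_kelvin
    ]


# Hypothetical test series, stored in mixed units.
elements = [
    {"id": 1, "name": "maximum temperature", "value": 30.0, "unit": "C"},
    {"id": 2, "name": "maximum temperature", "value": 100.0, "unit": "F"},
    {"id": 3, "name": "maximum temperature", "value": 290.0, "unit": "K"},
    {"id": 4, "name": "minimum temperature", "value": 5.0, "unit": "C"},
]

# Find all maximum temperatures below 35 degrees Celsius.
below_limit = search_less_than(elements, "maximum temperature", 35.0, "C")
```

Normalizing at search time keeps the stored values in whatever unit the editor entered, at the cost of one conversion per comparison.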


Author(s)