Tuesday 19 December 2006 8:10:00 pm
Casual users want a simple form where they just type in a few words. More advanced users want additional options. And developers want maximum power to build customized, advanced search forms. In all these cases, the system relies on a powerful query language in the background: users enter their conditions on a form, and the back end translates them into that query language.
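Here is a minimal Python sketch of the translation step between the form and the back end. The function names build_query() and quote(), the field names, and the choice to turn filter fields into filter() conditions and free-text terms into rank() conditions are all hypothetical. The output only imitates the XIRQL-style syntax discussed later in this section and is not the actual eZ Publish implementation.

```python
# Hypothetical sketch: translate search-form fields into an XIRQL-style query.
# Neither the function names nor the field names come from eZ Publish.


def quote(term: str) -> str:
    """Quote a search term as an XPath string literal."""
    if "'" in term:
        # Keep the sketch simple: terms containing single quotes are rejected.
        raise ValueError("single quotes are not supported in this sketch")
    return f"'{term}'"


def build_query(element: str, filters: dict, ranks: dict) -> str:
    """Build a query: `filters` must match, `ranks` only influence ranking."""
    parts = [f"filter(./{field}[contains(., {quote(term)})])"
             for field, term in filters.items()]
    parts += [f"rank(./{field}[contains(., {quote(term)})])"
              for field, term in ranks.items()]
    predicate = f"[{', '.join(parts)}]" if parts else ""
    return f"//{element}{predicate}"


# A simple form might offer only a free-text box (a rank condition), while an
# advanced form could also offer an author field (a filter condition).
print(build_query("article", {}, {"heading": "licence"}))
print(build_query("article", {"author": "fred"}, {"heading": "licence"}))
```

Because both forms produce the same kind of query string, the simple and the advanced form can share a single back end.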
Let's think about what our query language should be able to do.
We have already discussed the advantages of structured information. Attributes and XML elements carry semantic meaning and are based on datatypes that come with their own comparison operators. We looked at the example of searching for temperature values that could be stored in attributes or (custom) XML tags. In addition, temperature values recorded on different scales (such as Celsius and Fahrenheit) could be converted to a common scale before they are compared.
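As an illustration of datatype-aware comparison, the following Python sketch compares temperatures stored as XML attributes after converting them to Celsius. The element and attribute names (measurement, temperature, unit, id) and the set of supported units are assumptions made for this example. Comparing the raw attribute strings would give the wrong answer: as strings, "68" (°F) sorts above "25" (°C), although 68 °F is only 20 °C.

```python
import xml.etree.ElementTree as ET

# Assumed conversions from each supported unit to degrees Celsius.
TO_CELSIUS = {
    "C": lambda v: v,
    "F": lambda v: (v - 32) * 5 / 9,
    "K": lambda v: v - 273.15,
}


def celsius(measurement: ET.Element) -> float:
    """Read the temperature attribute as a number and convert it to Celsius."""
    value = float(measurement.get("temperature"))
    return TO_CELSIUS[measurement.get("unit", "C")](value)


def warmer_than(root: ET.Element, threshold_c: float) -> list:
    """Return the ids of measurements warmer than the threshold (in Celsius)."""
    return [m.get("id") for m in root.iter("measurement")
            if celsius(m) > threshold_c]


DOC = """
<measurements>
  <measurement id="a" temperature="25" unit="C"/>
  <measurement id="b" temperature="68" unit="F"/>
  <measurement id="c" temperature="300" unit="K"/>
</measurements>
"""
root = ET.fromstring(DOC.strip())
print(warmer_than(root, 22.0))  # ['a', 'c']: 68 °F is only 20 °C
```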
To have this kind of power, you need functionality similar to the following:
To a limited extent, these options are already available through the fetch() and search() operators in eZ Publish, but this functionality could be enhanced.
All content objects in eZ Publish are stored as XML (or can easily be mapped to XML). And it is not only the individual content objects: you can even think of the whole content tree of an eZ Publish installation as one big XML tree, built from the content objects' XML representations and the node tree hierarchy.
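The idea of the content tree as one big XML document can be sketched in Python with the standard xml.etree.ElementTree module. The Node class below is a hypothetical stand-in for content nodes, not the eZ Publish API, and mapping every content attribute to a child element is a simplifying assumption.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical content node: a class identifier, attributes, children."""
    class_identifier: str
    attributes: dict
    children: list = field(default_factory=list)


def to_xml(node: Node) -> ET.Element:
    """Map a node and its whole subtree to one XML element tree."""
    elem = ET.Element(node.class_identifier)
    for name, value in node.attributes.items():
        ET.SubElement(elem, name).text = value
    for child in node.children:
        elem.append(to_xml(child))  # the node hierarchy becomes XML nesting
    return elem


content_tree = Node("folder", {"name": "News"}, [
    Node("article", {"title": "Release notes", "author": "fred"}),
    Node("folder", {"name": "Archive"}, [
        Node("article", {"title": "Old news", "author": "anna"}),
    ]),
])

root = to_xml(content_tree)
# Roughly the XPath "//article", written in ElementTree's XPath subset.
titles = [a.findtext("title") for a in root.iterfind(".//article")]
print(titles)  # ['Release notes', 'Old news']
```

Once the whole tree is a single XML document, one query can reach articles at any level of the hierarchy.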
There is also additional information that general web search engines cannot use. For example, although this article is split over multiple web pages, we know that it is a single unit, so we might want to combine its pages when calculating the ranking value of the whole article. The same may apply when an object is embedded in another object or when objects are related to each other.
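How page-level scores are combined into one article score is a design choice. This hypothetical Python sketch assumes that each page already has a relevance score and shows two simple strategies: taking the best page (max) or adding up all pages (sum). Neither is claimed to be what eZ Publish does; the article ids and scores are invented.

```python
# Hypothetical sketch: rank multi-page articles by combining page scores.
from collections import defaultdict
from typing import Callable, Iterable


def rank_articles(page_scores: Iterable,
                  combine: Callable[[list], int]) -> list:
    """Group page scores by article, combine them, and sort best first."""
    grouped = defaultdict(list)
    for article_id, score in page_scores:
        grouped[article_id].append(score)
    totals = {article_id: combine(scores) for article_id, scores in grouped.items()}
    return sorted(totals, key=totals.get, reverse=True)


# (article id, score of one page); "install-guide" spans three pages.
pages = [("install-guide", 2), ("install-guide", 3), ("install-guide", 4), ("faq", 6)]

print(rank_articles(pages, max))  # ['faq', 'install-guide']: best single page wins
print(rank_articles(pages, sum))  # ['install-guide', 'faq']: evidence adds up
```

The two strategies order the same pages differently, which is exactly why treating the article as one unit matters for ranking.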
In the world of XML, XPath is a general and powerful query language for structured documents, much as SQL is for relational databases. Here are some XPath examples:
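//article
All article elements, at any depth in the tree.
//article[./author[contains(., 'fred')]]
All articles that have an author child element whose text contains "fred".
//article[./author[contains(., 'fred')]]//heading
All heading elements, at any depth, inside those articles.
//article[./author[contains(., 'fred')]]/heading[contains(., 'license')]
Only the headings that are direct children of those articles and whose text contains "license".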
These are only a few examples of the power of XPath with a really informal explanation of how they work. More information can be found on Wikipedia or on the W3C site.
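The four expressions can be tried out in Python. The standard library's xml.etree.ElementTree supports only a subset of XPath and has no contains() function, so this sketch evaluates the path steps with ElementTree and writes the contains() predicates as small Python filters. A full XPath 1.0 engine such as lxml could evaluate the expressions directly. The sample document is invented for the example.

```python
import xml.etree.ElementTree as ET

DOC = """
<site>
  <article>
    <author>fred smith</author>
    <heading>Choosing a license</heading>
    <section><heading>Installation</heading></section>
  </article>
  <article>
    <author>anna</author>
    <heading>Choosing a license</heading>
  </article>
</site>
"""
root = ET.fromstring(DOC.strip())


def contains(elem: ET.Element, text: str) -> bool:
    """Mimic XPath contains(., text): substring test on the string value."""
    return text in "".join(elem.itertext())


# //article
articles = root.findall(".//article")

# //article[./author[contains(., 'fred')]]
by_fred = [a for a in articles
           if any(contains(au, "fred") for au in a.findall("author"))]

# //article[./author[contains(., 'fred')]]//heading  (any depth)
all_headings = [h.text for a in by_fred for h in a.iterfind(".//heading")]

# //article[./author[contains(., 'fred')]]/heading[contains(., 'license')]
# (direct children only)
license_headings = [h.text for a in by_fred for h in a.findall("heading")
                    if contains(h, "license")]

print(len(articles), len(by_fred))  # 2 1
print(all_headings)                 # ['Choosing a license', 'Installation']
print(license_headings)             # ['Choosing a license']
```

Note the difference between the last two expressions: "//heading" also finds the heading nested inside the section, while "/heading" only looks at direct children.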
But XPath can only retrieve XML elements that fulfil given conditions; there is no relevance-based ranking of the retrieved XML sub-trees (just as there is no such ranking in SQL). We therefore need an approach that extends XPath with ranking.
XIRQL (pronounced "circle") extends XPath with some important concepts and functionality for information retrieval [3]. For example, it introduces:
We extended XIRQL with the concepts of filter() and rank() functions:
A XIRQL query could, for example, look like this:
//article[filter(./author[contains(., 'fred')]), rank(./heading[contains(., 'licence')])]
All constraints that are possible in XPath and XIRQL can be defined as filter or ranking conditions.
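One way the filter()/rank() split could be evaluated is sketched below in Python. Filter conditions are Boolean and remove articles, while rank conditions only produce a score used for ordering, so an article that fails a rank condition is still returned, just lower down. The scoring function (a case-insensitive count of term occurrences in the headings), the id attribute, and the function names are hypothetical simplifications, not the scoring model of XIRQL or eZ Publish.

```python
import xml.etree.ElementTree as ET

DOC = """
<site>
  <article id="1"><author>Fred</author><heading>Installation</heading></article>
  <article id="2"><author>Fred</author><heading>Licence: choosing a licence</heading></article>
  <article id="3"><author>Anna</author><heading>Licence FAQ</heading></article>
</site>
"""


def text_of(elem: ET.Element) -> str:
    """Lower-cased string value of an element (case-insensitive matching)."""
    return "".join(elem.itertext()).lower()


def passes_filter(article: ET.Element, term: str) -> bool:
    """filter(./author[contains(., term)]): a hard yes/no condition."""
    return any(term in text_of(a) for a in article.findall("author"))


def rank_score(article: ET.Element, term: str) -> int:
    """rank(./heading[contains(., term)]): naive occurrence count as score."""
    return sum(text_of(h).count(term) for h in article.findall("heading"))


def evaluate(root: ET.Element, author: str, heading: str) -> list:
    """Apply the filter, then sort the survivors by rank score, best first."""
    author, heading = author.lower(), heading.lower()
    hits = [a for a in root.iter("article") if passes_filter(a, author)]
    scored = [(a.get("id"), rank_score(a, heading)) for a in hits]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


root = ET.fromstring(DOC.strip())
print(evaluate(root, "fred", "licence"))  # [('2', 2), ('1', 0)]
```

Article 3 disappears because it fails the filter, even though its heading matches; article 1 stays in the result with a score of 0 because it only fails the rank condition.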
OK, enough theory. Let's have a look at our current status.