Helper Child Nodes: User-Friendly Search Results and Preventing Full Node Views

Thursday 15 May 2008 4:11:00 am

We will assume that master nodes have only one location. If a master node had multiple locations, each location would have its own set of children, and therefore its own helper nodes. Because eZ Publish search algorithms work with objects rather than nodes, it would be impossible to determine which location is the right one.

By default, eZ Publish shows search results using the line view, so we will also assume this in our examples.

If we search for content stored in helper child nodes, we will see these helper nodes in the list of search results. By default, these results link to the full views of these nodes. We will show the link to the parent node instead by using the following code in an override of the node/view/line.tpl template for helper node classes:

<a href={$node.parent.url|ezurl}>{$node.parent.name|wash}</a>

We could alternatively use this:

{node_view_gui content_node=$node.parent view='line'}

However, if a search term exists in more than one helper node (suppose that there are many forum replies quoting the same text), you will get several results showing and linking to the same master node. This is quite confusing to users. One way to address this is to alter the line template to let a user know that the information exists in different places. For example, for forum threads, we can use the following:

<a href={$node.parent.url|ezurl}>{$node.parent.name|wash} (in reply by {$node.object.owner.name|wash} on {$node.object.published|l10n( 'datetime' )})</a>

Alternatively, we can use anchors to mark the helper nodes' content in the master node's full view. First, insert the following piece of code in the full view of the parent node, preceding the display of each child (assuming $child denotes a helper node):

<a name="{$child.url|trim( '/' )|explode( '/' )|reverse[0]}"></a>

Then, insert this code in the line view of the child in order to link to the anchor:

{def $url_last_part=$node.url|trim( '/' )|explode( '/' )|reverse[0]}
<a href={concat( $node.parent.url, '#', $url_last_part )|ezurl}>{$node.parent.name|wash} ({$node.name|wash})</a>
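
For readers less familiar with the template operators, the following plain-PHP sketch shows what the operator chain above computes. It is only an illustration: urlLastPart() and anchorLink() are hypothetical helper names, and the example URLs are made up.

```php
<?php
// Plain-PHP equivalent of: $node.url|trim( '/' )|explode( '/' )|reverse[0]
// urlLastPart() is a hypothetical helper name used only for this illustration.
function urlLastPart( $url )
{
    $parts = explode( '/', trim( $url, '/' ) );
    return end( $parts );
}

// Builds the link target used in the line view: parent URL, '#', anchor name.
// anchorLink() is hypothetical as well.
function anchorLink( $parentUrl, $childUrl )
{
    return $parentUrl . '#' . urlLastPart( $childUrl );
}

// Made-up example URLs
echo anchorLink( 'forum/general/my-thread', 'forum/general/my-thread/reply-3' ), "\n";
```

Because the anchor name is derived from the child's own URL, the anchor in the parent's full view and the link in the child's line view stay in sync without any extra configuration.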

Indexing child nodes in their master nodes

The best solution is to show only the master nodes in search results. This section describes a simple and elegant way to do that.

Every datatype in eZ Publish has a special function named metaData(), which returns a string of words to be indexed (or a special array that contains such a string). Using this to our advantage, we will prepare a special datatype, which will not hold any information. Its sole purpose is to return a string of all words indexed for the children of its node. In addition, we will prepare a workflow event that re-indexes the parent object if it has an attribute of the new datatype.

Our example assumes that the built-in eZ Publish search engine is being used, for simplicity and ease of understanding. However, we present a framework that can be tweaked to work with other search engines or to make it work regardless of the search engine used.

Let's take a look at the files in the example extension childrenindexer:

datatypes/childrenindexer/childrenindexertype.php:

<?php

include_once( 'kernel/classes/ezdatatype.php' );

define( 'DATATYPESTRING_CHILDRENINDEXER', 'childrenindexer' );

class ChildrenIndexerType extends eZDataType
{
    function ChildrenIndexerType()
    {
        $this->eZDataType( DATATYPESTRING_CHILDRENINDEXER, 'Children Indexer' );
    }

    function isIndexable()
    {
        return true;
    }

    function metaData( $contentObjectAttribute )
    {
        $db =& eZDB::instance();
        $contentObjectID = $contentObjectAttribute->attribute( 'contentobject_id' );
        // Find words indexed for children of the current object's main node
        $words = $db->arrayQuery( "SELECT word 
                                   FROM ezcontentobject_tree ot,
                                        ezcontentobject_tree t,
                                        ezsearch_object_word_link l,
                                        ezsearch_word w
                                   WHERE ot.contentobject_id=$contentObjectID
                                     AND ot.main_node_id=ot.node_id
                                     AND t.parent_node_id=ot.node_id
                                     AND l.contentobject_id=t.contentobject_id
                                     AND w.id=l.word_id", array( 'limit' => 1000 ) );
        $metaData = array();
        foreach ( $words as $word )
        {
            $metaData[] = $word['word'];
        }
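        // The separator goes first; the reversed argument order is no longer accepted by newer PHP versions
        $metaDataString = implode( ' ', $metaData );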
 
        return $metaDataString;
    }
}

eZDataType::register( DATATYPESTRING_CHILDRENINDEXER, 'childrenindexertype' );

?>

eventtypes/event/reindexparent/reindexparenttype.php:

<?php
 
include_once( 'kernel/classes/ezworkflowtype.php' );
include_once( 'kernel/classes/ezcontentcachemanager.php' );
 
define( 'WORKFLOW_TYPE_REINDEXPARENT_ID', 'reindexparent' );
 
class ReindexParentType extends eZWorkflowEventType
{
    function ReindexParentType()
    {
        $this->eZWorkflowEventType( WORKFLOW_TYPE_REINDEXPARENT_ID, 'Reindex parent node' );
    }
 
    function execute( &$process, &$event )
    {
        $parameters = $process->attribute( 'parameter_list' );
        $objectID = $parameters['object_id'];
 
        $db =& eZDB::instance();
 
        $pathString = $db->arrayQuery( "SELECT path_string
                                        FROM ezcontentobject_tree
                                        WHERE main_node_id=node_id
                                          AND contentobject_id=$objectID" );
        if ( $pathString )
        {
            $pathString = $pathString[0]['path_string'];
            $path = array_reverse( explode( '/', trim( $pathString, '/' ) ) );
            array_shift( $path );
            // $path now contains node IDs of all ancestors (starting with the parent node)
 
            foreach( $path as $ancestorNodeID )
            {
                /* Find object ID but only if the ancestor's object contains searchable
                   attribute of the childrenindexer datatype. */
                $ancestorObjectID = $db->arrayQuery( "SELECT a.contentobject_id
                                                      FROM ezcontentobject_tree t,
                                                           ezcontentobject_attribute a,
                                                           ezcontentclass_attribute ca
                                                      WHERE t.node_id=$ancestorNodeID
                                                        AND t.contentobject_id=a.contentobject_id
                                                        AND t.contentobject_version=a.version
                                                        AND a.contentclassattribute_id=ca.id
                                                        AND ca.version=0
                                                        AND ca.data_type_string='childrenindexer'
                                                        AND ca.is_searchable=1", array( 'limit' => 1 ) );
                if ( !$ancestorObjectID )
                {
                    break;
                }
 
                $ancestorObjectID = $ancestorObjectID[0]['contentobject_id'];
 
                require_once( 'kernel/content/ezcontentoperationcollection.php' );
                eZContentOperationCollection::registerSearchObject( $ancestorObjectID, false );
            }
        }
 
        return EZ_WORKFLOW_TYPE_STATUS_ACCEPTED;
    }
}
 
eZWorkflowEventType::registerType( WORKFLOW_TYPE_REINDEXPARENT_ID, 'reindexparenttype' );
 
?>

settings/content.ini.append.php:

<?php /* #?ini charset="iso-8859-1"?
 
[DataTypeSettings]
ExtensionDirectories[]=childrenindexer
AvailableDataTypes[]=childrenindexer
 
*/ ?>

settings/workflow.ini.append.php:

<?php /* #?ini charset="iso-8859-1"?
 
[EventSettings]
ExtensionDirectories[]=childrenindexer
AvailableEventTypes[]=event_reindexparent
 
*/ ?>

In addition to the files listed above, you should also create “edit” and “view” templates for both the datatype and workflow event. If you are testing out the code above, you can either ignore the warnings or create these templates (they can be empty).

To use this extension, enable it, then add a “reindex parent node” event to the workflow triggered after content is published. You also need to add an attribute of the “Children indexer” datatype to classes used as master nodes. If your site already has some content, you must re-index your site by running bin/php/updatesearchindex.php.

How the extension works

Let's have a closer look at how this works. Whenever eZ Publish indexes or re-indexes an object, it goes through all searchable attributes and calls the metaData() function on each of them to collect the words to index. Every datatype implements metaData() in its own way: the text string datatype simply returns the content of the string, while the binary file datatype runs external programs to extract the words. Our special datatype grabs the words already indexed for a node's children: the SQL query finds all child objects of the object's main node and looks in the index tables to get their indexed words. We have limited the number of words to 1000 in order to avoid indexing too much data. (If performance is an issue on a large site, you should also consider using delayed indexing; see the DelayedIndexing and The cronjob scripts documentation pages for more information.)
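
The word-collection step can be pictured with a small stand-alone sketch. This is not the datatype itself: the real code reads the words from the search index tables via SQL, whereas here plain arrays stand in for those tables, and buildMetaDataString() is a hypothetical name.

```php
<?php
// Self-contained illustration only. buildMetaDataString() is a hypothetical
// helper; each inner array stands in for the words indexed for one child.
function buildMetaDataString( array $childWordLists, $limit = 1000 )
{
    $words = array();
    foreach ( $childWordLists as $childWords )
    {
        foreach ( $childWords as $word )
        {
            if ( count( $words ) >= $limit )
            {
                break 2; // cap reached, stop collecting
            }
            $words[] = $word;
        }
    }
    // One space-separated string, as returned by metaData()
    return implode( ' ', $words );
}

$children = array(
    array( 'helper', 'node', 'reply' ),
    array( 'search', 'reply' ),
);
echo buildMetaDataString( $children ), "\n";    // all five words
echo buildMetaDataString( $children, 2 ), "\n"; // capped at two words
```

As in the SQL version, repeated words are not filtered out, so they count towards the limit.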

Whenever a node is published or re-published, a workflow event “reindexparent” checks whether the parent node (of the main location of the current object) has a searchable attribute of the “Children indexer” datatype. If it does, the workflow event re-indexes the object of this node and tries to repeat the procedure for its parent.
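
The workflow event finds the ancestors by taking the node's path_string apart. The sketch below isolates that step in plain PHP; ancestorNodeIDs() is a hypothetical helper name and the node IDs are made up.

```php
<?php
// ancestorNodeIDs() is a hypothetical helper that mirrors the path handling
// in execute(): a path_string such as '/1/2/58/112/' lists node IDs from the
// root down to the node itself.
function ancestorNodeIDs( $pathString )
{
    $path = array_reverse( explode( '/', trim( $pathString, '/' ) ) );
    array_shift( $path ); // drop the node itself, keep parent, grandparent, ...
    return array_map( 'intval', $path );
}

// Made-up node IDs: node 112 sits below 58, which sits below 2, then root 1
print_r( ancestorNodeIDs( '/1/2/58/112/' ) );
```

The result starts with the parent node, which is the order in which the event walks up the tree.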

This enables you to use grandchildren as helper nodes if necessary. Consider the following example:

Node/object G (has a childrenindexer attribute)

 +- Node/object P (has a childrenindexer attribute)

    +- Node/object C

If object C is published or re-published, its parent P is re-indexed and will also index words indexed for object C. After that, object G is re-indexed too and will grab the words indexed for P, which also include words indexed for C. Thus, the words indexed for C are also indexed for G.
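
The propagation in this example can be traced with a toy model. Everything in the sketch is an assumption made for illustration: arrays stand in for the content tree and the search index, and reindexAncestors() is a hypothetical function, not part of the extension.

```php
<?php
// Toy model of the G / P / C example. $tree and $index are hypothetical
// stand-ins for the content tree and the search index.
function reindexAncestors( array $tree, array &$index, $nodeName )
{
    $reindexed = array();
    $current = $tree[$nodeName]['parent'];
    // Walk upwards and stop at the first ancestor without an indexer attribute
    while ( $current !== null && $tree[$current]['has_indexer'] )
    {
        // Own words plus the words currently indexed for each child
        $words = $tree[$current]['own_words'];
        foreach ( $tree as $name => $node )
        {
            if ( $node['parent'] === $current )
            {
                $words = array_merge( $words, $index[$name] );
            }
        }
        $index[$current] = $words;
        $reindexed[] = $current;
        $current = $tree[$current]['parent'];
    }
    return $reindexed;
}

$tree = array(
    'G' => array( 'parent' => null, 'has_indexer' => true,  'own_words' => array( 'grand' ) ),
    'P' => array( 'parent' => 'G',  'has_indexer' => true,  'own_words' => array( 'parent' ) ),
    'C' => array( 'parent' => 'P',  'has_indexer' => false, 'own_words' => array( 'child' ) ),
);
$index = array( 'G' => array( 'grand' ), 'P' => array( 'parent' ), 'C' => array( 'child' ) );

$order = reindexAncestors( $tree, $index, 'C' );
echo implode( ' -> ', $order ), "\n";   // order in which ancestors were re-indexed
echo implode( ' ', $index['G'] ), "\n"; // words now indexed for G
```

The order matters: P has to be re-indexed before G, otherwise G would pick up P's old word list and miss the words of C.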

The extension does not cover the case when a helper node is removed, as there is no suitable trigger available in eZ Publish. However, removing helper child nodes is rare, and the words left behind in the index do not usually cause any problems. If you need to fix this, you can, for example, write a cronjob that re-indexes the objects of master nodes. (A more sophisticated version could re-index only those master nodes that have had a child node removed since the cronjob last ran.)

Omitting helper nodes in search results

When you search for terms included in helper nodes, you will get hits in the helper object (or objects) as well as in the master parent object, so you need to limit the search by class IDs. You can use content/advancedsearch as the main search function on your site and use a GET or POST parameter named "SearchContentClassID" to limit the classes. Or, you can alter the search template to use a configuration file to limit the class IDs.
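
As a rough sketch of the first option, the snippet below builds a search link that carries the class restriction as a GET parameter. The parameter name SearchContentClassID comes from the text above; the SearchText parameter name, the advancedSearchUrl() helper and the class ID 2 are assumptions made for this illustration, so check them against your installation.

```php
<?php
// advancedSearchUrl() is a hypothetical helper. 'SearchText' is assumed to be
// the name of the search phrase parameter; 'SearchContentClassID' is the
// class-limiting parameter named in the text.
function advancedSearchUrl( $searchText, $classID )
{
    $query = http_build_query(
        array( 'SearchText' => $searchText, 'SearchContentClassID' => $classID ),
        '',
        '&'
    );
    return '/content/advancedsearch?' . $query;
}

// Class ID 2 is only an example value
echo advancedSearchUrl( 'helper nodes', 2 ), "\n";
```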

First, use this setting in a site.ini override file:

[SearchSettings]
SearchViewHandling=template

Then, limit the search to classes defined in a configuration file in content/search.tpl:

{set $page_limit=$search_page_limit|choose( 10, 5, 10, 20, 30, 50 )}
{if $page_limit|not}
    {set $page_limit=10}
{/if}
 
{def $search=fetch( 'content', 'search', hash(
    'text', $search_text,
    'section_id', $search_section_id,
    'class_id', ezini( 'Search', 'ClassID', 'search.ini' ),
    'subtree_array', $search_subtree_array,
    'publish_timestamp', $search_timestamp,
    'sort_by', array( 'modified', false() ),
    'offset', $view_parameters.offset,
    'limit', $page_limit
) )}
 
{set $search_result=$search['SearchResult']
     $search_count=$search['SearchCount']
     $stop_word_array=$search['StopWordArray']
     $search_data=$search}
 
{*
    Remaining lines are the same as in the original content/search.tpl 
    starting with <div class="content-search"> for the base design or
    <div class="border-box"> for the ezwebin design. Remove the /let
    at the very bottom of the template as well.
*}

Finally, enter the class IDs to search in the search.ini(.append.php) configuration file:

<?php /* #?ini charset="utf-8"?
 
[Search]
ClassID[]
ClassID[]=2
# ...
 
*/ ?>

You can extend this model to enable the choice to include or exclude helper nodes from search results. One approach would involve:

  • introducing another array variable in the settings, such as “AltClassID”, which contains the classes in the “ClassID” array variable as well as the helper nodes' classes;
  • having a checkbox or hidden field named “SearchAlt” on the search form; and
  • altering the fetch function's hash to use “AltClassID” or “ClassID”, depending on whether “SearchAlt” is checked or set. Don't forget that you need to include the “SearchAlt” parameter in the page navigator links (when there are multiple pages of search results).
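
The selection logic from the list above can be sketched in plain PHP as follows. In practice it would live in the search template; searchClassIDs() is a hypothetical helper, $settings stands in for the [Search] block of search.ini, and the class ID 13 for a helper class is made up.

```php
<?php
// searchClassIDs() is a hypothetical helper. "AltClassID" and "SearchAlt" are
// the names suggested in the list above; $request stands in for the GET or
// POST variables of the search form.
function searchClassIDs( array $settings, array $request )
{
    $useAlt = !empty( $request['SearchAlt'] );
    return $useAlt ? $settings['AltClassID'] : $settings['ClassID'];
}

$settings = array(
    'ClassID'    => array( 2 ),     // master classes only
    'AltClassID' => array( 2, 13 ), // master classes plus a made-up helper class ID
);

print_r( searchClassIDs( $settings, array() ) );                    // helper nodes excluded
print_r( searchClassIDs( $settings, array( 'SearchAlt' => '1' ) ) ); // helper nodes included
```

An unchecked checkbox is simply absent from the request, which is why the sketch treats a missing “SearchAlt” the same as an empty one.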