Facets : Truncated values on ezstring

Monday 12 July 2010 3:33:20 am - 22 replies

Modified on Wednesday 01 June 2011 3:41:33 am by H-Works Agency

Author	Message
Ivo Lukac	Monday 12 July 2010 9:18:31 am The problem (or feature) you are experiencing is a result of how Solr tokenizes text. A word delimiter filter runs at index time and breaks up words that contain a dash. The resulting tokens are then used for faceting. eZ Find 2.2 has new functionality for this (special fields for faceting), but I haven't explored it yet. You can always tune schema.xml, though :) http://www.linkedin.com/in/ivolukac http://www.netgen.hr/eng/blog http://twitter.com/ilukac
Paul Borgermans	Monday 12 July 2010 3:20:34 pm Indeed, in eZ Find 2.2 you can define dedicated field types for attributes in a facet context. This was introduced precisely so that you can have both meaningful search results (where you usually want this "break up") and facets/sorting (where you want verbatim strings). What datatype is used for keywords? Judging from your results, you are using either eZ Find 2.0, or eZ Find 2.1+ with a text field.
Paul
eZ Publish, eZ Find, Solr expert consulting and training http://twitter.com/paulborgermans
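To make the distinction in the two posts above concrete, here is a toy Python sketch (not Solr code) that counts facet terms once over analyzed tokens and once over verbatim strings. The analyzer is a deliberate simplification that only lowercases and splits on non-word characters; Solr's real text analysis chain does more. The function names and the city list are made up for illustration.

```python
import re
from collections import Counter


def toy_analyze(value):
    """Rough stand-in for an analyzed text field: lowercase, split on non-word characters."""
    return [tok for tok in re.split(r"\W+", value.lower()) if tok]


def facet_counts(values, analyzed):
    """Count each indexed term once per document, as a facet on that field would."""
    counts = Counter()
    for value in values:
        terms = set(toy_analyze(value)) if analyzed else {value}
        counts.update(terms)
    return counts


cities = ["Aix-en-Provence", "Saint-Denis", "Saint-Denis", "Paris"]  # illustrative data

text_facets = facet_counts(cities, analyzed=True)     # facets built from tokens
string_facets = facet_counts(cities, analyzed=False)  # facets built from verbatim values

print(text_facets)
print(string_facets)
```

With the analyzed variant, "Saint-Denis" never appears as a facet value, only its fragments do. That is why a verbatim field type is the usual choice for faceting and sorting.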
H-Works Agency	Monday 26 July 2010 6:17:47 am Thank you for this information. This Solr query syntax looks very powerful. EZP is Great
Sebastiaan van der Vliet	Tuesday 27 July 2010 5:45:30 am In case you do want to tune schema.xml, here is the information you need. Leave this line in:
    <dynamicField name="*_t" type="text" indexed="true" stored="true" multiValued="true" termVectors="true"/>
but add a definition for your own field underneath it, replacing type="text" with type="long", e.g.:
    <field name="attr_dc_coverage_t" type="long" indexed="true" stored="true" multiValued="true"/>
Certified eZ publish developer with over 9 years of eZ publish experience. Available for challenging eZ publish projects as a technical consultant, project manager, trouble shooter or strategic advisor.
H-Works Agency	Tuesday 05 October 2010 10:18:22 am Hello everyone, and thank you for the answers. For example, the facet results for my city attribute are "Paris, Pari". But the letter "s" is not a word separator, is it? I tried Sebastiaan's answer by adding:
    <field name="attr_ville_t" type="long" indexed="true" stored="true" multiValued="true"/>
just after the mentioned line, but it doesn't change anything :(
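One plausible explanation for "Paris" turning into "Pari", not confirmed anywhere in this thread: analyzed text field types often include a stemming filter, and a Porter-style stemmer strips a trailing "s" without any word separator being involved. The Python sketch below implements only step 1a of the classic Porter algorithm, as an illustration. It is not Solr's implementation, and whether a stemmer is active in a given schema.xml is an assumption to check there.

```python
def porter_step_1a(word):
    """Step 1a of the Porter stemmer: normalise plural-like endings."""
    if word.endswith("sses"):
        return word[:-2]   # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]   # ponies -> poni
    if word.endswith("ss"):
        return word        # caress -> caress
    if word.endswith("s"):
        return word[:-1]   # cats -> cat
    return word


for city in ["paris", "rennes"]:
    print(city, "->", porter_step_1a(city))
```

This step alone turns "paris" into "pari". Later steps of the full algorithm, not shown here, go on to remove other endings.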
Sebastiaan van der Vliet	Tuesday 05 October 2010 11:18:26 am This looks interesting too: on http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory check out the entry for solr.WordDelimiterFilterFactory, which has an option preserveOriginal="1" that causes the original token to be indexed without modifications (in addition to the tokens produced due to other options). For example:
    <fieldtype name="subword" class="solr.TextField">
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" preserveOriginal="1" />
    etc...
Patrick Kaiser	Tuesday 05 October 2010 5:09:25 pm You can control the way content is indexed by defining a mapping between eZ datatypes and Solr field types. This can be configured in ezfind.ini[.append.php] independently for searching, sorting, faceting and filtering. For faceting, the Solr field type "string" is probably what you want.
    [SolrFieldMapSettings]
    # this is the configuration for searching
    DatatypeMap[ezstring]=text
    ...
    # for sorting
    DatatypeMapSort[]
    DatatypeMapSort[ezstring]=string
    ...
    # for faceting
    DatatypeMapFacet[]
    DatatypeMapFacet[ezstring]=string
    ...
    # for filtering
    DatatypeMapFilter[]
    DatatypeMapFilter[ezstring]=string
    ...
Remember to run updatesearchindexsolr.php after you make these changes. Hope this helps. Best regards, Patrick
H-Works Agency	Wednesday 06 October 2010 1:47:02 am Damn, still not working :( I added these settings to ezfind.ini (which seems cleaner than modifying the system-wide schema.xml):
    DatatypeMap[ezstring]=string
    DatatypeMapSort[ezstring]=string
    DatatypeMapFilter[ezstring]=string
    DatatypeMapFacet[ezstring]=string
    Default=string
Then I reran:
    updatesearchindexsolr.php -s $siteaccess_name --clean-all
My ezstring attribute still returns truncated facet values.
Sebastiaan van der Vliet	Thursday 07 October 2010 12:16:09 am Martin, did you also try the option below in schema.xml?
    preserveOriginal="1"
H-Works Agency	Thursday 07 October 2010 2:44:21 am I tried putting this everywhere, but tweaking ezfind.ini or schema.xml seems to have no effect on what Solr or eZ Find returns. Even deleting or deliberately breaking schema.xml doesn't change anything: after running "updatesearchindexsolr.php", all facet results remain the same! Could someone tell me which schema.xml we have to edit? Here is the list I found:
    ./java/solr/conf/schema.xml
    ./java/solr.multicore/eng-GB/conf/schema.xml
    ./java/solr.multicore/fre-FR/conf/schema.xml
    ./java/solr.multicore/nor-NO/conf/schema.xml
None of these seems to be used: if I delete all of those files, nothing changes.
Sebastiaan van der Vliet	Monday 18 October 2010 7:03:48 am Two quick checks: Did you restart Solr after editing schema.xml? Did you delete your previous index first and then commit?
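For readers unsure what "delete your previous index and then commit" looks like in practice: Solr accepts XML update messages over HTTP, and a delete-by-query on `*:*` followed by a commit empties the index. Below is a minimal Python sketch, assuming Solr listens on http://localhost:8983/solr (an assumption, so adjust it to your installation). The helper names are hypothetical and not part of eZ Find.

```python
import urllib.request

# Assumption: single-core Solr on the default port; change to match your setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"

DELETE_ALL = "<delete><query>*:*</query></delete>"  # matches every document
COMMIT = "<commit/>"                                # makes the change visible


def build_update_request(xml_body, url=SOLR_UPDATE_URL):
    """Build (but do not send) a POST request carrying one Solr XML update message."""
    return urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def wipe_index(url=SOLR_UPDATE_URL):
    """Send the delete-all message, then the commit. Requires a running Solr."""
    for body in (DELETE_ALL, COMMIT):
        with urllib.request.urlopen(build_update_request(body, url)) as resp:
            resp.read()


# Usage (destructive, only against an index you intend to rebuild):
# wipe_index()
```

Note that a commit is not a restart. Changes to schema.xml still require restarting Solr before reindexing.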
H-Works Agency	Monday 18 October 2010 8:04:30 am Thank you. In fact, I hadn't restarted Solr. What do you mean by deleting the previous index and committing? Does commit mean restarting Solr with the new schema.xml? When I add my directive:
    <field name="attr_ville_t" type="long" indexed="true" stored="true" multiValued="true"/>
Solr crashes (curl error 7).
H-Works Agency	Monday 18 October 2010 8:52:27 am Hello Patrick, what are those modifications to ezfind.ini supposed to do? Are they supposed to change the way facets are returned, through DatatypeMapFilter[]? I really don't get it, as nothing ever changes no matter what I modify in this file.
H-Works Agency	Monday 18 October 2010 9:00:20 am Regarding Sebastiaan's suggestion above to set preserveOriginal="1" on solr.WordDelimiterFilterFactory: this is what I get after adding preserveOriginal="1" to schema.xml on line 221 (then restarting Solr, then removing extension/ezfind/java/(...)/data/*, then rerunning updatesearchindexsolr.php):
    <body><h2>HTTP ERROR: 500</h2><pre>Severe errors in solr configuration. Check your log files for more detailed information on what may be wrong. If you want solr to continue after configuration errors, change: <abortOnConfigurationError>false</abortOnConfigurationError> in solr.xml
Patrick Kaiser	Monday 18 October 2010 9:10:55 am If you follow my directions, there should be no need to touch schema.xml at all. If you didn't configure multicore Solr, then your schema.xml is this one:
    ./java/solr/conf/schema.xml
Before proceeding, replace your messed-up one with the original file, restart Solr and make sure it runs. Then make really sure the attribute you want to facet on is of type ezstring (perhaps you are using eztext or something else?). You could also try the ezkeyword datatype, which should work "out of the box". Did this help? Best regards, Patrick
H-Works Agency	Monday 18 October 2010 9:31:51 am Ok Patrick. Thanks, all this information is a great help in finally being able to use eZ Find on production projects. My attribute is a simple "ezstring" attribute holding city names. If I just add:
    DatatypeMapFacet[ezstring]=lckeyword
to ezfind.ini, it doesn't change anything, even after rerunning updatesearchindexsolr.php. My results are still truncated like this:
    Paris becomes Pari
    Rennes becomes Renn
    ...etc
Do I need to insert DatatypeMapFacet[ezstring]=string?
Patrick Kaiser	Monday 18 October 2010 10:13:02 am I meant that you should try adding a new attribute of type ezkeyword to your class, in addition to your existing city attribute. Edit a few objects and add content to the new keyword field. Adjust ezfind.ini:
    DatatypeMapFacet[]
    DatatypeMapFacet[ezstring]=string
    DatatypeMapFacet[ezkeyword]=lckeyword
Clear the cache and check in the admin interface whether the siteaccess you are using for your facet tests picks up the right ezfind.ini settings (it really seems that your ezfind.ini settings are not being used). Then rerun:
    updatesearchindexsolr.php -s YOUR_SITEACCESS --clean-all
After that you can try faceting on both fields; both should actually work. Best regards, Patrick
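When it is unclear whether a configuration change reached the index at all, it can help to bypass eZ Find and ask Solr directly which terms it holds for a field. The Python sketch below builds a standard Solr facet query and parses the JSON response. The base URL and the field name (`attr_ville_t`, taken from earlier in this thread) are assumptions, since the actual field name depends on the datatype mapping in use, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def facet_url(base_url, field):
    """Build a Solr select URL that returns only facet counts for one field."""
    params = {
        "q": "*:*",            # match all documents
        "rows": 0,             # no documents needed, only the facet counts
        "facet": "true",
        "facet.field": field,
        "wt": "json",
    }
    return base_url.rstrip("/") + "/select?" + urllib.parse.urlencode(params)


def facet_terms(response, field):
    """Turn Solr's flat [term, count, term, count, ...] list into a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))


def fetch_facets(base_url, field):
    """Run the query against a live Solr and return {term: count}."""
    with urllib.request.urlopen(facet_url(base_url, field)) as resp:
        return facet_terms(json.load(resp), field)


# Usage, assuming a local Solr and a field named as earlier in this thread:
# print(fetch_facets("http://localhost:8983/solr", "attr_ville_t"))
```

If the terms returned here are still truncated after a clean reindex, the new mapping was probably not applied. That would fit the ezfind.ini loading problem suspected above.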
H-Works Agency	Tuesday 19 October 2010 10:57:42 am You were right, I had a loading problem with my ezfind.ini... an extension loading order problem :( Now everything works with your directions! Thanks a lot!
Michele Paoli	Friday 01 April 2011 4:08:25 am Hi everybody, I have the same problem with an ezobjectrelationlist attribute: the space between words is treated as a separator rather than as a "normal character". I tried setting up ezfind.ini this way:
    DatatypeMap[ezobjectrelationlist]=text
    DatatypeMapSort[ezobjectrelationlist]=string
    DatatypeMapFacet[ezobjectrelationlist]=string
    DatatypeMapFilter[ezobjectrelationlist]=string
I set preserveOriginal="1" on:
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">...
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>...
I restarted Solr and reindexed, but facets on the ezobjectrelationlist attribute are still truncated. Could someone help me? Bye
Michele Paoli	Friday 01 April 2011 7:15:42 am I solved it by using http://projects.ez.no/ezfsolrdocumentfieldobjectrelation and making some modifications to it. Now my problem is filtering on values that contain whitespace... Bye
