Share » Learn » eZ Publish » Advanced development with eZ Find -...

Advanced development with eZ Find - part 1 : Datatypes in Solr and eZ Find

Monday 07 June 2010 8:05:08 am

  • Currently 5 out of 5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5

Step 6 : Field type management on the Solr side

Solr relies on several configuration files, one of them is used to tell him that a '_s' at the end of a field name means string, '_t' for text, etc. Here is the file : /ezfind/java/solr/conf/schema.xml.
This configuration file contains the hard-coded definition for a certain amount of fields (metadata fields for instance), but also defines the so-called dynamic fields, each of them associating the dynamic fields names (content attributes ones, coming from eZ Publish) to Solr types :

<dynamicField name="*_i" type="int" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="*_f" type="float" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="*_d" type="double" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="*_si" type="sint" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="*_sf" type="sfloat" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="*_sd" type="sdouble" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="*_s" type="string" indexed="true" stored="true" multiValued="true" termVectors="true"/>
<dynamicField name="*_sl" type="slong" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="*_l" type="long" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="*_t" type="text" indexed="true" stored="true" multiValued="true" termVectors="true"/>
<dynamicField name="*_b" type="boolean" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="*_dt" type="date" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="*_random" type="random" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="*_k" type="keyword" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="*_lk" type="lckeyword" indexed="true" stored="true" multiValued="true"/>
<!-- some trie-coded dynamic fields for faster range queries →
<dynamicField name="*_ti" type="tint" indexed="true" stored="true"/>
<dynamicField name="*_tl" type="tlong" indexed="true" stored="true"/>
<dynamicField name="*_tf" type="tfloat" indexed="true" stored="true"/>
<dynamicField name="*_td" type="tdouble" indexed="true" stored="true"/>
<dynamicField name="*_tdt" type="tdate" indexed="true" stored="true"/>
<!-- geopoint for geospatial/location searches, boosting, ... →
<dynamicField name="*_gpt" type="geopoint" indexed="true" stored="true"/>

This files can also be used to define more complex behaviours for given eZ Publish datatypes, like the keywords datatype (ezkeyword). Two different field types definitions can be found, related to the keywords datatype ( 'keyword' for case-sensitive cases, and 'lckeyword' for lower-case cases ), to be used equally at the eZ Find level (DatatypeMap) :

<fieldtype name="lckeyword" class="solr.TextField" positionIncrementGap="100">
 <analyzer type="index">
 <tokenizer class="solr.PatternTokenizerFactory" pattern=", *" />
 <filter class="solr.TrimFilterFactory" />
 <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
 <filter class="solr.LowerCaseFilterFactory"/>
 <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
 </analyzer>
 <analyzer type="query">
 <tokenizer class="solr.PatternTokenizerFactory" pattern=", *" />
 <filter class="solr.TrimFilterFactory" />
 <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>

 <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
 <filter class="solr.LowerCaseFilterFactory"/>
 <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
 </analyzer>
</fieldtype>

This example, keywords fields management, teaches a lot about Solr configuration. One can note the way Solr filters are called, how coma-based word separations are handled (PatternTokenizerFactory), case-sensitivity management (LowerCaseFilterFactory), duplicate removal (RemoveDuplicatesTokenFilterFactory), etc.

 

Printable

Printer Friendly version of the full article on one page with plain styles

Author(s)