I am using eZ Find in a clustered environment. It works, but I'm not sure it's working entirely correctly. First of all, I know indexing happens independently on every node in the cluster. Sometimes re-running the same search against the cluster returns different results, because the indexes on the different nodes are not all equally up to date.
So I have a few questions. How should it ideally be set up? Should eZ Find run on its own dedicated node instead of as multiple copies (with multiple indexes) on multiple nodes? If so, how do I redirect all search requests to that dedicated node? Can I do that within eZ Publish, or do I have to use special rules in my load balancer?
The recommended way is to treat Solr as if it were a database:
- a single instance, even in eZ Publish cluster configurations
- on a dedicated server
It makes everything much simpler. The exception is high availability: since the index can be rebuilt at any time, for sites without a huge amount of content and where a short search outage is acceptable, rebuilding the index after a crash of the Solr server may be good enough. For real HA, you'll have to dig into Solr's master/slave modes.
As for setting it up: simply put the hostname of the server where Solr is running in the solr.ini file on every node. No load balancing is needed.
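To illustrate why a single instance removes the inconsistency described in the question, here is a minimal Python sketch. It assumes a hypothetical host `solr.example.com` and Solr's standard `/select` request handler; the helper `build_select_url` is made up for illustration and is not eZ Find's own code, which builds its requests internally from the solr.ini setting.

```python
from urllib.parse import urlencode

# Hypothetical single Solr endpoint. Every eZ Publish cluster node would be
# configured with this same host in solr.ini instead of "localhost".
SOLR_BASE_URL = "http://solr.example.com:8983/solr"


def build_select_url(base_url: str, query: str, rows: int = 10) -> str:
    """Build a Solr /select URL for a full-text query (illustrative helper)."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url.rstrip('/')}/select?{params}"


# Every web node computes the same URL, so every search hits the same index
# and re-running a query can no longer land on a stale per-node copy.
nodes = ["web1", "web2", "web3"]
urls = {node: build_select_url(SOLR_BASE_URL, "ez find") for node in nodes}
```

Because the index lives in one place, there is nothing to keep in sync between web nodes; the trade-off is the single point of failure discussed above.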
Principal Consultant International Business
Member of the Community Project Board
Should the single instance Gaetano mentioned not be enough (which I really doubt), Solr has a native master/slave replication mode that is easy to set up.
See http://wiki.apache.org/solr/SolrReplication
With replication, up to eZ Find 2.2 you can use a reverse proxy to direct eZ Find's writes to the Solr master. The next version (2.3) may also have simple directives to distinguish the master (for writes) from the slaves (for reads); however, work is underway in Solr to make even that unnecessary.
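The routing rule such a reverse proxy would apply can be sketched in a few lines of Python. This is only an illustration of the decision logic under stated assumptions: the host names are hypothetical, writes are assumed to be anything under Solr's `/update` handler, and reads are spread round-robin over the slaves. A real deployment would express the same rule in the proxy's own configuration.

```python
from itertools import cycle

# Hypothetical hosts: one Solr master that receives all writes, and
# replicated slaves that only serve searches.
MASTER = "http://solr-master.example.com:8983/solr"
SLAVES = [
    "http://solr-slave1.example.com:8983/solr",
    "http://solr-slave2.example.com:8983/solr",
]

# Path prefixes treated as writes; everything else is treated as a read.
WRITE_HANDLERS = ("/update",)

# Simple round-robin iterator over the read replicas.
_slave_pool = cycle(SLAVES)


def route(path: str) -> str:
    """Return the backend URL a reverse proxy would forward `path` to."""
    if path.startswith(WRITE_HANDLERS):
        backend = MASTER            # all index updates go to the master
    else:
        backend = next(_slave_pool)  # searches are spread over the slaves
    return backend + path
```

Keeping every write on one master is what lets the slaves simply copy its index; if writes reached a slave, they would be overwritten at the next replication.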
If you are attending the barcamp at the eZ Conference in Berlin next week and want to know more, let me know and I'll give you some insight.
Thank you both, Giunta and Borgermans! This helps clarify things for me.
One question remains. Which node is responsible for updating the index? Is the dedicated node expected to have enough of the eZ Publish infrastructure and scripts in place so that it can run the updates in a cron job by itself? Or should one of the web server nodes in the cluster run the update scripts (and will they connect to the dedicated solr instance on the remote host)?
Data is always indexed in a "push" fashion, i.e. one of the servers with eZ Publish installed makes HTTP requests to Solr (on port 8983 by default).
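A push is just an HTTP POST from an eZ Publish node to Solr's update handler. The Python sketch below builds (but does not send) such a request using Solr's XML update format (`<add><doc><field .../></doc></add>`). The URL and the field names `id` and `title` are hypothetical; eZ Find generates its own field names and payloads, so treat this as an illustration of the mechanism, not of eZ Find's actual messages.

```python
import urllib.request
from xml.sax.saxutils import escape

# Hypothetical address of the single, dedicated Solr server.
SOLR_UPDATE_URL = "http://solr.example.com:8983/solr/update"


def build_add_request(doc: dict, commit: bool = True) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST that pushes one document to Solr."""
    fields = "".join(
        # Escape &, <, > in names and values, plus double quotes in the attribute.
        f'<field name="{escape(name, {chr(34): "&quot;"})}">{escape(str(value))}</field>'
        for name, value in doc.items()
    )
    payload = f"<add><doc>{fields}</doc></add>"
    # commit=true asks Solr to make the change searchable right away.
    url = SOLR_UPDATE_URL + ("?commit=true" if commit else "")
    return urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
```

Sending it would be `urllib.request.urlopen(build_add_request({...}))` from whichever node does the push; Solr itself never pulls content from eZ Publish.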
Which server makes the requests depends on the indexing mode:
- realtime indexing: the server where the content is being edited does the push
- delayed indexing: the server where eZ Publish is fully installed and the indexing cronjob runs does the push
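The difference between the two modes can be modelled with a toy Python class. All names here (`Indexer`, `on_publish`, `run_cronjob`) are hypothetical; the `pending` list stands in for whatever shared storage holds the queue, which in a real cluster must be reachable by every node (for example the shared database) so that a different node can run the cronjob.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Indexer:
    """Toy model of which node pushes content to Solr (hypothetical names)."""
    push: Callable[[int], None]  # sends one object to Solr
    delayed: bool = False        # False = realtime indexing
    pending: List[int] = field(default_factory=list)  # stands in for a shared queue

    def on_publish(self, object_id: int) -> None:
        """Runs on the editing node when content is published."""
        if self.delayed:
            self.pending.append(object_id)  # only record it; no HTTP call yet
        else:
            self.push(object_id)            # realtime: the editing node pushes now

    def run_cronjob(self) -> int:
        """Runs on the node that executes the indexing cronjob; returns count pushed."""
        count = 0
        while self.pending:
            self.push(self.pending.pop(0))  # push in publication order
            count += 1
        return count
```

With realtime indexing the editor waits for Solr on every publish; with delayed indexing that cost moves to the cronjob node, at the price of a short lag before new content becomes searchable.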
The push does not have to come from a dedicated eZ Publish node, but in common setups one node is dedicated to:
- serving the editing interface
- running the eZ Publish cronjobs
The advantage of keeping the editing interface separate is that editors still get a snappy interface even under peak traffic. Likewise, site visitors are not affected while heavy cronjobs are running.