We just ran some tests with eZ Find 2 and hit the same problem: the Solr index grew to 650 GB, which is really big. The same content indexed with eZ Find 1 and its bundled Solr takes 9.5 GB.
@Xavier: any feedback on your issue? Did the proposed solution of disabling the OptimizeOnCommit directive and setting up a daily 'optimize' workflow work?
@Ali: this index size is very surprising. Did the indexed content base grow a lot between eZ Find 1.x and eZ Find 2.0? Are external elements indexed too (through the DataImportHandler Solr extension, for instance)? Are websites crawled?
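For reference, the two-part workflow suggested above can be sketched as follows. This is only a sketch under assumptions: the exact ini group, setting name, and override path may differ between eZ Find versions, and the Solr URL must match your own setup.

```ini
; settings/override/solr.ini.append.php (path and [IndexOptions] group are
; assumptions -- check the solr.ini shipped with your eZ Find version)
[IndexOptions]
OptimizeOnCommit=disabled
```

With per-commit optimizes disabled, an explicit optimize can be scheduled once a day instead, e.g. via a crontab entry; Solr's update handler accepts an `optimize=true` request parameter:

```shell
# Assumed Solr URL (host, port, and core path depend on your installation)
0 3 * * * curl -s 'http://localhost:8983/solr/update?optimize=true' > /dev/null
```

Optimizing merges the index into fewer segments, which is expensive, so doing it once nightly rather than on every commit keeps bulk indexing fast while still reclaiming disk space regularly.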
Best regards,
--
Nicolas Pastorino
Director Community - eZ
Member of the Community Project Board
We use the same database for both tests and the data in it does not change. We also do not index any external elements.
We are continuing to test this. On another test server the data directory was smaller than the 650 GB we saw on the first one, but it is still big: 14 GB with only 40% of the data indexed.
Many thanks for your proposed solution; it seems to work fine now that OptimizeOnCommit is disabled. One detail, though: in updatesearchindexsolr.php an optimize is forced on each commit (every 1000 objects), ignoring the setting in the ini file. I believe this should be fixed, because reindexing all the XML files takes more than 4 hours.
Thanks!
--
Xavier Serna
eZ Publish Certified Developer
Departament de Software
Microblau S.L. - http://www.microblau.net
+34 937 466 205
When indexing reaches objects that have relation-list attributes, it loops between these related objects and the index grows larger and larger. When we made these attributes non-searchable, indexing a 2 GB database was much faster and the resulting index was only a few hundred MB.