Indexing ALL MS Office files?

Friday 13 March 2009 1:48:51 am - 4 replies

Paul Borgermans

Friday 13 March 2009 8:34:32 am

You may have a look at http://projects.ez.no/eztika

There are currently some problems with CJK documents, though.

hth
Paul

eZ Publish, eZ Find, Solr expert consulting and training
http://twitter.com/paulborgermans

Jean-Yves Zinsou

Friday 13 March 2009 8:47:13 am

Thanks a lot, Paul.
What does CJK mean?

Do Androids Dream of Electric Sheep?
I dream of eZpubliSheep....
------------------------------------------------------------------------
http://www.alma.fr

Paul Borgermans

Friday 13 March 2009 9:53:20 am

CJK = Chinese, Japanese, Korean. There are some known issues with CJK font sets, and probably with all Asian languages (I have only tested CJK so far).

For indexing CJK PDFs, it is best to use xpdf through a wrapper script (or .bat file) that you configure in binaryfile.ini, with the following content:

<path to>pdftotext -enc "UTF-8" $1 -
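A slightly fuller version of that wrapper could look like the sketch below. It is only an illustration, assuming a POSIX shell: the `extract_text` function name and the `PDFTOTEXT` override variable are hypothetical, and the default assumes `pdftotext` is on your PATH (otherwise set `PDFTOTEXT` to the full path). The `-enc "UTF-8"` option and the trailing `-` (write to stdout) are the same as in the one-liner above.

```shell
#!/bin/sh
# Hypothetical wrapper around xpdf's pdftotext for UTF-8 (CJK-safe) extraction.
# PDFTOTEXT is an assumed override; set it to the full path of your binary.
PDFTOTEXT="${PDFTOTEXT:-pdftotext}"

extract_text() {
    # Expect exactly one argument: the PDF file to convert.
    if [ "$#" -ne 1 ]; then
        echo "usage: extract_text <file.pdf>" >&2
        return 2
    fi
    # -enc "UTF-8" selects UTF-8 output; the trailing "-" sends text to stdout.
    # Quoting "$1" keeps file names with spaces intact.
    "$PDFTOTEXT" -enc "UTF-8" "$1" -
}

# When the script is called with an argument, extract that file.
if [ "$#" -gt 0 ]; then
    extract_text "$@"
fi
```

Quoting `"$1"` is the only behavioural difference from the one-liner; it matters when uploaded file names contain spaces.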

hth
Paul

Jean-Yves Zinsou

Friday 13 March 2009 10:25:11 am

Thanks a lot, Paul!

You made my day! ;-)
