charset problem

Wednesday 12 December 2007 3:02:32 am - 8 replies

laurent le cadet

Thursday 13 December 2007 6:30:19 am

It sounds like the encoding is not correct.
Must we have a utf-8 db?

Kåre Køhler Høvik

Thursday 13 December 2007 7:36:40 am

Hi

UTF8 should not be required for eZ Find and eZP3. If you have a test environment available, please try to comment out these two lines in extension/ezfind/java/solr/conf/schema.xml:

....
<!--        <filter class="ISOLatin1AccentFilterFactory"/> -->
...
<!--        <filter class="ISOLatin1AccentFilterFactory"/> -->
...

Then restart Solr and reindex the data.

Kåre Høvik

laurent le cadet

Friday 14 December 2007 3:08:13 am

Hi Kåre,

We commented out the lines:

<!-- <filter class="ISOLatin1AccentFilterFactory"/> -->

then restarted Solr and reindexed, but the results are still corrupted:

This text :

Le DMP est con�u pour r�aliser pour le microdosage de tr�s haute pr�cision de tous les produits

Should be :

Le DMP est conçu pour réaliser pour le microdosage de très haute précision de tous les produits

The characters ç, é, è (and, I presume, all special characters) are not encoded correctly.

I'm stuck at this point.

Any hints?

regards.

Laurent
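
For readers hitting the same symptom, here is a minimal sketch of one way such corruption can arise. It assumes (this is not confirmed at this point in the thread) that ISO-8859-1 bytes are being read as if they were UTF-8; the class and method names are made up for the illustration.

```java
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    /** Encodes text as ISO-8859-1, then (wrongly) decodes those bytes as UTF-8. */
    static String latin1ReadAsUtf8(String text) {
        byte[] latin1Bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        // This constructor replaces malformed UTF-8 input with U+FFFD,
        // the replacement character that is usually displayed as "�".
        return new String(latin1Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "con\u00e7u"; // "conçu", written with an escape to avoid source-encoding issues
        String garbled = latin1ReadAsUtf8(original);
        // Check whether the accented character was lost in the round trip.
        System.out.println("contains U+FFFD: " + (garbled.indexOf('\uFFFD') >= 0));
    }
}
```

Plain ASCII text survives this round trip unchanged, which is why only the accented characters look broken.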

laurent le cadet

Monday 17 December 2007 4:36:37 am

Hi,

I read this at http://lucene.apache.org/solr/tutorial.html#Requirements:

"SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8, other encodings are not currently supported"

Is that related to our problem or can we override that?

I have tried almost everything, without any results so far.

Best regards

Laurent

Kåre Køhler Høvik

Monday 17 December 2007 4:59:55 am

Hi

Thank you for looking into this.

It looks like you found the problem. The resolution is for eZ Find to convert the data to UTF-8 before it is indexed. Please add a bug report about this in the issue tracker, and I'll fix it as soon as I have time.

Best regards
Kåre

Kåre Høvik
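
As an illustration of the conversion described above, the sketch below transcodes ISO-8859-1 bytes to UTF-8 in Java. It is not the eZ Find fix itself; the class and method names are hypothetical, and it assumes the source data really is ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

final class Latin1ToUtf8 {
    /** Re-encodes bytes that are known to be ISO-8859-1 as UTF-8. */
    static byte[] transcode(byte[] latin1Bytes) {
        // Decode with the charset the data was actually written in...
        String text = new String(latin1Bytes, StandardCharsets.ISO_8859_1);
        // ...then encode with the charset the consumer expects.
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] latin1 = {'c', 'o', 'n', (byte) 0xE7, 'u'}; // "conçu" in ISO-8859-1
        byte[] utf8 = transcode(latin1);
        // The single byte 0xE7 becomes the two-byte UTF-8 sequence 0xC3 0xA7.
        System.out.println(latin1.length + " bytes in, " + utf8.length + " bytes out");
    }
}
```

The key point is that the decode step must name the real source encoding; guessing wrong there moves the corruption rather than fixing it.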

laurent le cadet

Monday 17 December 2007 5:07:56 am

Kåre,

I'm going to report the bug.
As you can see, there is additional info on encoding/decoding (java.net), or another alternative with additional code:

// Must run before any request parameter is read; otherwise it has no effect.
String encoding = request.getCharacterEncoding();
if (null == encoding) {
  // Set your default encoding here
  request.setCharacterEncoding("UTF-8");
} else {
  request.setCharacterEncoding(encoding);
}
...
String value = request.getParameter("q");

I'm digging into the "java.net" solution. As for the other one, I don't know whether it can help us or where to apply the "patch".

Any idea?

Laurent
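
The thread does not say which java.net facility is meant. Assuming it is URLEncoder/URLDecoder, the following sketch shows how the charset passed to each side affects a query value; the class and method names are invented for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

final class QueryEncodingDemo {
    /** Percent-encodes a query value using UTF-8. */
    static String encodeUtf8(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    /** Decodes a percent-encoded value using the named charset. */
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String encoded = encodeUtf8("con\u00e7u"); // "conçu"
        System.out.println(encoded);                       // con%C3%A7u
        System.out.println(decode(encoded, "UTF-8"));      // round-trips to the original
        System.out.println(decode(encoded, "ISO-8859-1")); // mismatched charset: two wrong characters
    }
}
```

Both ends have to agree on the charset: encoding as UTF-8 and decoding as ISO-8859-1 turns each accented character into two unrelated ones.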

laurent le cadet

Wednesday 19 December 2007 2:50:18 am

In the end, I converted the DB to UTF-8.
Everything works fine now.

(http://ez.no/developer/forum/general/convert_from_iso_8859_1_encoding_to_utf_8/)

Hope this helps.

laurent

John Smith

Tuesday 19 August 2008 10:26:10 am

hi laurent,

I used the script by Kristof Coomans to do the UTF-8 conversion while upgrading from 3.6.1 to 3.8.0. The script is posted at

http://ez.no/developer/forum/install_configuration/update_to_3_8_and_codepage_problems

I am getting the following notice on both the administration interface and the public website:

SET NAMES 'utf8'

Are you getting the same?

Please help...
