Jan Borsodi
|
Friday 19 March 2004 12:34:48 am
It's a good idea. However, there are several problems with the implementation.
1. Character sets/encodings
The placement of various characters varies from charset to charset, so this needs to be integrated with the i18n library to be handled properly. There's also the problem of non-8-bit charsets (e.g. utf8), which use multiple bytes per character. A simple way of solving this now is to turn the string into a Unicode array using
$codec =& eZTextCodec::instance( false, 'unicode' );
$urlElementArray = $codec->convertString( $urlElement );
Then replace the characters using their Unicode values and convert the result back:
$reverseCodec =& eZTextCodec::instance( 'unicode', false );
$urlElement = $reverseCodec->convertString( $urlElementArray );
However, it should be noted that this is not very fast.
2. Unicode
A good implementation should provide conversion for all characters in Unicode. For instance, a site could be running utf8 and have articles in multiple languages. Actually, this type of conversion is similar to lowercasing, uppercasing and search normalization, all of which should be handled by the i18n system some day.
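To illustrate the "work on characters, not bytes" idea from point 1 without relying on eZTextCodec, here is a minimal sketch in plain PHP. It assumes the input is UTF-8; the function name transliterateUtf8 and the small map are made up for this example and are not part of eZ Publish.

```php
<?php
// Hypothetical helper, not part of eZ Publish: replaces characters one at a time.
function transliterateUtf8( $text, $map )
{
    // The /u modifier makes PCRE split on UTF-8 characters instead of bytes.
    $chars = preg_split( '//u', $text, -1, PREG_SPLIT_NO_EMPTY );
    if ( $chars === false )
        return $text; // input was not valid UTF-8: leave it untouched

    $result = '';
    foreach ( $chars as $char )
    {
        // Characters without an entry in the map are copied unchanged.
        $result .= isset( $map[$char] ) ? $map[$char] : $char;
    }
    return $result;
}

// Keys are written as UTF-8 byte escapes so the file's own encoding does not matter.
$map = array(
    "\xC3\x86" => 'A', // Æ
    "\xC3\xA6" => 'a', // æ
    "\xC3\x98" => 'O', // Ø
    "\xC3\xB8" => 'o', // ø
    "\xC3\x85" => 'A', // Å
    "\xC3\xA5" => 'a', // å
);

// "blåbær" (UTF-8) becomes "blabar".
$alias = transliterateUtf8( "bl\xC3\xA5b\xC3\xA6r", $map );
```

A map keyed by whole characters also allows one-to-many replacements (for example "æ" to "ae"), which a byte-for-byte translation cannot express.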
--
Amos
Documentation: http://ez.no/ez_publish/documentation
FAQ: http://ez.no/ez_publish/documentation/faq
|
Mikhail Chekanov
|
Monday 22 March 2004 4:00:52 am
Trond Åge Kvalø wrote:
>All I have to do is to add the following line at the top of the convertToAlias-function:
function convertToAlias( $urlElement, $defaultValue = false )
{
$urlElement = strtr( $urlElement, "ÆæØøÅå;","AaOoAa" );
...
>Am I correct or is there something I've misunderstood?
In case you want to replace "Æ" with "a", not with "Aa", this is correct, but you need to remove one semicolon:
$urlElement = strtr( $urlElement, "ÆæØøÅå","AaOoAa" );
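A side note on why the charset matters for this line: the three-argument form of strtr maps single bytes, position by position, so it only fits single-byte charsets such as iso-8859-1, and any surplus characters in the longer argument are simply ignored. In UTF-8 each of these letters takes two bytes, and the array form of strtr, which replaces whole substrings, is the safer choice. A minimal sketch; the byte escapes and variable names are only for illustration and are not taken from ezurlalias.php:

```php
<?php
// "ÆæØøÅå" as ISO-8859-1 bytes, escaped so the file's own encoding does not matter.
$from = "\xC6\xE6\xD8\xF8\xC5\xE5";
$to   = "AaOoAa";

// Single-byte input ("Ærø" in ISO-8859-1): byte N of $from becomes byte N of $to.
$latin1Result = strtr( "\xC6r\xF8", $from, $to );

// A trailing ";" makes $from one byte longer than $to; the surplus byte is ignored.
$semicolonResult = strtr( "\xC6r\xF8;", $from . ";", $to );

// The same letters as UTF-8 byte sequences, used as keys of the array form.
$pairs = array(
    "\xC3\x86" => 'A', // Æ
    "\xC3\xA6" => 'a', // æ
    "\xC3\x98" => 'O', // Ø
    "\xC3\xB8" => 'o', // ø
    "\xC3\x85" => 'A', // Å
    "\xC3\xA5" => 'a', // å
);

// UTF-8 input ("Ærø"): each two-byte key is replaced as a unit.
$utf8Result = strtr( "\xC3\x86r\xC3\xB8", $pairs );
```

So if the string reaching convertToAlias is UTF-8, or the PHP file itself is saved as UTF-8, the two-string form can produce garbage even though it looks correct on screen.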
---
Emil Webber wrote:
>just another comment to the url_alias conversion:
>I've talked with two search engine experts. They say that Google likes the "-" in urls more than "_".
>Can anybody confirm this?
Maybe they are right: AFAIK, Google counts "word1-word2" as two words in its page-ranking formula, but treats "word1_word2" as one meaningless word.
---
I think there are 2 possible solutions to deal with i18n of the aliases:
1st: the transliteration approach described above, but there is a problem with UTF. Do you think some slowdown is critical? This is a one-time operation, isn't it?
2nd: a special text field in the admin interface to submit/edit the url_alias manually.
---
Jan Borsodi wrote:
>A simple way of solving this now is to turn the string into a Unicode array...
Or something more conventional, isn't it? On one of my sites I'm using cp1251 within my templates and UTF-8 for the database/site for historical reasons ;), so I've tested this code:
function convertToAlias( $urlElement, $defaultValue = false )
{
include_once( 'lib/ezi18n/classes/eztextcodec.php' );
$codec =& eZTextCodec::instance( false, 'cp1251' );
$urlElementArray = $codec->convertString( $urlElement );
$urlElementArray = strtr( $urlElementArray, "Aa...", "aa..." );
$urlElementArray = strtr( $urlElementArray, array( "z"=>"zh" ));
$reverseCodec =& eZTextCodec::instance( 'cp1251', false );
$urlElement = $reverseCodec->convertString( $urlElementArray );
...
This works well enough, but it becomes too complicated to be included as self-tuning code, because we would have to detect charsets, not just supply transliteration strings/arrays.
--
mike
#6595551
|
Trond Åge Kvalø
|
Monday 22 March 2004 5:52:44 am
> >Am I correct or is there something I've misunderstood?
> In case you want to replace "Æ" with "a", not with "Aa",
> this is correct, but you need to remove one semicolon:
> $urlElement = strtr( $urlElement, "ÆæØøÅå","AaOoAa" );
Ok, thanks Mikhail. Just one question, though. The way I've written it now, wouldn't "Æ" be replaced with "A" and "æ" with "a" etc.? I did remove the semicolon as well, but it doesn't seem to work. Any ideas why?
best regards trondåge
trondåge
|
Trond Åge Kvalø
|
Tuesday 23 March 2004 3:46:51 am
>> Just one question, though. The way I've written it now, wouldn't "Æ" be replaced
>> with "A" and "æ" with "a" etc..?
> Yes, exactly.
Ok, got that one then :-)
>> I did remove the semi-colon also, but it doesn't seem to work. Any ideas why?
> What charset do you use? As you can see above, there is a problem with multi-byte
> encodings (e.g. UTF-8)...
This is the charset in my pagelayout.tpl:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
But on second thought you probably mean the charset in my site.ini.append file, right?
Now let's see... I have
#?ini charset="iso-8859-1"
and in the database settings I have
Charset=iso-8859-1
Any other place I should look?
best regards trondåge
trondåge
|
Georg Franz
|
Sunday 28 March 2004 12:15:28 pm
Hi Trond, I've altered the kernel/classes/ezurlalias.php:
function convertToAlias( $urlElement, $defaultValue = false )
{
    include_once ( 'path/to/gwf_textutils.php' );
    $urlElement = gwf_TextUtils::convertToAlias ( $urlElement );
    if ( strlen( $urlElement ) == 0 )
    {
        if ( $defaultValue === false )
            $urlElement = '-1';
        else
        {
            $urlElement = $defaultValue;
            $urlElement = gwf_TextUtils::convertToAlias ( $urlElement );
        }
    }
    return $urlElement;
}
You need my "text util" class, which can be found at http://ez.no/community/contributions/hacks/gwf_textutils
In gwf_TextUtils::convertToAlias the main conversion is done with
$specialChars = array ( "à", "á", "â", "ã", "ä", "å", "æ", "è", "é", "ê", "ß", " ", "'", "´", "`",
"ë", "Ç", "í", "ì", "ò", "ó", "ô", "õ", "ö", "ù", "ú", "û", "ü");
$normalChars = array ( "a", "a", "a", "a", "ae", "a", "ae", "e", "e", "e", "ss", "-", "", "", "", "e", "c", "i", "i", "o", "o", "o", "o", "oe", "u", "u", "u", "ue");
So if you have additional characters which should be "converted", you have to add them to both arrays, at the same position in each.
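For readers who have not downloaded the class: one plausible way to apply two parallel arrays like these is PHP's str_replace, which pairs entry N of the first array with entry N of the second. Whether gwf_TextUtils does exactly this is an assumption; the function name replaceSpecialChars and the shortened arrays below are made up for the sketch.

```php
<?php
// Hypothetical stand-in for the lookup step; not the actual gwf_TextUtils code.
function replaceSpecialChars( $text, $specialChars, $normalChars )
{
    // str_replace pairs entry N of the first array with entry N of the second.
    return str_replace( $specialChars, $normalChars, $text );
}

// A shortened pair of arrays, with the special characters as UTF-8 byte escapes.
$specialChars = array( "\xC3\xA4", "\xC3\xB6", "\xC3\xBC", "\xC3\x9F", " " ); // ä ö ü ß space
$normalChars  = array( "ae", "oe", "ue", "ss", "-" );

// "Grüße Welt" (UTF-8) becomes "Gruesse-Welt".
$alias = replaceSpecialChars( "Gr\xC3\xBC\xC3\x9Fe Welt", $specialChars, $normalChars );
```

With this approach the two arrays must stay the same length: if the replacement array is shorter, str_replace substitutes an empty string for the unmatched entries, so a forgotten entry silently deletes characters.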
After doing the "hack", you have to run update/common/scripts/updateniceurls.php
Kind regards,
Emil (alias Georg :-)
Best wishes,
Georg.
--
http://www.schicksal.com Horoskop website which uses eZ Publish since 2004
|