How to import a lot of HTML pages into eZ Publish nodes?


Thursday 26 October 2006 4:41:09 am - 6 replies

Modified on Thursday 26 October 2006 4:47:02 am by Denis Zatsarinny


Xavier Dutoit

Friday 27 October 2006 1:44:17 am

Never easy.

I'd first download the HTML pages, tidy them, and parse out the content I need to import. Then start with some kind of XML import (have a look at the contribs).

That means PHP development anyway.
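
A rough sketch of the "parse the content" step, assuming a modern PHP with the DOM extension (not the PHP of the 3.8 era). The function name extractContent and the choice to keep only the title and paragraph text are made up for illustration; nothing here is eZ Publish API, and feeding the result into an import script is left out.

```php
<?php
// Hypothetical helper, not part of eZ Publish: pulls the title and the
// paragraph text out of one HTML page so an import script can use them.
function extractContent(string $html): array
{
    $doc = new DOMDocument();

    // Collect libxml warnings about malformed markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);   // character-encoding handling is left out of this sketch
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titleNodes = $doc->getElementsByTagName('title');
    $title = $titleNodes->length > 0 ? trim($titleNodes->item(0)->textContent) : '';

    $paragraphs = [];
    foreach ($doc->getElementsByTagName('p') as $p) {
        // Collapse runs of whitespace and skip paragraphs that end up empty.
        $text = trim(preg_replace('/\s+/', ' ', $p->textContent));
        if ($text !== '') {
            $paragraphs[] = $text;
        }
    }

    return ['title' => $title, 'paragraphs' => $paragraphs];
}

$page = extractContent(
    '<html><head><title>Demo</title></head>'
    . '<body><p>First  paragraph</p><p> </p><p>Second</p></body></html>'
);
print_r($page);
```

Real pages will need more than paragraph text (headings, lists, links, images), so treat this as a starting point only.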

Good luck

X+

http://www.sydesy.com

Denis Zatsarinny

Tuesday 31 October 2006 12:04:37 am

Hi

>Never easy.
>I'd first download the html page, tidy them and parse the content I need to import. Then start with some kind of xml import (have a look at the contrib).

OK, I wrote a PHP script that converts Mambo content into eZ structures (Folders & Articles), but the Mambo articles contain very ugly HTML, and I found 3 errors while running this script with eZ Publish 3.8.5:

1. Fatal error: Call to a member function on a non-object in /srv/www/htdocs/ezportal/kernel/classes/datatypes/ezxmltext/input/ezxmlsimplifiedinputparser.php on line 611

Original: if ( $parent->nodeName == 'line' && !count( $parent->Children ) )

I replaced it with: if ( $parent->nodeName == 'line' && !count( $parent->Children ) && is_object($parent->parentNode) )

2. Fatal error: Call to a member function on a non-object in /srv/www/htdocs/ezportal/kernel/classes/datatypes/ezxmltext/ezxmlinputparser.php on line 772
Original:

    function &processSubtree( &$element, &$lastHandlerResult )
    {
        $ret = null;
        $tmp = null;

I replaced it with:

    function &processSubtree( &$element, &$lastHandlerResult )
    {
        $ret = null;
        $tmp = null;

        if ( !is_object( $element ) ) return $ret;

But after this patch I got a fatal error: a segmentation fault (ugly).

Can somebody recommend an algorithm, software, etc. that converts HTML into eZXML?

I tried using tidy from http://tidy.sf.net, but got the same result.
If I use tidy options such as --clean true, --word-2000 true, I get empty content.


Joe Kepley

Tuesday 31 October 2006 7:36:11 am

This sounds extremely hairy, and if you have a lot of pages with poorly-structured HTML, you're bound to have some things that don't fit into eZXML's content structure.

If I were faced with this, I'd look at importing it directly as HTML into an HTML datatype, and use something like TinyMCE or FCKEditor to provide the WYSIWYG. eZXML has a lot of benefits, but trying to coax HTML into eZXML (when it didn't start out that way) would be like trying to put smoke in a bottle.

Denis Zatsarinny

Tuesday 31 October 2006 10:01:24 am

Hi,

Dear Joe:

Using TinyMCE or FCKeditor is not a solution for this problem.

I was using FCKeditor + Mambo for an intranet site, and that is how I ended up with this problem.
I found a way to deploy the Mambo content to eZ structures, but ~20% of the old Mambo articles contain MS Office-generated HTML and are more than 2 MB in size. The eZXML converter cannot process these documents, even after tidy has processed them, but these documents are very important.
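
One thing that might be worth trying on the MS Office-generated articles before they reach tidy or the eZXML converter is a pre-cleaning pass. Below is a minimal sketch, assuming a modern PHP; the function name stripOfficeMarkup and the regex patterns are assumptions about what Word-exported HTML typically contains, not anything taken from eZ Publish or tidy, and they have not been tried on the documents in this thread.

```php
<?php
// Hypothetical pre-cleaning helper: removes markup that Word-exported HTML
// commonly carries. The patterns are assumptions, not a complete list.
function stripOfficeMarkup(string $html): string
{
    $patterns = [
        '/<!--\[if.*?<!\[endif\]-->/is',             // conditional comments
        '/<!--.*?-->/s',                             // any remaining comments
        '/<(style|xml|script)\b[^>]*>.*?<\/\1>/is',  // embedded style/xml/script blocks
        '/<\/?[a-z][a-z0-9]*:[^>]*>/i',              // namespaced tags such as <o:p>
        // class/style/lang attributes; the lookahead keeps the match inside a tag
        '/\s+(class|style|lang)\s*=\s*("[^"]*"|\'[^\']*\'|[^\s>]+)(?=[^<]*>)/i',
    ];

    $clean = preg_replace($patterns, '', $html);

    // preg_replace() returns null on a PCRE error (for example when the
    // backtrack limit is hit on a very large input); keep the original then.
    return $clean === null ? $html : $clean;
}

$sample = "<p class=MsoNormal style='margin:0'>Hello<o:p></o:p></p>"
        . "<!--[if gte mso 9]><xml></xml><![endif]-->";
echo stripOfficeMarkup($sample), "\n";   // prints <p>Hello</p>
```

On 2 MB inputs the lazy patterns may run into pcre.backtrack_limit, which is why the null check is there. Regex cleaning is also crude, so the output would still need to go through tidy afterwards.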

Bye

Denis Zatsarinny

Wednesday 01 November 2006 11:57:31 pm

Hi

Great news: I upgraded from eZ Publish 3.8.4 to 3.8.6, and there are no more PHP errors like "Fatal error: Call to a member function on a non-object in". But (ugly) I now get an internal PHP error: *** glibc detected *** double free or corruption (fasttop): 0x1202fd98 ***. This error is a bug in the PHP interpreter itself.

Bye

kracker (the)

The Doctor

Sunday 05 November 2006 3:51:48 pm

@Denis,

Please file a bug report.
http://issues.ez.no/IssueEdit.php?ProjectId=3

You can also help us reproduce your bug more accurately
by sharing your Mambo import script as a contribution.
If you're concerned about it not being 100% complete, you may mark it as unstable.

- Log in
- Go to http://ez.no/community/contribs/import_export
- Click "upload contribution"
- Complete and submit the form
- Post a link to the contribution in this forum thread

//kracker
The GNU/Linux Action Show! Podcast

Member since: 2001.07.13 || http://ezpedia.se7enx.com/
