We've done this before and published an example: <i>http://svn.projects.ez.no/bcimportcsv/trunk/extension/bcimportcsv/bin/bccsvjoomlacontenttablehtmlimport.php</i>
The key here is to replace those tags. Our example transforms variable html into valid ezxml (including replacing img and a tags with content object embeds).
Another idea is to use the HTML parser in Online Editor (5.0), since it already supports quite a lot of (X)HTML. I have not had time to test it, though, so I don't have any code examples other than the one in ezoe. See eZOEXmlInput::validateInput() in http://svn.projects.ez.no/ezoe/trunk/ezoe/ezxmltext/handlers/input/ezoexmlinput.php
It will not handle images, though: those are embed tags in ezxml, so you'll first need to import the image into eZ and add an id to the image tag in the form "eZObject_<object_id>".
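To illustrate the image step, here is a minimal sketch of how the img tags could be rewritten once the images have been imported. It is untested against ezoe itself. The function name and the $objectIdBySrc map (original image URL => object id of the imported image) are assumptions made up for this example, and the import itself is not shown.

```php
<?php
// Hypothetical helper, not part of eZ Publish or ezoe.
// $objectIdBySrc is an assumed map: original image URL => id of the
// eZ content object created when that image was imported.
function tagImagesWithObjectIds($html, array $objectIdBySrc)
{
    return preg_replace_callback(
        // Match an <img ...> tag and capture the value of its src attribute.
        '~<img\b[^>]*?\bsrc="([^"]*)"[^>]*>~i',
        function (array $m) use ($objectIdBySrc) {
            if (!isset($objectIdBySrc[$m[1]])) {
                return $m[0]; // unknown image: leave the tag unchanged
            }
            // Emit the id in the "eZObject_<object_id>" form described above.
            return '<img id="eZObject_' . (int) $objectIdBySrc[$m[1]] . '" />';
        },
        $html
    );
}

$html = '<p><img src="/images/a.png" alt="A" /></p>';
echo tagImagesWithObjectIds($html, array('/images/a.png' => 123));
```

A regex is used only to keep the sketch short; it assumes double-quoted src attributes and drops all other attributes on matched tags, so a real import may want a proper HTML parser instead.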
The normal xml handler is documented here:
http://ez.no/doc/ez_publish/technical_manual/4_0/reference/xml_tags
As of 4.1 it will also accept <h[1-6]> in input.
The xml handler in OE will accept the html variants of the tags there, where:
literal -> <pre>
anchor -> <a name="">
embed (image) -> <img id="eZObject_<object_id>" />
In addition, the <u>, <sup> and <sub> tags are mapped to custom tags if enabled.
It is not documented since it was not meant for external imports. So for now, enable the 'code' button in ezoe.ini to take a look at what kind of XHTML it uses internally (or use Firebug or a similar point-and-click HTML debugger).
...it's not the parser that's being selective about case; XHTML by definition requires all tags to be lowercase, unlike HTML.
I thus made all HTML tags in the text I parse lowercase using the PHP function found here: http://www.codingforums.com/archive/index.php/t-108303.html
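For anyone who can't reach that link, here is a minimal sketch of the same idea. It is not the function from the linked thread, just one way to do it, and the name lowercaseTagNames is made up for this example.

```php
<?php
// Hypothetical helper, not the function from the linked thread.
// Lowercases element names in opening and closing tags only;
// attribute names, attribute values and text content are left untouched.
function lowercaseTagNames($html)
{
    return preg_replace_callback(
        // "<" or "</" followed by a tag name
        '~<(/?)([A-Za-z][A-Za-z0-9]*)~',
        function (array $m) {
            return '<' . $m[1] . strtolower($m[2]);
        },
        $html
    );
}

echo lowercaseTagNames('<P>Some <B>bold</B> text<BR /></P>');
// prints: <p>Some <b>bold</b> text<br /></p>
```

Note that XHTML also wants lowercase attribute names, which this sketch does not touch, and that it would also rewrite anything that looks like a tag inside comments or attribute values.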