XML Parsers considered harmful

  • 6 July 2010

I am fighting with Xalan and Xerces (in C++). After looking for decent tutorials (there are none) I found this little gem among the Google results. It clearly emphasises what I already know: XML parsers are from Hell. Xalan and Xerces are especially tricky since they’ve been ported from Java. The API is a bit weird, and some things contradict intuition. For example, if you initialise the transformation engines more than once per process run, the destructor of XSLTInputSource crashes with SIGSEGV. You get no clue why. You return, and just as the objects go out of scope, your program crashes. The secret is hidden in the API documentation. You also cannot easily stop the XSLT transformer from downloading or accessing the document’s DTD. If you wish to ignore the DTD, you have to provide your own EntityResolver class that resolves all entities without it. Charming. Bureaucratic. Have I mentioned Java already?

Google result hit for XML parser software with a malware warning.

XML considered harmful.
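One way to avoid the double-initialisation trap described above is to funnel all setup through a single process-wide guard. The sketch below only illustrates that pattern and is not Xalan code: engine_initialize(), engine_terminate(), EngineSession and engine_session() are hypothetical names. In a real Xalan-C++ program the two stand-in functions would presumably wrap XMLPlatformUtils::Initialize() plus XalanTransformer::initialize(), and their Terminate()/terminate() counterparts.

```cpp
#include <cassert>

// Hypothetical stand-ins for the library's process-wide setup and teardown.
// The counters exist only so the behaviour can be observed.
static int g_init_calls = 0;
static int g_term_calls = 0;

void engine_initialize() { ++g_init_calls; }

void engine_terminate() {
    // Teardown only makes sense after a matching setup.
    assert(g_term_calls < g_init_calls);
    ++g_term_calls;
}

// RAII guard: setup in the constructor, teardown in the destructor.
class EngineSession {
public:
    EngineSession() { engine_initialize(); }
    ~EngineSession() { engine_terminate(); }

    // Copying would imply a second setup/teardown pair, so forbid it.
    EngineSession(const EngineSession&) = delete;
    EngineSession& operator=(const EngineSession&) = delete;
};

// Returns the single session for this process. The function-local static
// is constructed on the first call only (thread-safe since C++11), so
// later calls never re-run the setup.
EngineSession& engine_session() {
    static EngineSession session;
    return session;
}
```

Call engine_session() once at the top of main(), before creating any transformer or input source objects. Objects local to main() are destroyed when main() returns, which happens before the static session is torn down, so the teardown never runs underneath live objects.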

If you know a decent, lightweight XSLT library, let me know. I just need to delete tags from HTML, XHTML and XML documents (which worked well with regular expressions before). The XSLT template is quite short, and the task isn’t very complicated.
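For comparison, the regular-expression approach mentioned above can be sketched with nothing but the C++11 standard library. This is a rough illustration and not the code I actually used: strip_tag() is a hypothetical helper, it assumes the tag name contains no regex metacharacters, and it will misfire on attribute values that contain a ">" character.

```cpp
#include <regex>
#include <string>

// Removes every opening, closing, or self-closing tag named `tag`
// (case-insensitively) and leaves the text between the tags in place.
// Assumption: `tag` is a plain element name without regex metacharacters.
std::string strip_tag(const std::string& input, const std::string& tag) {
    // "<" or "</", the name, optional attributes introduced by whitespace,
    // an optional "/" for self-closing tags, then ">".
    const std::regex pattern("</?" + tag + "(\\s[^>]*)?/?>",
                             std::regex::icase);
    return std::regex_replace(input, pattern, "");
}
```

Because the attribute group must start with whitespace, stripping "b" leaves a body element alone. Anything beyond such simple cases is where a real parser starts to look tempting again.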

2 Comments

  1. Hakan - July 7, 2010 at 22:53

    You should use PHP and maybe the PEAR extension. There are a lot of easy ways to do what you want. ;)

  2. Nightlynx - July 8, 2010 at 00:19

    True, but my code is C/C++.
