Tuesday, November 20, 2007

MochiWeb got an HTML parser

Since MochiWeb went open source, I have been working with Tait Larson on a Comet server (HTTP push) built on top of MochiWeb. We are not there yet, but it has been a very pleasant experience so far, especially compared with an earlier (failed) attempt of mine to build such a thing on top of yaws, where I had to patch yaws and deal with all kinds of annoyances. (Don't get me wrong: yaws is great for 99% of all possible use cases; HTTP push just belongs to the other 1%.)

So today I was doing some online housekeeping and noticed that MochiWeb got an HTML parser. That's great! The question of how to parse HTML has come up many times on the Erlang mailing list, and sooner or later somebody points to the yaws HTML parser, which works reasonably well. One time it was me asking that question, and when I got the answer I started to play with the yaws HTML parser on some simple XHTML examples (if I remember correctly), and everything looked fine. But things turned nasty when I tested the yaws parser with real-world HTML.

Now I hope people will start testing and crashing (I just did) the MochiWeb parser with real-world HTML and provide feedback to the developers so they can improve it further!
