DOMDocument

DOMDocument is part of PHP's DOM extension, which is built on the libxml library.

The DOMDocument class extends the DOMNode class, which is also extended by the DOMElement class. So the properties and methods of DOMNode are available on both documents and elements, while DOMElement adds its own members that DOMDocument does not inherit.
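
The class relationships can be checked directly with instanceof. This is only a minimal sketch; the variable names and the sample paragraph are made up for illustration.

```php
<?php
// Build a tiny document with one element (sample content is illustrative).
$doc = new DOMDocument();
$element = $doc->createElement('p', 'Hello');
$doc->appendChild($element);

// Both the document and the element are DOMNode instances.
var_dump($doc instanceof DOMNode);      // bool(true)
var_dump($element instanceof DOMNode);  // bool(true)

// The document itself is not a DOMElement.
var_dump($doc instanceof DOMElement);   // bool(false)

// nodeName is defined on DOMNode, so both objects expose it.
echo $doc->nodeName, "\n";      // #document
echo $element->nodeName, "\n";  // p
```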

The hardest part of developing this parser was figuring out how to deal with nested elements. Nested elements, like <strong> for example, would throw off the parser so that text following the nested element in the parent was attributed to the nested child instead.

The solution came when I found the getNodePath method in DOMNode, which I use in this class as a unique key. Finally I was able to match each text part with its correct element.
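
The idea of keying text by node path might look something like the sketch below. It is not the parser's actual code: the collectTextByPath function and the sample markup are hypothetical, and only getNodePath, childNodes, and hasChildNodes come from the DOM extension.

```php
<?php
// Hypothetical helper: walk the tree and store each non-empty text node
// under its own node path, so text after a nested element stays separate.
function collectTextByPath(DOMNode $node, array &$parts): void
{
    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMText) {
            $text = trim($child->nodeValue);
            if ($text !== '') {
                // getNodePath() returns an XPath-style location for this node.
                $parts[$child->getNodePath()] = $text;
            }
        } elseif ($child->hasChildNodes()) {
            collectTextByPath($child, $parts);
        }
    }
}

$doc = new DOMDocument();
$doc->loadHTML('<p>Before <strong>bold</strong> after</p>');

$parts = [];
collectTextByPath($doc->documentElement, $parts);

// Each key is a path; the path for "bold" runs through the strong element,
// while "Before" and "after" keep paths that end under the paragraph.
print_r($parts);
```

Because each path is unique within the document, the text before and after the nested element stays attached to the parent rather than to the child.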

Tables

Are they in or are they out?

Have you ever run across the web designer who believes (or was taught) that tables should never be used? I have, and they are always surprised when their CSS techniques don't work very well with tabular data. Use CSS for design; use tables for tabular data.

Annual Sales to Date

Salesman    Sales        Commissions
Jim         25,585.88    2,580.00
Lisa        68,356.22    6,830.00
Bubba        3,369.42      330.00

Lists

Unordered shopping list

Ordered shopping list by priority

  1. High caffeine soda
  2. Bread
  3. Milk

Poorly formatted markup

The real test comes with poorly formatted markup. For example, & is a special character and should be written as an entity (&amp;).

Starting another paragraph without closing the old one is a common mistake.

Using [brackets] can cause some issues.

However, because the notices generated by DOMDocument have been suppressed, the parser isn't quite as picky.
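
One common way to keep DOMDocument quiet about bad markup is libxml_use_internal_errors. The sketch below assumes that approach; the parser may suppress notices differently, and the sample markup is invented to mirror the problems described above.

```php
<?php
// Sample markup with a raw ampersand, an unclosed paragraph, and brackets.
$html = '<p>Fish & chips<p>Second paragraph with [brackets]';

// Collect libxml errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

// Inspect what the parser complained about, then clear the buffer.
foreach (libxml_get_errors() as $error) {
    echo trim($error->message), ' (line ', $error->line, ")\n";
}
libxml_clear_errors();

// Restore the previous error-handling mode.
libxml_use_internal_errors($previous);

// The parser recovers and still produces separate paragraph elements.
echo $doc->getElementsByTagName('p')->length, "\n";
```

Collecting the errors rather than discarding them leaves the option of logging them later, while the page itself stays free of warnings.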