NekoHTML is a simple, open-source HTML scanner and tag balancer that lets application programmers parse HTML documents and access their content through standard XML interfaces.
The parser can scan HTML files and "fix up" many common mistakes that human (and computer) authors make when writing HTML documents. NekoHTML adds missing parent elements, automatically closes elements with optional end tags, and can handle mismatched inline element tags.
Requirements:
· Java 1.3 or later
· Xerces2 Java
What's New in This Release:
· Don't rely on default locale for lowercase/uppercase conversion of tag and attribute names (#3544334, based on patch provided by Ronald Brill).
· Accept only FRAME, FRAMESET, and NOFRAMES within FRAMESET (fix StackOverflowError #3555034).
· Add HEAD before FRAMESET when missing.
· Recognize encoding specified in META charset='...'.
· Fix StackOverflowError occurring with content after closing BODY tag with feature http://cyberneko.org/html/features/balance-tags/ignore-outside-content set to true (#3490807).
· TABLE doesn't close inline elements anymore (#3527659, reverting fix for #2019307).
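The locale item in the changelog is easier to appreciate with a concrete case. The sketch below is not NekoHTML code; the class TagNameCase and its two methods are hypothetical names used only for illustration. It uses the standard Java library to show how lowercasing a tag name under a Turkish locale yields a dotless "ı", so that "TITLE" no longer matches "title", and how passing a fixed locale avoids the problem. Locale.ROOT assumes Java 6 or later; on older runtimes Locale.ENGLISH would play the same role.

```java
import java.util.Locale;

// Hypothetical helper, not part of NekoHTML: illustrates locale-safe
// case conversion of tag and attribute names.
final class TagNameCase {
    // Stand-in for a machine whose default locale is Turkish.
    private static final Locale TURKISH = Locale.forLanguageTag("tr-TR");

    // Locale-independent conversion: ASCII names always map to ASCII.
    static String lower(String name) {
        return name.toLowerCase(Locale.ROOT);
    }

    // What name.toLowerCase() would return if the default locale were Turkish.
    static String lowerAsTurkish(String name) {
        return name.toLowerCase(TURKISH);
    }

    public static void main(String[] args) {
        System.out.println(lower("TITLE"));                          // title
        System.out.println(lowerAsTurkish("TITLE"));                 // t\u0131tle (dotless i)
        System.out.println(lowerAsTurkish("TITLE").equals("title")); // false
    }
}
```

A parser that compared the second result against a table of known element names would fail to recognize TITLE on such a machine, which is presumably the class of bug the changelog entry addresses.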