jsoup Changelog

What's new in jsoup 1.8.2

Apr 16, 2015

Performance improvements for parsing HTML on Android, of 1.5x to 1.9x, with larger parses getting a bigger speed increase. For non-Android JREs, around 1.1x to 1.2x.
Dramatic performance improvement in HTML serialization on Android (KitKat and later), of 115x, achieved by working around a character set encoding speed regression in Android.
Performance improvement for the class name selector on Android (.class) of 2.5x to 14x. Around 1.2x on non-Android JREs.
File upload support. Added the ability to specify input streams for POST data, which will upload content in MIME multipart/form-data encoding.
Add a meta-charset element to documents when setting the character set, so that the document's charset is unambiguous.
Added ability to disable TLS (SSL) certificate validation. Helpful if you're hitting a host with a bad cert, or your JDK doesn't support SNI.
Added ability to further tweak the canned Cleaner Whitelists by removing existing settings.
Added option in Cleaner Whitelist to allow linking to in-page anchors (#).
Use a lowercase doctype tag for HTML5 documents.
Added support for 201 Created with redirect, and other status codes. Treats any HTTP status code 2xx or 3xx as an OK response, and follows redirects whenever there is a Location header.
Added support for HTTP method verbs PUT, DELETE, and PATCH.
Added support for overriding the default POST character set of UTF-8.
W3C DOM support: added ability to convert from a jsoup document to a W3C document, with the W3Dom helper class.
In the HtmlToPlainText example program, added the ability to filter using a CSS selector. Also clarified the usage documentation.
Fixed validation of cookie names in HttpConnection cookie methods.
Fixed an issue where tags would be missed when preparing a form for submission if missing a selected attribute.
Fixed an issue where submitting a form would incorrectly include radio and checkbox values without the checked attribute.
Fixed an issue where Element.classNames() could return a set containing an empty class name, or class names with extraneous whitespace.
Fixed an issue where attributes selected by value were not correctly space normalized.
In head+noscript elements, treat content as character data, instead of jumping out of head parsing.
Fixed performance issue when parsing HTML with elements with many children that need re-parenting.
Fixed an issue where a server returning an unsupported character set response would cause a runtime UnsupportedCharsetException, instead of falling back to the default UTF-8 charset.
Fixed an issue where Jsoup.Connection would throw an IOException when reading a page with zero content-length.
Improved the equals() and hashCode() methods in Node, to consider all their child content, for DOM tree comparisons.
Improved performance in Selector when searching multiple roots.
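
The response-handling rules described in the entries above (2xx/3xx treated as OK, redirects followed when a Location header is present, and falling back to UTF-8 for an unsupported charset) can be sketched in plain Java. This is only an illustration of the stated behaviour, not jsoup's actual implementation: the class ResponseRules and its method names are hypothetical, and only java.nio.charset from the standard library is used.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: illustrates the rules from the changelog, not jsoup's own code.
final class ResponseRules {

    // Any 2xx or 3xx status counts as an OK response.
    static boolean isOkStatus(int status) {
        return status >= 200 && status < 400;
    }

    // Follow a redirect whenever the response carries a non-empty Location header.
    static boolean shouldFollowRedirect(String locationHeader) {
        return locationHeader != null && !locationHeader.trim().isEmpty();
    }

    // Resolve the charset declared by the server, falling back to UTF-8
    // when the name is missing, malformed, or not supported by this JVM.
    static Charset resolveCharset(String declared) {
        if (declared == null || declared.trim().isEmpty()) {
            return StandardCharsets.UTF_8;
        }
        String name = declared.trim();
        try {
            return Charset.isSupported(name) ? Charset.forName(name) : StandardCharsets.UTF_8;
        } catch (IllegalCharsetNameException e) {
            // The name contains characters that are not legal in a charset name.
            return StandardCharsets.UTF_8;
        }
    }

    public static void main(String[] args) {
        System.out.println(isOkStatus(201));
        System.out.println(shouldFollowRedirect("/created/42"));
        System.out.println(resolveCharset("no-such-charset"));
    }
}
```

Checking Charset.isSupported before calling Charset.forName avoids the UnsupportedCharsetException entirely, so the fallback path does not rely on exception handling for the common case of an unknown but well-formed name.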

New in jsoup 1.8.1 (Sep 29, 2014)

New in jsoup 1.7.3 (Sep 29, 2014)

New in jsoup 1.7.2 (Apr 11, 2013)

New in jsoup 1.7.1 (Dec 19, 2012)

New in jsoup 1.6.2 (Mar 28, 2012)

Added a simplified XML parsing mode, which can usefully parse valid and invalid XML, but does not enforce any HTML document structure or special tag behaviour.
Added the optional ability to track errors when tokenising and parsing.
Added jsoup.connect.cookies(Map) method, to set multiple cookies at once, possibly from a prior request.
Added Element.textNodes() and Element.dataNodes(), to easily access an element's children text nodes and data nodes.
Added an example program that demonstrates how to format HTML as plain-text, and the use of the NodeVisitor interface.
Added Node.traverse() and Elements.traverse() methods, to iterate through a node's descendants.
Updated jsoup.connect so that when requests made as POSTs are redirected, the redirect is followed as a GET.
Updated the Cleaner and whitelists to optionally preserve relative links in elements, instead of converting them to absolute links.
Updated the Cleaner to support custom allowed protocols such as "cid:" and "data:".
Updated handling of tags, to act on only the first one seen when parsing, to align with modern browsers.
Updated Node.setBaseUri(), to recursively set on all the node's descendants.
Fixed handling of null characters within comments.
Tweaked escaped entity detection in attributes to not treat &entity_... as an entity form.
Fixed doctype tokeniser to allow whitespace between name and public identifier.
Fixed issue where comments within a table tag would be duplicate-fostered into body.
Fixed an issue where a spurious byte-order-mark at the start of a document would cause the parser to miss head contents.
Fixed an issue where content after a frameset could cause a NPE crash. Now correctly implements spec and ignores the trailing content.
Tweaked whitespace checks to align with the HTML spec.
Tweaked HTML output of closing script and style tags to not add an extraneous newline when pretty-printing.
Substantially reduced default memory allocation within Node.outerHtml, to reduce memory pressure when serialising smaller DOMs.
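
The traversal entries above (Node.traverse() and the NodeVisitor interface) describe a depth-first walk over a node's descendants. The sketch below shows that pattern in plain Java, assuming a visitor with one callback fired on entering a node and one on leaving it, as jsoup's NodeVisitor does with head and tail. TreeNode, Visitor, and TraverseSketch are hypothetical stand-ins rather than jsoup classes, and the walk here is a simple recursive one chosen for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in types: a sketch of the visitor pattern, not jsoup's code.
final class TraverseSketch {

    // Minimal tree node: a name plus an ordered list of children.
    static final class TreeNode {
        final String name;
        final List<TreeNode> children = new ArrayList<>();

        TreeNode(String name) {
            this.name = name;
        }

        // Appends a child and returns this node, so calls can be chained.
        TreeNode add(TreeNode child) {
            children.add(child);
            return this;
        }
    }

    // head fires when a node is first reached, tail after all its descendants.
    interface Visitor {
        void head(TreeNode node, int depth);
        void tail(TreeNode node, int depth);
    }

    // Depth-first traversal starting at root, which is reported at depth 0.
    static void traverse(TreeNode root, Visitor visitor) {
        walk(root, 0, visitor);
    }

    private static void walk(TreeNode node, int depth, Visitor visitor) {
        visitor.head(node, depth);
        for (TreeNode child : node.children) {
            walk(child, depth + 1, visitor);
        }
        visitor.tail(node, depth);
    }

    // Records every callback as a string, to make the visiting order visible.
    static List<String> trace(TreeNode root) {
        final List<String> events = new ArrayList<>();
        traverse(root, new Visitor() {
            public void head(TreeNode node, int depth) {
                events.add("head:" + node.name + "@" + depth);
            }
            public void tail(TreeNode node, int depth) {
                events.add("tail:" + node.name + "@" + depth);
            }
        });
        return events;
    }

    public static void main(String[] args) {
        TreeNode root = new TreeNode("div")
                .add(new TreeNode("p").add(new TreeNode("#text")))
                .add(new TreeNode("span"));
        for (String event : trace(root)) {
            System.out.println(event);
        }
    }
}
```

Having both callbacks is what makes tasks such as HTML-to-plain-text formatting convenient: opening markers can be emitted in head and closing markers or line breaks in tail.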

New in jsoup 1.2.3 (Aug 5, 2010)

New in jsoup 1.2.1 (Jul 1, 2010)