diff --git a/lib/META-INF/CHANGES b/lib/META-INF/CHANGES
new file mode 100644
index 0000000..79764fa
--- /dev/null
+++ b/lib/META-INF/CHANGES
@@ -0,0 +1,1013 @@
+jsoup changelog
+
+*** Release 1.11.2 [PENDING]
+ * Improvement: added a new pseudo selector :matchText, which allows text nodes to match as if they were elements.
+   This enables finding text that is only marked by a "br" tag, for example.
+
+ * Change: marked Connection.validateTLSCertificates() as deprecated.
+
+ * Improvement: normalize invisible characters (like soft-hyphens) in Element.text().
+
+ * Improvement: added Element.wholeText(), to easily get the un-normalized text value of an element and its children.
+
+ * Bugfix: in a deep DOM stack, a StackOverflowError could occur when generating implied end tags.
+
+ * Bugfix: when parsing attribute values that happened to cross a buffer boundary, a character was dropped.
+
+ * Bugfix: fixed an issue that prevented using infinite timeouts in Jsoup.Connection.
+
+ * Bugfix: whitespace preserving tags were not honoured when nested more than two levels deep.
+
+ * Bugfix: an unterminated comment token at the end of the HTML input would cause an out of bounds exception.
+
+ * Bugfix: fixed an NPE in the Cleaner that would occur if an attribute value was missing.
+
+ * Bugfix: when serializing the same document in multiple threads, on Android, with a character set that is not ascii
+   or UTF-8, an encoding exception could occur.
+
+ * Bugfix: removing a form value from the DOM would not remove it from FormData.
+
+ * Bugfix: in the W3CDom transformer, siblings were incorrectly inheriting namespaces defined on previous siblings.
+
+*** Release 1.11.1 [2017-Nov-06]
+ * Updated language level to Java 7 from Java 5. To maintain Android support (of minversion 8), try-with-resources are
+   not used.
+
+ * When loading content from a URL or a file, the content is now parsed as it streams in from the network or disk,
+   rather than being fully buffered before parsing. This substantially reduces memory consumption & large garbage
+   objects when loading large files. Note that this change means that a response, once parsed, may not be parsed
+   again from the same response object unless you call response.bufferUp() first, which will buffer the full response
+   into memory.
+
+ * Added Connection.Response.bodyStream(), a method to get the response body as an input stream. This is useful for
+   saving a large response straight to a file, without buffering fully into memory first.
+
+ * Performance improvements in text and HTML generation (through less GC).
+
+ * Reduced memory consumption of text, scripts, and comments in the DOM by 40%, by refactoring the node
+   hierarchy to not track childnodes or attributes by default for leaf nodes. For the average document, that's about a
+   30% memory reduction.
+
+ * Reduced memory consumption of Elements by refactoring their Attributes to be a simple pair of arrays, vs a
+   LinkedHashSet.
+
+ * Added support for Element.selectFirst(query), to efficiently find the first matching element.
+
+ * Added Element.appendTo(parent) to simplify slinging elements about.
+
+ * Added support for multiple headers with the same name in Jsoup.Connect
+
+ * Added Element.shallowClone() and Node.shallowClone(), to allow cloning nodes without getting all their children.
+
+ * Updated Element.text() and the :contains(text) selector to treat &nbsp; characters as spaces.
+
+ * Updated Jsoup.connect().timeout() to implement a total connect + combined read timeout. Previously it specified
+   connect and buffer read times only, so to implement a combined total timeout, you had to have another thread send
+   an interrupt.
+
+ * Improved performance of Node.addChildren (was quadratic)
+
+ * Added missing support for template tags in tables
+
+ * In Jsoup.connect file uploads, added the ability to set the uploaded files' mimetype.
+
+ * Improved Node traversal, including less object creation, and partial and filtering traversor support.
+
+ * Bugfix: if a document was redecoded after character set detection, the HTML parser was not reset correctly,
+   which could lead to an incorrect DOM.
+
+ * Bugfix: attributes with the same name but different case would be incorrectly treated as different attributes.
+
+ * Bugfix: self-closing tags for known empty elements were incorrectly treated as errors.
+
+ * Bugfix: fixed an issue where a self-closing title, noframes, or style tag would cause the rest of the page to be
+   incorrectly parsed as data or text.
+
+ * Bugfix: fixed an issue with unknown mixed-case tags
+
+ * Bugfix: fixed an issue where the entity resources were left open after startup, causing a warning.
+
+ * Bugfix: fixed an issue where Element.getElementsByIndexLessThan(index) would incorrectly provide the root element
+
+ * Improved parse time for pages with exceptionally deeply nested tags.
+
+ * Improvement / workaround: modified the Entities implementation to load its data from a .class vs from a jar resource.
+   Faster, and safer on Android.
+
+*** Release 1.10.3 [2017-Jun-11]
+ * Added Elements.eachText() and Elements.eachAttr(name), which return a list of each Element's text or attribute
+   values, respectively. This makes it simpler to, for example, get a list of each URL on a page:
+     List<String> urls = doc.select("a").eachAttr("abs:href");
+
+ * Improved selector validation for :contains(...) with unbalanced quotes.
+
+ * Improved the speed of index based CSS selectors and other methods that use elementSiblingIndex, by a factor of 34x.
+
+ * Added Node.clearAttributes(), to simplify removing of all attributes of a Node / Element.
+
+ * Bugfix: if an attribute name started or ended with a control character, the parse would fail with a validation
+   exception.
+
+ * Bugfix: Element.hasClass() and the ".classname" selector would not find the class attribute case-insensitively.
+
+ * Bugfix: In Jsoup.Connection, if a redirect contained a query string with %xx escapes, they would be double escaped
+   before the redirect was followed, leading to fetching an incorrect location.
+
+ * Bugfix: In Jsoup.Connection, if a request body was set and the connection was redirected, the body would incorrectly
+   still be sent.
+
+ * Bugfix: In DataUtil when detecting the character set from meta data, and there are two Content-Types defined, use
+   the one that defines a character set.
+
+ * Bugfix: when parsing unknown tags in case-sensitive HTML mode, end tags would not close scope correctly.
+
+ * In Jsoup.Connection, ensure there is no Content-Type set when being redirected to a GET.
+
+ * Bugfix: in certain locales (Turkey specifically), lowercasing and case insensitivity could fail for specific items.
+
+ * Bugfix: after an element was cloned, changes to its child list were not notifying the element correctly.
+
+*** Release 1.10.2 [2017-Jan-02]
+ * Improved startup time, particularly on Android, by reducing garbage generation and CPU execution time when loading
+   the HTML entity files. About 1.72x faster in this area.
+
+ * Added Element.is(query) to check if an element matches this CSS query.
+
+ * Added new methods to Elements: next(query), nextAll(query), prev(query), prevAll(query) to select next and previous
+   element siblings from a current selection, with optional selectors.
+
+ * Added Node.root() to get the topmost ancestor of a Node.
+
+ * Added the new selector :containsData(), to find elements that hold data, like script and style tags.
+
+ * Changed Jsoup.isValid(bodyHtml) to validate that the input contains only body HTML that is safe according to the
+   whitelist, and does not include HTML errors. And in the Jsoup.Cleaner.isValid(Document) method, make sure the doc
+   only includes body HTML.
+
+ * In Whitelists, validate that a removed protocol exists before removing said protocol.
+
+ * Allow the Jsoup.Connect thread to be interrupted when reading the input stream; helps when reading from a long stream
+   of data that doesn't read timeout.
+
+ * Jsoup.Connect now uses a desktop user agent by default. Many developers were getting caught by not specifying the
+   user agent, and sending the default 'Java'. That causes many servers to return different content than what they would
+   to a desktop browser, and what the developer was expecting.
+
+ * Increased the default connect/read timeout in Jsoup.Connect to 30 seconds.
+
+ * Jsoup.Connect now detects if a header value is actually in UTF-8 vs the HTTP spec of ISO-8859, and converts
+   the header value appropriately. This improves compatibility with servers that are configured incorrectly.
+
+ * Bugfix: in Jsoup.Connect, URLs containing non-URL-safe characters were not encoded to URL safe correctly.
+
+ * Bugfix: a "SYSTEM" flag in doctype tags would be incorrectly removed.
+
+ * Bugfix: removing attributes from an Element with removeAttr() would cause a ConcurrentModificationException.
+
+ * Bugfix: the contents of Comment nodes were not returned by Element.data()
+
+ * Bugfix: if source checked out on Windows with git autocrlf=true, Entities.load would fail because of the \r char.
+
+*** Release 1.10.1 [2016-Oct-23]
+ * New feature: added the option to preserve case for tags and/or attributes, with ParseSettings. By default, the HTML
+   parser will continue to normalize tag names and attribute names to lower case, and the XML parser will now preserve
+   case, according to the relevant spec. The CSS selectors for tags and attributes remain case insensitive, per the CSS
+   spec.
+
+ * Improved support for extended HTML entities, including supplemental characters and multiple character references.
+   Also reduced memory consumption of the entity tables.
+
+ * Added support for *|E wildcard namespace selectors.
+
+ * Added support for setting multiple connection headers at once with Connection.headers(Map)
+
+ * Added support for setting/overriding the response character set in Connection.Response, for cases where the charset
+   is not defined by the server, or is defined incorrectly.
+
+ * Improved performance of class selectors by reducing memory allocation and garbage collection.
+
+ * Improved performance of HTML output by reducing the creation of temporary attribute list iterators.
+
+ * Fixed an issue when converting to the W3CDom XML, where valid (but ugly) HTML attribute names containing characters
+   like '"' could not be converted into valid XML attribute names. These attribute names are now normalized if possible,
+   or not added to the XML DOM.
+
+ * Fixed an OOB exception when loading an empty-body URL and parsing with the XML parser.
+
+ * Fixed an issue where attribute names starting with a slash would be parsed incorrectly.
+
+ * Don't reuse charset encoders from OutputSettings, to make threadsafe.
+
+ * Fixed an issue in connections with a requestBody where a custom content-type header could be ignored.
+
+*** Release 1.9.2 [2016-May-17]
+ * Fixed an issue where tag names that contained non-ascii characters but started with an ascii character
+   would cause the parser to get stuck in an infinite loop.
+
+ * In XML documents, detect the charset from the XML prolog -
+
+ * Fixed an issue where created XML documents would have an incorrect prolog.
+
+ * Fixed an issue where you could not use an attribute selector to find values containing unbalanced braces or
+   parentheses.
+
+ * Fixed an issue where namespaced tags (like ) would cause Element.cssSelector() to fail.
+
+*** Release 1.9.1 [2016-Apr-16]
+ * Added support for HTTP and SOCKS request proxies, specifiable per connection.
+
+ * Added support for sending plain HTTP request bodies in POST and PUT requests, with Connection.requestBody(String).
+
+ * Added support in Jsoup.Connect for HEAD, OPTIONS, TRACE.
+
+ * Added support for HTTP 307 Temporary Redirect (replays posts, if applicable).
+
+ * Performance improvements when parsing HTML, particularly for Android Dalvik.
+
+ * Added support for writing HTML into Appendable objects (like OutputStreamWriter), to enable stream serialization.
+
+ * Added support for XML namespaces when converting jsoup documents to W3C documents.
+
+ * Added support for UTF-16 and UTF-32 character set detection from byte-order-marks (BOM).
+
+ * Added support for tags with non-ascii (unicode) letters.
+
+ * Added Connection.data(key) to retrieve a data KeyVal by its key. Useful to update form data before submission.
+
+ * Fixed an issue in the Parent selector where it would not match against the root element it was applied to.
+
+ * Fixed an issue where elements.select(query) would not return every matching element if they had the same content.
+
+ * Added not-null validators to Element.appendText() and Element.prependText()
+
+ * Fixed an issue when moving nodes using Element.insert(index, children) where the sibling index would be set
+   incorrectly, leading to the original loads being lost.
+
+ * Reverted Node.equals() and Node.hashCode() back to identity (object) comparisons, as deep content inspection
+   had negative performance impacts and hashkey stability problems. Functionality replaced with Node.hasSameContent().
+
+ * In Jsoup.Connect, if the same header key is seen multiple times, combine their values with a comma per the HTTP RFC,
+   instead of keeping just one value. Also fixes an issue where header values could be out of order.
+
+*** Release 1.8.3 [2015-Aug-02]
+ * Added support for custom boolean attributes.
+
+ * When fetching XML URLs, automatically switch to the XML parser instead of the HTML parser.
+
+ * Performance improvement on parsing larger HTML pages. On Android KitKat, around 1.7x faster. On Android
+   Lollipop, ~ 1.3x faster. Improvements largely from re-ordering the HtmlTreeBuilder methods based on analysis of
+   various websites; also from further memory reduction for nodes with no children, and other tweaks.
+
+ * Fixed an issue in Element.getElementSiblingIndex (and related methods) where sibling elements with the same content
+   would incorrectly have the same sibling index.
+
+ * Fixed an issue where unexpected elements in a badly nested table could be moved to the wrong location in the
+   document.
+
+ * Fixed an issue where a table nested within a TH cell would parse to an incorrect tree.
+
+ * When serializing a document using the XHTML encoding entities, if the character set did not support &nbsp; chars
+   (such as Shift_JIS), the character would be skipped. For visibility, will now always output &#xa0; when using XHTML
+   encoding entities (as &nbsp; is not defined), regardless of the output character set.
+
+ * Fixed an issue when resolving URLs, if the absolute URL had no path, the relative URL was not normalized correctly.
+   Also fixed an issue where connections that were redirected to a relative URL did not have the same normalization
+   rules as a URL read from Nodes.absUrl(String).
+
+ * When serialising XML, ensure that '<' characters in attributes are escaped, per spec. Not required in HTML.
+
+*** Release 1.8.2 [2015-Apr-13]
+ * Performance improvements for parsing HTML on Android, of 1.5x to 1.9x, with larger parses getting a bigger
+   speed increase. For non-Android JREs, around 1.1x to 1.2x.
+
+ * Dramatic performance improvement in HTML serialization on Android (KitKat and later), of 115x. Improvement by working
+   around a character set encoding speed regression in Android.
+
+ * Performance improvement for the class name selector on Android (.class) of 2.5x to 14x. Around 1.2x
+   on non-Android JREs.
+
+ * File upload support. Added the ability to specify input streams for POST data, which will upload content in
+   MIME multipart/form-data encoding.
+
+ * Add a meta-charset element to documents when setting the character set, so that the document's charset is
+   unambiguous.
+
+ * Added ability to disable TLS (SSL) certificate validation. Helpful if you're hitting a host with a bad cert,
+   or your JDK doesn't support SNI.
+
+ * Added ability to further tweak the canned Cleaner Whitelists by removing existing settings.
+
+ * Added option in Cleaner Whitelist to allow linking to in-page anchors (#)
+
+ * Use a lowercase doctype tag for HTML5 documents.
+
+ * Add support for 201 Created with redirect, and other status codes. Treats any HTTP status code 2xx or 3xx as an OK
+   response, and follows redirects whenever there is a Location header.
+
+ * Added support for HTTP method verbs PUT, DELETE, and PATCH.
+
+ * Added support for overriding the default POST character set of UTF-8
+
+ * W3C DOM support: added ability to convert from a jsoup document to a W3C document, with the W3CDom helper class.
+
+ * In the HtmlToPlainText example program, added the ability to filter using a CSS selector. Also clarified
+   the usage documentation.
+
+ * Fixed validation of cookie names in HttpConnection cookie methods.
+
+ * Fixed an issue where