From 4b9408cc1f624f3dc6d5df5095b6397186930664 Mon Sep 17 00:00:00 2001
From: KevM
Date: Wed, 7 Dec 2016 23:24:06 -0600
Subject: [PATCH] Shorten relase notes and give @Sicos1977 credit. Thanks!

---
 Release-Notes.md | 78 ++---------------------------------------------------
 1 file changed, 2 insertions(+), 76 deletions(-)

diff --git a/Release-Notes.md b/Release-Notes.md
index 6b19034..49d0dfb 100644
--- a/Release-Notes.md
+++ b/Release-Notes.md
@@ -1,81 +1,7 @@
 ## 1.14
 
-- Extract all headers from MSG/RFC822 (TIKA-2122).
-
-- Upgrade metadata-extractor to 2.9.1 (TIKA-2113).
-
-- Extract PDF DocInfo metadata into separate keys to prevent
-  overwriting by XMP metadata (TIKA-2057).
-
-- Re-enable fileUrl for tika-server (TIKA-2081). If you choose,
-  to use this feature, beware of the security vulnerabilities!
-  See: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2015-3271
-
-- Add Tesseract's hOCR output format as an option, via Eric Pugh
-  (TIKA-2093)
-
-- Extract macros from MSOffice files (TIKA-2069).
-
-- Maintain passed-in mime in TXTParser (TIKA-2047).
-
-- Upgrade to POI.3-15 (TIKA-2013).
-
-- Upgrade to PDFBox 2.0.3 (TIKA-2051).
-
-- Fix hyperlinks with formatting in DOC and DOCX (TIKA-1255
-  and TIKA-2078)
-
-- Tika now is integrated with the Tensorflow library from Google
-  and it can use its Inception v3 image classification model to
-  identify objects in images (TIKA-1993).
-
-- Parser configuration is now type-safe and parameters for parsers
-  can have assigned types (TIKA-1508, TIKA-1986).
-
-- Prevent OOM/permanent hang on some corrupt CHM files (TIKA-2040).
-
-- Upgrade ICU4J charset detection components to fix multithreading
-  bug (TIKA-2041).
-
-- Upgrade to Jackcess 2.1.4 (TIKA-2039).
-
-- Maintain more significant digits in cells of "General" format
-  in XLS and XLSX (TIKA-2025).
-
-- Avoid mark/reset issues when extracting or detecting embedded resources
-  in RFC822 emails (TIKA-2037).
-
-- Improving accuracy of Tesseract for better extraction of numeric
-  and alphanumeric text from images (TIKA-2021, TIKA-2031).
-
-- Improve extraction of embedded documents from PPT, PPTX and XLSX
-  (TIKA-2026).
-
-- Add parser for applefile (AppleSingle) (TIKA-2022).
-
-- Add mime types, mime magic and/or globs for:
-  - Endnote Import File (TIKA-2011)
-  - DJVU files (TIKA-2009)
-  - MS Owner File (TIKA-2008)
-  - Windows Media Metafile (TIKA-2004)
-  - iCal and vCalendar (TIKA-2006)
-  - MBOX (TIKA-2042)
-  - Stata DTA (TIKA-2064)
-
-- Add configurable maximum threshold for number of events extracted
-  from the XMP Media Management Schema in JempboxExtractor (TIKA-1999).
-
-- Integrate TesseractOCR with full page image rendering for PDFs (TIKA-1994).
-
-- Add mime detection via Nick C and parser for DBF files (TIKA-1513).
-
-- Add mime detection and parsers for MSOffice 2003 XML Word
-  and Excel formats (TIKA-1958).
-
-- Extract hyperlinks from PPT, PPTX, XSLX (TIKA-1454).
-
-- Upgrade to Commons Compress 1.12 (supports progress on TIKA-1358)
-
+- Tika updated to 1.14. Please see the official Tika site for [what's changed](https://dist.apache.org/repos/dist/release/tika/CHANGES-1.14.txt).
+- Please note that TikaOnDotnet assemblies are now signed. Thank you [@Sicos1977](https://github.com/Sicos1977) for the PR.
 
 ## 1.13.1