Shorten release notes and give @Sicos1977 credit. Thanks!
KevM committed Dec 8, 2016
1 parent 6741004 commit 4b9408c
Showing 1 changed file with 2 additions and 76 deletions.
78 changes: 2 additions & 76 deletions Release-Notes.md
@@ -1,81 +1,7 @@
## 1.14

- Extract all headers from MSG/RFC822 (TIKA-2122).

- Upgrade metadata-extractor to 2.9.1 (TIKA-2113).

- Extract PDF DocInfo metadata into separate keys to prevent
overwriting by XMP metadata (TIKA-2057).

- Re-enable fileUrl for tika-server (TIKA-2081). If you choose
to use this feature, beware of the security vulnerabilities!
See: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2015-3271

- Add Tesseract's hOCR output format as an option, via Eric Pugh
(TIKA-2093)

- Extract macros from MSOffice files (TIKA-2069).

- Maintain passed-in mime in TXTParser (TIKA-2047).

- Upgrade to POI 3.15 (TIKA-2013).

- Upgrade to PDFBox 2.0.3 (TIKA-2051).

- Fix hyperlinks with formatting in DOC and DOCX (TIKA-1255
and TIKA-2078)

- Tika is now integrated with the TensorFlow library from Google
and it can use its Inception v3 image classification model to
identify objects in images (TIKA-1993).

- Parser configuration is now type-safe and parameters for parsers
can have assigned types (TIKA-1508, TIKA-1986).

- Prevent OOM/permanent hang on some corrupt CHM files (TIKA-2040).

- Upgrade ICU4J charset detection components to fix multithreading
bug (TIKA-2041).

- Upgrade to Jackcess 2.1.4 (TIKA-2039).

- Maintain more significant digits in cells of "General" format
in XLS and XLSX (TIKA-2025).

- Avoid mark/reset issues when extracting or detecting embedded resources
in RFC822 emails (TIKA-2037).

- Improve accuracy of Tesseract for better extraction of numeric
and alphanumeric text from images (TIKA-2021, TIKA-2031).

- Improve extraction of embedded documents from PPT, PPTX and XLSX
(TIKA-2026).

- Add parser for applefile (AppleSingle) (TIKA-2022).

- Add mime types, mime magic and/or globs for:
- Endnote Import File (TIKA-2011)
- DJVU files (TIKA-2009)
- MS Owner File (TIKA-2008)
- Windows Media Metafile (TIKA-2004)
- iCal and vCalendar (TIKA-2006)
- MBOX (TIKA-2042)
- Stata DTA (TIKA-2064)

- Add configurable maximum threshold for number of events extracted
from the XMP Media Management Schema in JempboxExtractor (TIKA-1999).

- Integrate TesseractOCR with full page image rendering for PDFs (TIKA-1994).

- Add mime detection (via Nick C) and a parser for DBF files (TIKA-1513).

- Add mime detection and parsers for MSOffice 2003 XML Word
and Excel formats (TIKA-1958).

- Extract hyperlinks from PPT, PPTX, XLSX (TIKA-1454).

- Upgrade to Commons Compress 1.12 (supports progress on TIKA-1358)

- Tika updated to 1.14. Please see the official Tika site for [what's changed](https://dist.apache.org/repos/dist/release/tika/CHANGES-1.14.txt).
- Please note that TikaOnDotnet assemblies are now signed. Thank you [@Sicos1977](https://github.com/Sicos1977) for the PR.

## 1.13.1
