This repository has been archived by the owner on Feb 16, 2023. It is now read-only.
Paperless-ng 0.9.5
Pre-release
jonaswinkler released this 05 Dec 14:15
This release concludes the big changes I wanted to get rolled into paperless. The next releases before 1.0 will focus primarily on fixing issues.
- OCR
  - Paperless now uses OCRmyPDF to perform OCR on documents. It still uses tesseract under the hood, but the PDF parser of Paperless has changed considerably and will behave differently for some documents.
  - OCRmyPDF creates archived PDF/A documents with embedded text that can be selected in the front end.
  - Paperless stores archived versions of documents alongside the originals. The originals can be accessed on the document edit page. If available, a dropdown menu will appear next to the download button.
  - Many of the configuration options regarding OCR have changed. See the documentation for details.
  - Paperless no longer guesses the language of your documents. It always uses the language that you specified with `PAPERLESS_OCR_LANGUAGE`. Be sure to set this to the language the majority of your documents are in. Multiple languages can be specified, but that requires more CPU time.
  - The management command `document_archiver` can be used to create archived versions for already existing documents.
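To make the language setting concrete, here is a minimal sketch of how a configured language could be turned into an OCRmyPDF command line. It is an illustration only, not how Paperless itself calls OCRmyPDF. The helper name `build_ocrmypdf_command`, the `eng` fallback, and the `deu+eng` example value are assumptions; `--language` and `--output-type pdfa` are existing OCRmyPDF command-line options.

```python
import os


def build_ocrmypdf_command(input_pdf, output_pdf, env=None):
    """Build an ocrmypdf argument list from PAPERLESS_OCR_LANGUAGE.

    Hypothetical helper for illustration; Paperless does not expose
    a function with this name.
    """
    env = os.environ if env is None else env
    # Assumption: fall back to English when the variable is not set.
    language = env.get("PAPERLESS_OCR_LANGUAGE", "eng")
    return [
        "ocrmypdf",
        "--language", language,   # several languages are joined with "+"
        "--output-type", "pdfa",  # request an archival PDF/A output file
        input_pdf,
        output_pdf,
    ]
```

Passing `{"PAPERLESS_OCR_LANGUAGE": "deu+eng"}` as `env` yields a command that asks for both German and English, which is the case the note above warns will cost more CPU time.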
- Tags from consumption folder.
  - Thanks to jayme-github, paperless now consumes files from sub folders in the consumption folder and is able to assign tags based on the sub folders a document was found in. This can be configured with `PAPERLESS_CONSUMER_RECURSIVE` and `PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS`.
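The idea of deriving tags from sub folders can be sketched as follows. This is a rough illustration under stated assumptions, not the consumer's actual code: the function name `tags_from_path` and the example paths are hypothetical, and the sketch simply treats every directory between the consumption folder and the file as one tag name.

```python
from pathlib import PurePosixPath


def tags_from_path(consumption_dir, file_path):
    """Return the sub folder names between consumption_dir and the file.

    Hypothetical helper: each intermediate directory becomes one tag name.
    """
    relative = PurePosixPath(file_path).relative_to(PurePosixPath(consumption_dir))
    # Drop the last component, which is the file name itself.
    return list(relative.parts[:-1])
```

Under these assumptions, a file at `/consume/bank/2020/statement.pdf` inside the consumption folder `/consume` would map to the tag names `bank` and `2020`, while a file placed directly in `/consume` would get no tags from its location.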
- API
  - The API now offers token authentication.
  - The endpoint for uploading documents now supports specifying custom titles, correspondents, tags and types. This can be used by clients to override the default behavior of paperless. See the documentation on file uploads.
  - The document endpoint of the API now serves documents in this form:
    - Correspondents, document types and tags are referenced by their ID in the fields `correspondent`, `document_type` and `tags`. The `*_id` versions are gone. These fields are read/write.
    - Paperless does not serve nested tags, correspondents or types anymore.
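As a sketch of what a client request might look like with the ID-based fields and token authentication, the snippet below builds (but does not send) an update request. Several details are assumptions not stated in these notes: the `/api/documents/<id>/` URL layout, the use of `PATCH`, the `Authorization: Token ...` header format, and the helper name `build_document_update`. Check the API documentation before relying on any of them.

```python
import json
from urllib.request import Request


def build_document_update(base_url, token, doc_id, correspondent, document_type, tags):
    """Build a request that sets a document's relations by ID.

    Hypothetical helper; the URL layout and header format are assumptions.
    """
    payload = {
        "correspondent": correspondent,  # a single ID, not a nested object
        "document_type": document_type,  # a single ID
        "tags": list(tags),              # a list of tag IDs
    }
    return Request(
        url=f"{base_url}/api/documents/{doc_id}/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )
```

The request object can be inspected before sending, for example with `req.full_url` and `req.get_header("Authorization")`, and then passed to `urllib.request.urlopen` against a running instance.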
- Front end
  - Paperless does some basic caching of correspondents, tags and types and will only request them from the server when necessary or when entirely reloading the page.
  - Document list fetching is about 10%-30% faster now, especially when lots of tags/correspondents are present.
  - Some minor improvements to the front end, such as document count in the document list, better highlighting of the current page, and improvements to the filter behavior.
- Fixes:
  - A bug with the generation of filenames for files with unsupported types caused the exporter and document saving to crash.
  - Mail handling no longer exits entirely when encountering errors. It will skip the account/rule/message on which the error occurred.
  - Assigning correspondents from mail sender names failed for very long names. Paperless no longer assigns correspondents in these cases.