PyOPWParse is a Python library that provides a set of classes for extracting elements and attributes from ODT, PDF, and DOCX files, regardless of file type.
As a result, you always get a single structure of elements and their properties.
Current version available here
- Extract paragraphs with styles
- Extract tables with styles
- Extract document attributes
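The idea of a single structure that does not depend on the input format can be pictured with a small sketch. The names below (`Paragraph`, `Table`, `Document`, `summarize`) are hypothetical and are not taken from PyOPWParse's actual API; they only illustrate how paragraphs, tables, and document attributes might share one shape whether the source was ODT, PDF, or DOCX.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Paragraph:
    # Hypothetical: text of one paragraph plus its style properties.
    text: str
    style: Dict[str, str] = field(default_factory=dict)


@dataclass
class Table:
    # Hypothetical: table cells as rows of strings plus style properties.
    rows: List[List[str]]
    style: Dict[str, str] = field(default_factory=dict)


@dataclass
class Document:
    # Hypothetical container: the same fields for every source format.
    source_format: str
    attributes: Dict[str, str] = field(default_factory=dict)
    paragraphs: List[Paragraph] = field(default_factory=list)
    tables: List[Table] = field(default_factory=list)


def summarize(doc: Document) -> Dict[str, int]:
    """Count the extracted elements without looking at the source format."""
    return {"paragraphs": len(doc.paragraphs), "tables": len(doc.tables)}
```

Because `summarize` only touches the shared fields, code written against such a structure would not need a separate branch per file type.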
PyOPWParse requires the following:
- Python 3.9+
- odfpy==1.4.1
- pdfminer.six==20220524
- pdfplumber==0.7.5
- requests==2.28.1
- python-docx==0.8.11
- uvicorn~=0.22.0
- tabula-py
- pydantic~=1.10.7
- bestconfig==1.3.6
- fastapi~=0.95.1
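The list above mixes exact pins (`==`), compatible-release pins (`~=`), and one unpinned package. As a minimal sketch using only the standard library, the helpers below split such an entry into its parts and check the interpreter version against the 3.9 minimum. The function names `parse_pin` and `python_is_supported` are illustrative assumptions, not part of PyOPWParse.

```python
import sys
from typing import Optional, Tuple


def parse_pin(entry: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Split a requirement entry into (name, operator, version).

    Only the two operators used in the list above are handled;
    an entry without an operator is returned as unpinned.
    """
    for op in ("==", "~="):
        if op in entry:
            name, _, version = entry.partition(op)
            return name.strip(), op, version.strip()
    return entry.strip(), None, None


def python_is_supported(
    version: Tuple[int, int] = sys.version_info[:2],
    minimum: Tuple[int, int] = (3, 9),
) -> bool:
    """Return True when the (major, minor) version meets the minimum."""
    return tuple(version) >= minimum


if __name__ == "__main__":
    for entry in ("odfpy==1.4.1", "uvicorn~=0.22.0", "tabula-py"):
        print(parse_pin(entry))
    print("Python supported:", python_is_supported())
```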
in dev