
AxiomAlive/normcontrol-Document-Parser

 
 



PyOPWParse

PyOPWParse is a library written in Python that provides a set of classes to extract elements and attributes from ODT, PDF, and DOCX files, regardless of file type.

As a result, you always get the same structure of elements and their properties, whichever format the input file is in.
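To illustrate what a single, format-independent structure can look like, here is a minimal sketch that uses only the standard library. It is not PyOPWParse's API: the `Paragraph` dataclass, the reader functions, and the extension-based dispatch are hypothetical names chosen for this example, and PDF is left out. It relies on the fact that DOCX and ODT files are both ZIP archives whose body is stored as XML (`word/document.xml` and `content.xml`, respectively).

```python
import zipfile
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, List, Optional

# Namespaces from the OOXML (DOCX) and OpenDocument (ODT) specs, in ElementTree's {uri}tag form.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
TEXT = "{urn:oasis:names:tc:opendocument:xmlns:text:1.0}"


@dataclass
class Paragraph:
    """Hypothetical format-independent paragraph record."""
    text: str
    style: Optional[str]  # style identifier exactly as stored in the source file


def read_docx_paragraphs(path: str) -> List[Paragraph]:
    # A DOCX file is a ZIP archive whose body is word/document.xml.
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{W}p"):
        style_el = p.find(f"{W}pPr/{W}pStyle")
        style = style_el.get(f"{W}val") if style_el is not None else None
        # Visible text is split across runs; join every w:t in the paragraph.
        text = "".join(t.text or "" for t in p.iter(f"{W}t"))
        paragraphs.append(Paragraph(text, style))
    return paragraphs


def read_odt_paragraphs(path: str) -> List[Paragraph]:
    # An ODT file is also a ZIP archive; its body is content.xml.
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    paragraphs = []
    # Only text:p is read here; headings (text:h) are ignored in this sketch.
    for p in root.iter(f"{TEXT}p"):
        style = p.get(f"{TEXT}style-name")
        # itertext() includes text nested in spans; text:s space markers are not expanded.
        paragraphs.append(Paragraph("".join(p.itertext()), style))
    return paragraphs


# Dispatch table: every reader returns the same List[Paragraph] shape.
READERS: Dict[str, Callable[[str], List[Paragraph]]] = {
    ".docx": read_docx_paragraphs,
    ".odt": read_odt_paragraphs,
}


def extract_paragraphs(path: str) -> List[Paragraph]:
    """Pick a reader by file extension; the result shape never depends on it."""
    suffix = Path(path).suffix.lower()
    reader = READERS.get(suffix)
    if reader is None:
        raise ValueError(f"unsupported file type: {suffix!r}")
    return reader(path)
```

Style identifiers are not portable between formats (for example, a DOCX style ID such as `Heading1` versus an ODT name such as `Heading_20_1`), so a real unified model would also need a normalization step on top of this.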

Docs

Current version available here

Features

  • Extract paragraphs with styles
  • Extract tables with styles
  • Extract document attributes
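The table and document-attribute features can be sketched for DOCX in the same standard-library style. This is an illustration under stated assumptions, not the library's implementation (the requirements list suggests PyOPWParse relies on python-docx for DOCX), and `read_docx_tables` and `read_docx_core_properties` are hypothetical helpers. In WordprocessingML, tables are `w:tbl` elements with `w:tr` rows and `w:tc` cells, the table style ID sits in `w:tblPr/w:tblStyle`, and attributes such as title and author are stored in `docProps/core.xml`.

```python
import zipfile
import xml.etree.ElementTree as ET
from typing import BinaryIO, Dict, List, Union

# WordprocessingML namespace, written in ElementTree's {uri}tag form.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
# Prefixes for the Dublin Core / OPC elements found in docProps/core.xml.
CORE_NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
}

Source = Union[str, BinaryIO]  # a file path or an open binary stream


def read_docx_tables(source: Source) -> List[dict]:
    """Return every table as {"style": style ID or None, "rows": [[cell text, ...], ...]}."""
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    tables = []
    # iter() also visits nested tables, which are reported as separate entries.
    for tbl in root.iter(f"{W}tbl"):
        style_el = tbl.find(f"{W}tblPr/{W}tblStyle")
        style = style_el.get(f"{W}val") if style_el is not None else None
        rows = [
            ["".join(t.text or "" for t in tc.iter(f"{W}t")) for tc in tr.findall(f"{W}tc")]
            for tr in tbl.findall(f"{W}tr")
        ]
        tables.append({"style": style, "rows": rows})
    return tables


def read_docx_core_properties(source: Source) -> Dict[str, str]:
    """Return the document attributes that are present in docProps/core.xml."""
    with zipfile.ZipFile(source) as zf:
        if "docProps/core.xml" not in zf.namelist():
            return {}
        root = ET.fromstring(zf.read("docProps/core.xml"))
    # Output key -> prefixed element name inside cp:coreProperties.
    fields = {
        "title": "dc:title",
        "author": "dc:creator",
        "created": "dcterms:created",
        "last_modified_by": "cp:lastModifiedBy",
    }
    props = {}
    for key, path in fields.items():
        el = root.find(path, CORE_NS)
        if el is not None and el.text:
            props[key] = el.text
    return props
```

Both helpers accept either a path or an in-memory stream such as `io.BytesIO`, which keeps them easy to test without files on disk.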

Requirements

PyOPWParse requires the following:

  • Python 3.9+
  • odfpy==1.4.1
  • pdfminer.six==20220524
  • pdfplumber==0.7.5
  • requests==2.28.1
  • python-docx==0.8.11
  • uvicorn~=0.22.0
  • tabula-py
  • pydantic~=1.10.7
  • bestconfig==1.3.6
  • fastapi~=0.95.1
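While installation instructions are still being written, a small script along these lines can report which of the listed dependencies are installed and at which version. It is a hypothetical helper, not part of PyOPWParse: it only reads installed package metadata through the standard `importlib.metadata` module and checks the Python version, without enforcing the version pins.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Distribution names copied from the Requirements list (pins are not checked here).
REQUIRED = [
    "odfpy", "pdfminer.six", "pdfplumber", "requests", "python-docx",
    "uvicorn", "tabula-py", "pydantic", "bestconfig", "fastapi",
]


def check_environment(min_python: Tuple[int, int] = (3, 9)) -> Dict[str, Optional[str]]:
    """Map each required distribution to its installed version, or None if missing."""
    if sys.version_info < min_python:
        raise RuntimeError(
            f"Python {min_python[0]}.{min_python[1]}+ required, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
    found: Dict[str, Optional[str]] = {}
    for name in REQUIRED:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in check_environment().items():
        print(f"{name:15} {ver or 'MISSING'}")
```

Run as a script, it prints one line per dependency, with `MISSING` for packages that are not installed, so the output can be compared against the pins by eye.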

Installation

In development.

Getting started

In development.

Contributing

In development.

Authors

Viacheslav Martsinkevich

Vladislav Tereshchenko

Andrei Berezhkov

Galina Larionova
