PyOPDParse

The purpose of the project

PyOPDParse is a Python library that provides a set of classes for extracting elements and their attributes from ODT, PDF and DOCX files. Whatever the input format, the result is a single unified structure of elements and their properties.

Core features

parser of structural elements of PDF documents,
parser of structural elements of ODT documents,
parser of structural elements of DOCX documents,
unified classes of structural elements for documents of the specified formats.
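The idea of a unified structure can be illustrated with a minimal sketch. Every class, field and function name below is a hypothetical stand-in chosen for illustration; none of it is taken from the PyOPDParse source, whose actual API may differ.

```python
from dataclasses import dataclass, field


@dataclass
class StructuralElement:
    """Hypothetical base class: data shared by every element."""
    kind: str
    properties: dict = field(default_factory=dict)


@dataclass
class Paragraph(StructuralElement):
    """Hypothetical paragraph element that also carries its text."""
    text: str = ""


@dataclass
class UnifiedDocument:
    """Hypothetical container with the same shape for every source format."""
    source_format: str
    elements: list = field(default_factory=list)

    def texts(self):
        """Return the text of every paragraph, in document order."""
        return [e.text for e in self.elements if isinstance(e, Paragraph)]


def build_document(source_format, paragraphs):
    """Assumed helper: wrap plain strings as Paragraph elements."""
    elements = [Paragraph(kind="paragraph", text=p) for p in paragraphs]
    return UnifiedDocument(source_format=source_format, elements=elements)


odt_doc = build_document("odt", ["Title", "Body"])
pdf_doc = build_document("pdf", ["Title", "Body"])
# Both documents expose the same element interface regardless of origin.
same_texts = odt_doc.texts() == pdf_doc.texts()
```

Under these assumptions, code that consumes the document only needs to know the element classes, not the format the file was parsed from.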

Installation

In development.

Examples

In development.

Project Structure

PyOPDParse/
├── README.md
├── LICENSE.md
├── requirements.txt
├── src/
│   ├── classes/
│   │   ├── interfaces/
│   │   │   └── InformalParserInterface.py
│   │   ├── superclasses/
│   │   │   └── StructuralElement.py
│   │   ├── Frame.py
│   │   ├── Image.py
│   │   ├── List.py
│   │   ├── Paragraph.py
│   │   ├── Table.py
│   │   ├── TableRow.py
│   │   ├── TableCell.py
│   │   └── UnifiedDocumentView.py
│   ├── odt/
│   │   ├── elements/
│   │   │   ├── AutomaticStyleParser.py
│   │   │   ├── DefaultStyleParser.py
│   │   │   ├── RegularStyleParser.py
│   │   │   ├── ImageParser.py
│   │   │   ├── ListParser.py
│   │   │   ├── NodeParser.py
│   │   │   ├── ParagraphParser.py
│   │   │   ├── TableParser.py
│   │   │   └── ODTDocument.py
│   │   └── ODTParser.py
│   ├── pdf/
│   │   ├── pdfclasses/
│   │   │   ├── Line.py
│   │   │   └── PDFParagraph.py
│   │   └── PDFParser.py
│   ├── docx/
│   │   └── DocxParagraphParser.py
│   └── helpers/
├── examples/
├── docs/
└── tests/
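
The layout has one parser package per format next to a shared interfaces directory. One way such a layout could be used is to pick a parser from the file extension. The sketch below only illustrates that pattern: the class names, the `parse` method and the `PARSERS` registry are all assumptions, not the project's real `ODTParser`, `PDFParser` or interface.

```python
from pathlib import Path


class OdtParserSketch:
    """Hypothetical stand-in for an ODT parser."""

    def parse(self, path):
        return {"format": "odt", "path": str(path)}


class PdfParserSketch:
    """Hypothetical stand-in for a PDF parser."""

    def parse(self, path):
        return {"format": "pdf", "path": str(path)}


class DocxParserSketch:
    """Hypothetical stand-in for a DOCX parser."""

    def parse(self, path):
        return {"format": "docx", "path": str(path)}


# Assumed registry mapping a lowercase file suffix to a parser class.
PARSERS = {
    ".odt": OdtParserSketch,
    ".pdf": PdfParserSketch,
    ".docx": DocxParserSketch,
}


def parser_for(path):
    """Return a parser instance for the file's extension (case-insensitive)."""
    suffix = Path(path).suffix.lower()
    try:
        return PARSERS[suffix]()
    except KeyError:
        raise ValueError(f"unsupported format: {suffix!r}") from None


result = parser_for("report.DOCX").parse("report.DOCX")
```

Because every sketch class exposes the same `parse` method, the caller relies on duck typing only, which is the usual meaning of an informal interface in Python.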

Documentation

The current version of the documentation is available here.

Getting started

In development.

License

In development.

Acknowledgments

The development team expresses its deep gratitude to ITMO University for the support provided.

Contacts

Your contacts. For example:

Telegram channel answering questions about project
[email protected]
[email protected]

Authors

Viacheslav Martsinkevich

Vladislav Tereshchenko

Andrei Berezhkov

Galina Larionova
