Merge pull request #62 from normcontrol/documentatation
[Fix]: Updated README.md
Vl-Tershch authored Jan 15, 2024
2 parents 0755da0 + b80b3cc commit a475400
Showing 1 changed file with 89 additions and 27 deletions.
116 changes: 89 additions & 27 deletions README.md
@@ -1,54 +1,116 @@
# PyOPDParse

![Your logo](https://itmo.ru/file/pages/213/logo_na_plashke_russkiy_belyy.png)

[![python](https://badgen.net/badge/python/3.9|3.10|3.11/blue?icon=python)](https://www.python.org/)
[![license](https://badgen.net/github/license/normcontrol/normcontrol-Document-Parser)](https://www.python.org/)

[![issueo](https://badgen.net/github/open-issues/normcontrol/normcontrol-Document-Parser)](https://github.com/normcontrol/normcontrol-Document-Parser/issues)
[![issuec](https://badgen.net/github/closed-issues/normcontrol/normcontrol-Document-Parser)](https://github.com/normcontrol/normcontrol-Document-Parser/issues?q=is%3Aissue+is%3Aclosed)

# PyOPWParse
## The purpose of the project

PyOPWParse is a library written in Python that provides a set of classes to extract elements and attributes from ODT,
PDF and DOCX files regardless, of file type.
PyOPDParse is a library written in Python that provides a set of classes to extract elements and attributes from ODT,
PDF and DOCX files. As a result, you always get a single structure of elements and their properties.

As a result, you always get a single structure of elements and their properties.
## Table of Contents

## Docs
- [Core features](#core-features)
- [Installation](#installation)
- [Examples](#examples)
- [Project Structure](#project-structure)
- [Documentation](#documentation)
- [Getting started](#getting-started)
- [License](#license)
- [Acknowledgments](#acknowledgments)
- [Contacts](#contacts)
- [Authors](#authors)

Current version available [here](https://normcontrol.github.io/normcontrol-Document-Parser/#/)
## Core features

## Features
- parser of structural elements of PDF documents,
- parser of structural elements of ODT documents,
- parser of structural elements of DOCX documents,
- unified classes of structural elements for documents of the specified formats.

- Extract paragraphs with styles
- Extract tables with styles
- Extract document attributes
## Installation

## Requirements
```in dev```

PyOPWParse requires the following:
## Examples

- python 3.9+
- odfpy==1.4.1
- pdfminer.six==20220524
- pdfplumber==0.7.5
- requests==2.28.1
- python-docx==0.8.11
- uvicorn~=0.22.0
- tabula-py
- pydantic~=1.10.7
- bestconfig==1.3.6
- fastapi~=0.95.1
```in dev```

## Installation
## Project Structure

```
PyOPDParse/
├── README.md
├── LICENSE.md
├── requirements.txt
├── src/
│   ├── classes/
│   │   ├── interfaces/
│   │   │   ├── InformalParserInterface.py
│   │   ├── superclasses/
│   │   │   ├── StructuralElement.py
│   │   ├── Frame.py
│   │   ├── Image.py
│   │   ├── List.py
│   │   ├── Paragraph.py
│   │   ├── Table.py
│   │   ├── TableRow.py
│   │   ├── TableCell.py
│   │   ├── UnifiedDocumentView.py
│   ├── odt/
│   │   ├── elements/
│   │   │   ├── AutomaticStyleParser.py
│   │   │   ├── DefaultStyleParser.py
│   │   │   ├── RegularStyleParser.py
│   │   │   ├── ImageParser.py
│   │   │   ├── ListParser.py
│   │   │   ├── NodeParser.py
│   │   │   ├── ParagraphParser.py
│   │   │   ├── TableParser.py
│   │   ├── ODTDocument.py
│   │   ├── ODTParser.py
│   ├── pdf/
│   │   ├── pdfclasses/
│   │   │   ├── Line.py
│   │   │   ├── PDFParagraph.py
│   │   ├── PDFParser.py
│   ├── docx/
│   │   ├── DocxParagraphParser.py
│   ├── helpers/
├── examples/
├── docs/
├── tests/
└──
```

## Documentation

```in dev```
Current version available [here](https://normcontrol.github.io/normcontrol-Document-Parser/#/)

## Getting started

```in dev```

## Contributing
## License

```in dev```

## Acknowledgments

The development team expresses its deep gratitude to ITMO University for the support provided.

## Contacts

Your contacts. For example:

- [Telegram channel](https://t.me/+rIyKfiGQ7fFhZDEy) answering questions about project
- [email protected]
- [email protected]

## Authors

[Viacheslav Martsinkevich](https://github.com/slavamarcin)