Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge branch 'master' into documentatation
Showing 96 changed files with 4,803 additions and 2,051 deletions.
@@ -0,0 +1,32 @@
name: Run tests

on:
  push:
    branches: [ "master" ]
  pull_request:
    branches: [ "master" ]

permissions:
  contents: read

jobs:
  build:

    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v3
    - name: Set up Python 3.10
      uses: actions/setup-python@v3
      with:
        python-version: "3.10"
    - name: Install dependencies
      run: pip install -r requirements.txt
    - name: Install pytest
      run: |
        python -m pip install --upgrade pip
        pip install pytest
    - name: Test docx
      run: pytest tests/docx/test*
    - name: Codecov
      uses: codecov/[email protected]
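The `Test docx` step passes the shell glob `tests/docx/test*` to pytest, so the shell expands the pattern before pytest starts. A minimal sketch of which names such a pattern selects, assuming a hypothetical directory listing (the file names below are illustrative and are not taken from this repository):

```python
from fnmatch import fnmatchcase


def glob_matches(names, pattern="test*"):
    """Return the names that a case-sensitive shell-style pattern selects."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Hypothetical contents of tests/docx (illustrative names only).
listing = ["test_paragraph.py", "test_table.py", "conftest.py", "helpers.py"]
selected = glob_matches(listing)
print(selected)
```

Names that do not start with `test`, such as `conftest.py`, are not passed on the command line by this pattern.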
This file was deleted.
@@ -0,0 +1,26 @@
import os

from src.PDF.PDFParser import PDFParser

for dir_path, dir_names, file_names in os.walk(os.path.join('.', 'documents')):
    for filename in file_names:
        # Create a PDFParser object; its initialization parameter `path`
        # is the path to the PDF file.
        pdf_parser = PDFParser(path=os.path.join(dir_path, filename))
        lines = pdf_parser.lines
        spaces = pdf_parser.line_spaces
        tables = pdf_parser.list_of_table
        list_of_picture = pdf_parser.pictures

        # The get_all_elements method returns an object of the
        # UnifiedDocumentView type, which contains data about the entire
        # text document and its structural elements.
        document = pdf_parser.get_all_elements(lines, spaces, tables, list_of_picture)

        # To write information about the structural elements to a CSV file,
        # use the write_CSV method, specifying the save path:
        # document.write_CSV(os.path.join(dir_path, 'csv', filename + '.csv'))

        # To create a JSON string from the data about structural elements, which
        # will later be sent to the classifier, use the create_json method.
        json_string = document.create_json()
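The loop above hands every file found under `documents` to `PDFParser`, whatever its extension. If the commented-out `write_CSV` call were enabled, its `.csv` output would land in a subdirectory of the same tree and be picked up on a later run. A minimal sketch of filtering by extension first, assuming only `.pdf` files should be parsed; `iter_pdf_paths` is a hypothetical helper and not part of the repository:

```python
import os
import tempfile


def iter_pdf_paths(root):
    """Yield paths of files under root whose extension is .pdf (case-insensitive)."""
    for dir_path, _dir_names, file_names in os.walk(root):
        for filename in sorted(file_names):
            if filename.lower().endswith(".pdf"):
                yield os.path.join(dir_path, filename)


# Demonstration on a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as root:
    for name in ("report.pdf", "notes.txt", "SCAN.PDF"):
        open(os.path.join(root, name), "w").close()
    found = [os.path.basename(p) for p in iter_pdf_paths(root)]

print(found)
```

Each yielded path could then be passed as the `path` argument of `PDFParser` in place of the unfiltered `os.walk` loop.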
Binary file not shown.
@@ -1,21 +1,31 @@
certifi==2022.12.7 | ||
certifi==2023.5.7 | ||
cffi==1.15.1 | ||
charset-normalizer==3.0.0 | ||
charset-normalizer==2.0.0 | ||
click==8.1.3 | ||
cryptography==38.0.1 | ||
cryptography==40.0.2 | ||
defusedxml==0.7.1 | ||
Flask==2.2.2 | ||
guppy3==3.1.2 | ||
guppy3 | ||
idna==3.4 | ||
itsdangerous==2.1.2 | ||
Jinja2==3.1.2 | ||
MarkupSafe==2.1.1 | ||
odfpy==1.4.1 | ||
pdfminer.six==20220524 | ||
pdfplumber==0.7.5 | ||
Pillow==9.2.0 | ||
pdfplumber~=0.9.0 | ||
Pillow | ||
pycparser==2.21 | ||
requests==2.28.1 | ||
urllib3==1.26.13 | ||
Wand==0.6.10 | ||
Werkzeug==2.2.2 | ||
requests==2.30.0 | ||
urllib3==2.0.2 | ||
Wand==0.6.11 | ||
Werkzeug | ||
lxmlx==2.0.2 | ||
python-docx==0.8.11 | ||
bestconfig==1.3.6 | ||
fastapi~=0.95.1 | ||
uvicorn~=0.22.0 | ||
starlette~=0.26.1 | ||
dacite==1.8.0 | ||
tabulate | ||
tabula-py | ||
pydantic~=1.10.7 | ||
lxml~=4.9.2 | ||
tabula-py
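The requirement lines shown in this hunk mix three styles: exact pins (`==`), compatible-release specifiers (`~=`, where `~=0.9.0` means `>=0.9.0, ==0.9.*`), and bare names with no version constraint. `tabula-py` also appears twice. A minimal sketch for auditing such a list, assuming only these three forms occur; `classify` and `duplicates` are hypothetical helpers, and the sample is a subset of the lines above:

```python
import re
from collections import Counter

# Handles "name", "name==X" and "name~=X" only; other specifier forms are skipped.
REQUIREMENT = re.compile(r"^(?P<name>[A-Za-z0-9._-]+)\s*(?:(?P<op>==|~=)\s*(?P<version>\S+))?$")
KINDS = {"==": "pinned", "~=": "compatible", None: "unpinned"}


def classify(lines):
    """Return (name, kind) pairs for every line the pattern recognises."""
    pairs = []
    for line in lines:
        match = REQUIREMENT.match(line.strip())
        if match:
            pairs.append((match.group("name"), KINDS[match.group("op")]))
    return pairs


def duplicates(lines):
    """Return the sorted package names that occur more than once."""
    counts = Counter(name.lower() for name, _kind in classify(lines))
    return sorted(name for name, count in counts.items() if count > 1)


sample = ["certifi==2023.5.7", "guppy3", "pdfplumber~=0.9.0",
          "Pillow", "tabula-py", "lxml~=4.9.2", "tabula-py"]
print(classify(sample))
print(duplicates(sample))
```

Unpinned entries resolve to whatever version is current at install time, so a CI run and a local environment can end up with different versions.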
This file was deleted.