PDF Search

Web interface for searching PDF files by their content.

Features

Search for specific keywords within a collection of PDF files.
View matched lines from the PDF files for each search result.
Sort search results based on the relevance of matches.
Display search results with a calculated relevance ratio.
Web interface powered by Flask and a SQLite database.
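
The README does not say how the relevance ratio is calculated. As a rough illustration only, the sketch below assumes the ratio is the share of non-empty lines in a document that contain the keyword, and sorts results by that value. `SearchResult` and `search_documents` are hypothetical names, not part of this project.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SearchResult:
    filename: str
    matched_lines: list[str]
    ratio: float  # assumed metric: matching lines / non-empty lines


def search_documents(documents: dict[str, str], keyword: str) -> list[SearchResult]:
    """Return one result per matching document, most relevant first."""
    needle = keyword.lower()
    results = []
    for filename, text in documents.items():
        # Ignore blank lines so they do not dilute the ratio.
        lines = [line.strip() for line in text.splitlines() if line.strip()]
        matched = [line for line in lines if needle in line.lower()]
        if matched:
            results.append(SearchResult(filename, matched, len(matched) / len(lines)))
    # Highest ratio first (sorted() is stable, so ties keep insertion order).
    return sorted(results, key=lambda r: r.ratio, reverse=True)


if __name__ == "__main__":
    docs = {
        "a.pdf": "Invoice 1\nTotal due\nThanks",
        "b.pdf": "invoice A\nINVOICE B",
    }
    for result in search_documents(docs, "invoice"):
        print(f"{result.filename}: {result.ratio:.2f} {result.matched_lines}")
```

The match here is a case-insensitive substring test; the real application may tokenize or rank differently.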

Requirements

Python 3.x
Flask
PyPDF2

Getting Started

Clone this repository:

git clone https://github.com/FelixKohlhas/pdf_search.git
cd pdf_search

Install the required Python packages:
```
pip install -r requirements.txt
```
Create the database:
```
python generate_db.py <path to pdfs>
```
Run the web interface:
```
python app.py -f <path to pdfs>
```
Open your web browser and navigate to http://localhost:5001 to access the PDF search.

Usage

generate_db.py

usage: generate_db.py [-h] [-d DATABASE] pdf_folder

Extract text from PDF files and store it in a SQLite database.

positional arguments:
  pdf_folder            Path to the folder containing PDF files

options:
  -h, --help            show this help message and exit
  -d DATABASE, --database DATABASE
                        Path of the database
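
To make the extraction step more concrete, here is a minimal sketch of how text could be pulled from PDFs and stored in SQLite. It is not the actual `generate_db.py`: the `pages` table layout and the names `extract_pages` and `build_database` are assumptions, and the PyPDF2 calls shown (`PdfReader`, `extract_text`) are the ones available in PyPDF2 2.x and later.

```python
import sqlite3
from pathlib import Path
from typing import Callable, List


def extract_pages(pdf_path: Path) -> List[str]:
    """Return the text of each page, using PyPDF2 (imported lazily)."""
    from PyPDF2 import PdfReader  # older 1.x releases use different names

    reader = PdfReader(str(pdf_path))
    # extract_text() can yield nothing for image-only pages; store "" then.
    return [page.extract_text() or "" for page in reader.pages]


def build_database(
    pdf_folder: str,
    database: str,
    extractor: Callable[[Path], List[str]] = extract_pages,
) -> int:
    """Store one row per PDF page; return the number of rows written."""
    conn = sqlite3.connect(database)
    count = 0
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS pages "
            "(filename TEXT, page INTEGER, content TEXT)"  # hypothetical schema
        )
        conn.execute("DELETE FROM pages")  # rebuild the index from scratch
        for pdf_path in sorted(Path(pdf_folder).glob("*.pdf")):
            for number, text in enumerate(extractor(pdf_path), start=1):
                conn.execute(
                    "INSERT INTO pages VALUES (?, ?, ?)",
                    (pdf_path.name, number, text),
                )
                count += 1
    conn.close()
    return count
```

Passing the extractor as a parameter keeps the database logic testable without real PDF files.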

app.py

usage: app.py [-h] [-d DATABASE] [-u URL_PREFIX] [-f FILES] [--port PORT]

Flask web interface to search PDF files by their content.

options:
  -h, --help            show this help message and exit
  -d DATABASE, --database DATABASE
                        Path of the database
  -u URL_PREFIX, --url-prefix URL_PREFIX
                        URL to prefix to relative paths
  -f FILES, --files FILES
                        Directory of PDF files (optional; allows access to the files through webinterface)
  --port PORT           Port to run the Flask app (default: 5001)
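
For readers who want to script around these options, the following sketch rebuilds an equivalent parser with `argparse`. The flag names, help strings, and the port default of 5001 come from the usage text above; the defaults for `--database` and `--url-prefix` are not documented here, so the values used below are placeholders.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented app.py options."""
    parser = argparse.ArgumentParser(
        prog="app.py",
        description="Flask web interface to search PDF files by their content.",
    )
    # Default database path is an assumption, not taken from the project.
    parser.add_argument("-d", "--database", default="pdfs.db",
                        help="Path of the database")
    # Empty prefix is an assumed default.
    parser.add_argument("-u", "--url-prefix", default="",
                        help="URL to prefix to relative paths")
    parser.add_argument("-f", "--files", default=None,
                        help="Directory of PDF files (optional)")
    parser.add_argument("--port", type=int, default=5001,
                        help="Port to run the Flask app (default: 5001)")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-f", "docs", "--port", "8080"])
    print(args.files, args.port)
```

Note that `argparse` exposes `--url-prefix` as `args.url_prefix`.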

Contributing

Contributions are welcome! If you find any issues or have suggestions for improvements, please open an issue or create a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.
