This project is a Flask-based web application that predicts how relevant each document in a spreadsheet is to a given query. It processes Excel files containing document data, analyzes the content with an OpenAI language model, and updates the file with relevancy predictions and comments.
- Overview
- Features
- Setup and Installation
- Project Structure
- How It Works
- Usage
- Dependencies
- API Endpoints
- Contributing
- License
This application uses Flask for the web server, llama_index with an OpenAI model for natural language processing, and openpyxl for Excel file handling. It reads document data from an Excel file, determines each document's relevancy to a specified query, and writes the relevancy status and comments back to the file.
- Extracts document data from Excel files.
- Uses OpenAI's language model for relevancy prediction.
- Supports custom queries for relevancy assessment.
- Updates the original Excel file with prediction results and comments.
- Provides an easy-to-use web interface for file processing.
- Python 3.7 or higher
- Pip (Python package installer)
- Clone the Repository:
  git clone https://github.com/yourusername/flask-relevancy-prediction.git
  cd flask-relevancy-prediction
- Create a Virtual Environment:
  python3 -m venv venv
  source venv/bin/activate  # On Windows use `venv\Scripts\activate`
- Install Dependencies:
  pip install -r requirements.txt
- Set Up Environment Variables: create a .env file in the root directory with the following content:
  LLAMA_CLOUD_API_KEY=your_openai_api_key
- Run the Application:
  python app.py
- Access the Application: open your web browser and go to http://localhost:5000.
.
├── app.py # Main application file
├── requirements.txt # Python dependencies
├── .env # Environment variables
└── README.md # Project documentation
- Flask Application:
  - Hosts the web interface and handles HTTP requests.
  - Defines routes for file processing.
- Excel Data Extraction:
  - Uses openpyxl to read document data from Excel files.
  - Extracts specific columns for analysis.
- Relevancy Prediction:
  - Leverages llama_index and OpenAI's language model to predict each document's relevancy to a query.
  - Processes the content and generates a relevancy status and comment.
- File Update:
  - Adds relevancy results and comments to new columns in the original Excel file.
  - Saves the updated file.
- Flask Application Setup:
  - The application is initialized using Flask.
  - The route / is defined to handle file-processing requests sent via the POST method.
- Data Extraction:
  - The extractor function reads an Excel file using openpyxl.
  - It retrieves data from specific columns and stores them in a list of dictionaries for further processing.
- Relevancy Prediction:
  - The backend function processes each document using a custom query engine based on OpenAI's GPT-3.5.
  - The engine predicts whether the document is relevant to the given query.
  - It returns a tuple with a relevancy flag and a reason for the prediction.
- File Update:
  - The newFileSaver function writes the relevancy results back to the Excel file.
  - It creates new columns for storing the relevancy status and comments.
  - The updated file is saved and its path is returned.
- API Endpoint:
  - The / endpoint processes incoming JSON data containing the query and file path.
  - It calls the relevant functions to extract data, predict relevancy, and save results.
  - It returns the path to the updated file as a JSON response.
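The Flask Application Setup and API Endpoint steps above can be sketched as a minimal Flask app. This is an illustration under assumptions, not the project's app.py: the view name `index`, the placeholder `process` helper, and the 400 error response are hypothetical, while the `/` route, the POST method, the `query`/`file_path` fields and the `Path` response key follow this README.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def process(query: str, file_path: str) -> str:
    # Hypothetical placeholder: the real app would run extractor -> backend ->
    # newFileSaver here. This stub just returns the input path unchanged.
    return file_path


@app.route("/", methods=["POST"])
def index():
    data = request.get_json(silent=True)
    if not isinstance(data, dict):
        data = {}  # missing, malformed, or non-object JSON body
    query = data.get("query")
    file_path = data.get("file_path")
    if not query or not file_path:
        # Hypothetical error shape; the README does not document error responses.
        return jsonify({"error": "Both 'query' and 'file_path' are required."}), 400
    return jsonify({"Path": process(query, file_path)})
```

To serve it locally, call `app.run()`; Flask listens on port 5000 by default.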
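The Data Extraction step could look roughly like the following sketch of `extractor` using openpyxl. The README does not say which columns are read, so the `columns` parameter, the assumption that row 1 holds the headers, and the stop-at-first-blank-row rule are all hypothetical.

```python
from openpyxl import load_workbook


def extractor(file_path, columns):
    """Return one dict per data row of the active sheet, keyed by header name.

    Assumptions (hypothetical): row 1 holds the headers, and `columns`
    names the headers to keep. Reading stops at the first fully blank row,
    so record i always corresponds to sheet row i + 2.
    """
    ws = load_workbook(file_path).active
    rows = ws.iter_rows(values_only=True)
    header = list(next(rows))
    # Raises ValueError if a requested header is not present in row 1.
    positions = {name: header.index(name) for name in columns}
    records = []
    for row in rows:
        if all(value is None for value in row):
            break
        records.append({name: row[pos] for name, pos in positions.items()})
    return records
```

For example, `extractor("docs.xlsx", ["Title", "Abstract"])` would return a list such as `[{"Title": ..., "Abstract": ...}, ...]`, where the header names are placeholders for whatever the real spreadsheet uses.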
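For the Relevancy Prediction step, the sketch below shows one way `backend` could turn a model answer into the `(relevancy flag, reason)` tuple described above. The prompt wording and the `Relevant:`/`Reason:` answer format are assumptions, and the model call is injected as a plain `ask` callable so the parsing logic runs without an API key; this is not the project's actual query engine.

```python
from typing import Callable, Tuple

# Hypothetical prompt; the project's real prompt is not documented in this README.
PROMPT = (
    "Query: {query}\n\n"
    "Document:\n{document}\n\n"
    "Is the document relevant to the query? Answer on two lines:\n"
    "Relevant: yes or no\n"
    "Reason: one sentence"
)


def backend(query: str, document: str, ask: Callable[[str], str]) -> Tuple[bool, str]:
    """Return (is_relevant, reason) for one document.

    `ask` stands in for the LLM call: it takes a prompt string and
    returns the model's text answer.
    """
    answer = ask(PROMPT.format(query=query, document=document))
    relevant = False
    reason = answer.strip()  # fallback if no "Reason:" line is found
    for line in answer.splitlines():
        key, _, value = line.partition(":")
        key = key.strip().lower()
        if key == "relevant":
            relevant = value.strip().lower().startswith("yes")
        elif key == "reason":
            reason = value.strip()
    return relevant, reason
```

With llama_index, `ask` could wrap a query engine, for example `lambda p: str(engine.query(p))`, where `engine` comes from `index.as_query_engine()`.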
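The File Update step can be sketched with openpyxl as follows. The header names `Relevant` and `Comment`, the Yes/No encoding, and the optional `output_path` parameter are hypothetical; the sketch assumes `results` holds one `(flag, reason)` tuple per data row, in sheet order starting at row 2, and by default overwrites the original file as the README describes.

```python
from openpyxl import load_workbook


def newFileSaver(file_path, results, output_path=None):
    """Append relevancy and comment columns to the active sheet and save.

    `results` is a list of (is_relevant, reason) tuples aligned with the
    data rows below the header. Returns the path of the saved workbook.
    """
    wb = load_workbook(file_path)
    ws = wb.active
    flag_col = ws.max_column + 1  # first empty column after the existing data
    comment_col = flag_col + 1
    ws.cell(row=1, column=flag_col, value="Relevant")  # hypothetical header names
    ws.cell(row=1, column=comment_col, value="Comment")
    for offset, (relevant, reason) in enumerate(results):
        row = 2 + offset  # data starts on the row after the header
        ws.cell(row=row, column=flag_col, value="Yes" if relevant else "No")
        ws.cell(row=row, column=comment_col, value=reason)
    out = output_path or file_path
    wb.save(out)
    return out
```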
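Taken together, the endpoint's work reduces to extract, predict, save. The hypothetical `process_request` below shows that flow with the three steps passed in as functions, which keeps it testable without Excel files or an API key. In the application these would be `extractor`, `backend` and `newFileSaver`, wrapped as needed to fit these signatures.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Result = Tuple[bool, str]


def process_request(
    payload: Dict[str, str],
    extract: Callable[[str], List[Record]],
    predict: Callable[[str, Record], Result],
    save: Callable[[str, List[Result]], str],
) -> Dict[str, str]:
    """Run extract -> predict -> save for one request and build the JSON body."""
    query = payload["query"]
    file_path = payload["file_path"]
    records = extract(file_path)  # one record per document row
    results = [predict(query, record) for record in records]  # (flag, reason) each
    updated_path = save(file_path, results)  # writes the new columns
    return {"Path": updated_path}
```

Inside a Flask view, the returned dictionary can be passed straight to `jsonify`.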
- Upload and Process File:
  - Send a POST request to the root endpoint with JSON data containing query and file_path.
  Example JSON:
  {
    "query": "Is Mannuronic acid used in cosmetic formulations?",
    "file_path": "path/to/excel_file.xlsx"
  }
- View Results:
  - Check the returned file path in the JSON response to view the updated Excel file.
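A client call matching the usage steps above might look like this standard-library sketch. It assumes the server is running locally on port 5000 and that the response uses the `Path` key listed under API Endpoints; `build_request` and `send` are hypothetical helper names.

```python
import json
import urllib.request


def build_request(query: str, file_path: str, url: str = "http://localhost:5000/") -> urllib.request.Request:
    """Build a POST request whose JSON body matches the README's example."""
    body = json.dumps({"query": query, "file_path": file_path}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the app running, `send(build_request("Is Mannuronic acid used in cosmetic formulations?", "path/to/excel_file.xlsx"))["Path"]` would give the path of the updated workbook.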
- Flask: Web framework for Python.
- openpyxl: Library for reading and writing Excel files.
- dotenv (python-dotenv): Library for loading environment variables from a .env file.
- llama_index: Framework for indexing and querying document data with LLMs.
- OpenAI: Python library for interacting with OpenAI's models (GPT-3.5 in this project).
- / : Main endpoint to process a file and query.
  - Method: POST
  - Request Body:
    - query: The user query string.
    - file_path: Path to the Excel file.
  - Response:
    - Path: Updated file path.
Contributions are welcome! Please follow these steps:
- Fork the repository.
- Create a new branch.
- Make your changes.
- Submit a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.