Skip to content

Commit

Permalink
Merge pull request #1 from salgadev/merge
Browse files Browse the repository at this point in the history
update README
  • Loading branch information
salgadev authored Apr 19, 2024
2 parents 2755970 + 8c7984b commit e4c4b43
Showing 1 changed file with 41 additions and 68 deletions.
109 changes: 41 additions & 68 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,6 @@

---
title: DocVerifyRAG
emoji: 🐠
emoji: 🖺
colorFrom: pink
colorTo: green
sdk: streamlit
Expand All @@ -11,125 +10,99 @@ pinned: false
---

<!-- PROJECT TITLE -->
<h1 align="center">DocVerifyRAG: Document Verification and Anomaly Detection</h1>
<h1 align="center">DocVerifyRAG: Anomaly detection for BIM document metadata</h1>
<div id="header" align="center">
</div>
<h2 align="center">
Description
</h2>
<p align="center"> DocVerifyRAG is a revolutionary tool designed to streamline document verification processes in hospitals. It utilizes AI to classify documents and identify mistakes in metadata, ensuring accurate and efficient document management. Inspired by the need for improved data accuracy in healthcare, DocVerifyRAG provides automated anomaly detection to identify misclassifications and errors in document metadata, enhancing data integrity and compliance with regulatory standards. </p>
<p align="center"> Introducing DocVerifyRAG, a cutting-edge solution revolutionizing document verification processes across various sectors. Our app goes beyond mere document classification; it focuses on ensuring metadata accuracy by cross-referencing against a vast vector database of exemplary cases. Inspired by the necessity for precise data management, DocVerifyRAG leverages AI to scrutinize document metadata, instantly flagging anomalies and offering suggested corrections. Powered by Vectara vector store technology and supported by the innovative capabilities of together.ai API, our app employs advanced anomaly detection algorithms to scrutinize metadata, ensuring compliance with regulatory standards and enhancing data integrity. With DocVerifyRAG, users can effortlessly verify document metadata accuracy, minimizing errors and streamlining operational efficiency.</p>

## Table of Contents

<details>
<summary>DocVerifyRAG</summary>

- [Application Description](#application-description)
- [Table of Contents](#table-of-contents)
- [Local installation](#install-locally)
- [Install using Docker](#install-using-docker)
- [Usage](#usage)
- [Contributing](#contributing)
- [TRY the prototype](#try-the-prototype)
- [Screenshots](#screenshots)
- [Technology Stack](#technology-stack)
- [Features](#features)
- [Install locally](#install-locally)
- [Usage](#usage)
- [Authors](#authors)
- [License](#license)

</details>

## TRY the prototype
[DocVerifyRAG](https://docverify-rag.vercel.app)
[DocVerifyRAG](https://docverifyrag.vercel.app/)

## Screenshots

[Add screenshots here]
![ttthh](https://github.com/eliawaefler/DocVerifyRAG/assets/19821445/331845d7-a360-4315-92ef-d4bb50021eaa)

## Technology Stack
| Technology | Description |
| --- | --- |
| **Python** | Primary programming language used for development. |
| **LangChain** | Framework for developing applications powered by large language models (LLMs). |
| **Vectara** | Provides efficient vector search capabilities via the Boomerang model in a "RAG as a service" architecture. |
| **intfloat/multilingual-e5-large** | Generates efficient and performant multilingual language embeddings. |
| **Together AI** | Platform for training, fine-tuning, and deploying gen AI models. Its inference API was used with the model `mistralai/Mixtral-8x7B-Instruct-v0.1`. |
| **Streamlit** | Open-source Python library for creating custom web apps, used as the frontend. |
| **Hugging Face Spaces** | Service for developer-friendly deployments of data applications. |

The backend is built using Python, LangChain, Vectara, and Together AI's inference API with the `mistralai/Mixtral-8x7B-Instruct-v0.1` model for processing and understanding large amounts of data. Streamlit is used for the frontend, providing an intuitive interface for users. Hugging Face Spaces simplifies the deployment process, making the application easily accessible.

| Technology | Description |
| ---------- | --------------------------- |
| AI/ML | Artificial Intelligence and Machine Learning |
| Python | Programming Language |
| Flask | Web Framework |
| Docker | Containerization |
| Tech Name | Short description |

### Features

1. **Document Classification:**
- Utilizes AI/ML algorithms to classify documents based on content and metadata.
- Provides accurate and efficient document categorization for improved data management.
1. **Metadata Verification:**
- Cross-references document metadata against a comprehensive vector database of exemplary cases.
- Instantly identifies anomalies and discrepancies, ensuring metadata accuracy and compliance.

2. **Anomaly Detection:**
- Identifies mistakes and misclassifications in document metadata through automated anomaly detection.
- Enhances data integrity and accuracy by flagging discrepancies in document metadata.
2. **Automated Metadata Correction:**
- Offers suggested metadata corrections based on processed PDF files, facilitating swift and accurate adjustments.
- Potential for automated inspection of numerous metadata rows for seamless large-scale data verification.

3. **User-Friendly Interface:**
- Offers a user-friendly web interface for easy document upload, classification, and verification.
- Simplifies the document management process for hospital staff, reducing manual effort and errors.
3. **Question Answering Retriever:**
- Utilizes Vectara vector store technology for efficient retrieval of relevant information.
- Employs Hugging Face embeddings E5 multilingual model for precise analysis of multilingual data.
- Identifies anomalies in names, descriptions, and disciplines, providing actionable insights for data accuracy.

### Install locally
4. **User-Friendly Interface:**
- Intuitive web interface for effortless document upload, metadata verification, and correction.
- Simplifies document management processes, reducing manual effort and enhancing operational efficiency.

#### Step 1 - Frontend
### Install locally

1. Clone the repository:
```bash
$ git clone https://github.com/eliawaefler/DocVerifyRAG.git
```

2. Navigate to the frontend directory:
```bash
$ cd DocVerifyRAG/frontend
```

3. Install dependencies:
```bash
$ npm install
```
4. Run project:
```bash
$ npm run dev
```

#### Step 2 - Backend

1. Navigate to the backend directory:
```bash
$ cd DocVerifyRAG/backend
$ git clone https://github.com/salgadev/DocVerifyRAG.git
```

2. Install dependencies:
```bash
$ pip install -r requirements.txt
```

### Install using Docker

To deploy DocVerifyRAG using Docker, follow these steps:

1. Pull the Docker image from Docker Hub:

```bash
$ docker pull sandra/docverifyrag:latest
```

2. Run the Docker container:

3. Run using Streamlit:
```bash
$ docker run -d -p 5000:5000 sandramsc/docverifyrag:latest
$ streamlit run app.py
```

### Usage

Access the web interface and follow the prompts to upload documents, classify them, and verify metadata. The AI-powered anomaly detection system will automatically flag any discrepancies or errors in the document metadata, providing accurate and reliable document management solutions for hospitals.
Access the web interface and follow the prompts to upload documents, classify them, and verify metadata. The AI-powered anomaly detection system will automatically flag any discrepancies or errors in the document metadata, providing accurate and reliable document management solutions.
## Authors

| Name | Link |
| -------------- | ----------------------------------------- |
| Sandra Ashipala | [GitHub](https://github.com/sandramsc) |
| Elia Wäfler | [GitHub](https://github.com/eliawaefler) |
| Carlos Salgado | [GitHub](https://github.com/salgadev) |
| Abdul Qadeer | [GitHub](https://github.com/AbdulQadeer-55) |


## License

[![GitLicense](https://img.shields.io/badge/License-MIT-lime.svg)](https://github.com/eliawaefler/DocVerifyRAG/blob/main/LICENSE)
[![GitLicense](https://img.shields.io/badge/License-MIT-lime.svg)](https://github.com/eliawaefler/DocVerifyRAG/blob/main/LICENSE)

0 comments on commit e4c4b43

Please sign in to comment.