Document cleaning with LLM #13

Open
stefanfrench opened this issue Nov 21, 2024 · 0 comments
Labels
  • data cleaning: Related to the data cleaning module
  • enhancement: New feature or request
  • help wanted: Extra attention is needed

Comments

@stefanfrench
Contributor

Develop a component that uses an LLM to clean up the uploaded document, replacing or improving the traditional document pre-processing component.
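
A minimal sketch of how such a component might be shaped, assuming the model is exposed as a plain callable that maps a prompt string to a completion string. The names `clean_document` and `split_into_chunks`, the prompt wording, and the `max_chars` default are all hypothetical and not taken from this project. Chunking by paragraph is just one possible way to keep each request within a small model's context window.

```python
from typing import Callable, List

# Hypothetical prompt; the wording actually used by the project may differ.
CLEANING_PROMPT = (
    "Remove boilerplate, page numbers, and extraction artifacts from the text "
    "below. Return only the cleaned text.\n\n"
)


def split_into_chunks(text: str, max_chars: int = 4000) -> List[str]:
    """Group paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks: List[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow: close the current chunk.
            chunks.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def clean_document(text: str, llm: Callable[[str], str], max_chars: int = 4000) -> str:
    """Clean each chunk with the model and rejoin the non-empty results."""
    cleaned = (llm(CLEANING_PROMPT + chunk).strip()
               for chunk in split_into_chunks(text, max_chars))
    return "\n\n".join(piece for piece in cleaned if piece)
```

Taking the model as a callable keeps the sketch independent of any particular inference library, so a stub can stand in for the model during testing.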

Acceptance criteria:

  • Outputs are of high quality
  • Runs with a small language model (small enough for 16GB RAM)
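
As a rough way to reason about the 16GB constraint, the weights-only footprint of a model is approximately its parameter count times the bytes stored per parameter. The sketch below is back-of-the-envelope arithmetic only: it ignores the KV cache, activations, and runtime overhead, and the parameter counts and precisions are illustrative assumptions, not models chosen by this project.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weights-only memory in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Illustrative sizes and precisions only.
for params in (1e9, 3e9, 7e9):
    for bits in (16, 8, 4):
        size = weight_memory_gb(params, bits)
        print(f"{params / 1e9:.0f}B params @ {bits}-bit: {size:.1f} GB")
```

Under these assumptions, a 7B-parameter model needs about 14 GB for its weights at 16-bit precision, which leaves little headroom on a 16GB machine, and about 3.5 GB at 4-bit.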

UAT

As a developer, I can upload documents in a variety of formats (PDF, DOCX, TXT) and see the extracted text displayed without errors.
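
One way the multi-format upload could be organized is a small registry that dispatches on file extension. This is a sketch under stated assumptions: `EXTRACTORS`, `read_txt`, and `extract_text` are hypothetical names, and only plain text is handled here using the standard library. PDF and DOCX would each need a third-party parser registered under `.pdf` and `.docx`; none is assumed or shown.

```python
from pathlib import Path
from typing import Callable, Dict


def read_txt(path: Path) -> str:
    """Read a plain-text file, replacing undecodable bytes instead of failing."""
    return path.read_text(encoding="utf-8", errors="replace")


# Hypothetical registry: file suffix -> extractor function.
# Only .txt is registered; PDF and DOCX extractors would be added here.
EXTRACTORS: Dict[str, Callable[[Path], str]] = {".txt": read_txt}


def extract_text(path: Path) -> str:
    """Return the text of a document, or raise ValueError for unknown formats."""
    suffix = path.suffix.lower()
    extractor = EXTRACTORS.get(suffix)
    if extractor is None:
        raise ValueError(f"Unsupported document format: {suffix or '(none)'}")
    return extractor(path)
```

Raising a clear error for unregistered formats makes the "without errors" part of the UAT checkable per format as extractors are added.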

@stefanfrench added the enhancement and help wanted labels on Nov 21, 2024
@daavoo added the data cleaning label on Dec 11, 2024