EPIC: LeapfrogAI Evaluations v1.1 #1171

Open
6 tasks
jalling97 opened this issue Oct 1, 2024 · 0 comments


jalling97 commented Oct 1, 2024

LeapfrogAI Evaluations v1.1

Description

Now that a baseline evaluations framework for LeapfrogAI exists, it needs to be further expanded to meet the needs of the product and mission-success teams.

Feedback so far points to several common themes that need to be addressed:

  • Some evaluations (primarily NIAH) always pass at 100% and so are not helpful for tracking growth over time
  • Some NIAH and QA evals do not use the full chunk data in RAG responses and so do not evaluate RAG as thoroughly as they should
  • Evaluation results are not currently being stored anywhere
  • The current implementation of LFAI evals is very specific to the OpenAI way of handling RAG, and therefore the evaluations can't be run against custom RAG pipelines (a delivery concern).
  • MMLU results sometimes return the same score for multiple topics, which is suspicious and indicates a potential problem with the evaluation 🐛
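
To make the chunk-data and custom-pipeline themes more concrete, here is a minimal sketch of what a pipeline-agnostic NIAH check could look like. It is only an illustration under stated assumptions: `RAGResult`, `RAGPipeline`, `niah_scores`, and `StaticPipeline` are hypothetical names, not part of the current LFAI evals code, and the substring match is a stand-in for whatever metric the evals actually use. The idea is that any backend exposing the answer plus its retrieved chunks can be scored on retrieval and response separately, so a run no longer collapses into a single always-passing number.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class RAGResult:
    """Hypothetical container: the final answer plus the retrieved chunk texts."""
    answer: str
    chunks: list[str] = field(default_factory=list)


class RAGPipeline(Protocol):
    """Hypothetical interface that any RAG backend (OpenAI-style or custom) could implement."""

    def query(self, question: str) -> RAGResult: ...


def niah_scores(pipeline: RAGPipeline, question: str, needle: str) -> dict[str, float]:
    """Score retrieval and response separately for one needle-in-a-haystack question.

    Assumption: a case-insensitive substring match stands in for the real metric.
    """
    result = pipeline.query(question)
    needle_lower = needle.lower()
    # Retrieval passes if any retrieved chunk contains the needle.
    retrieved = any(needle_lower in chunk.lower() for chunk in result.chunks)
    # Response passes if the final answer contains the needle.
    answered = needle_lower in result.answer.lower()
    return {"retrieval": float(retrieved), "response": float(answered)}


class StaticPipeline:
    """Toy stand-in for a real backend, used only to exercise the sketch."""

    def __init__(self, answer: str, chunks: list[str]) -> None:
        self._result = RAGResult(answer=answer, chunks=chunks)

    def query(self, question: str) -> RAGResult:
        return self._result
```

For example, a `StaticPipeline` whose chunks contain the needle but whose answer does not would score 1.0 on retrieval and 0.0 on response, which separates a retrieval success from a generation failure.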

Completion Criteria

@jalling97 jalling97 added the EPIC ⚔️ EPIC issue to consolidate several sub-issues label Oct 1, 2024
@jalling97 jalling97 self-assigned this Oct 1, 2024