- Fix completion cost computation to support more LiteLLM models, such as vertex_ai models
- Fix API Connection Errors with Semaphore
- Added o1 support
- Added meta evaluation pipeline to Python package
- Change jinja template extensions
- Fixed prompts newlines
- Removed latest structured generation because the faithfulness and usefulness DTOs did not support it
- Fixed dataset loading in plot function
- Add flag to use the training dataset in meta evaluation.
- Add missing model register file.
- Register Fireworks Llama 3.1 8b and 70b prices with litellm to better support these models as evaluators.
- Remove `instructor` package to better understand what is really sent to the LLM. All the LLM generations are simply done using `litellm.acompletion`.
- Removed black from justfile and dev dependencies as ruff plays the same role.
- Created `GroundedQAEvaluator` that evaluates six metrics per sample: answer relevancy, completeness, faithfulness, usefulness, negative rejection and positive acceptance.
- Created `MetaEvaluator` to evaluate evaluators on GroUSE unit tests.
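
The "API Connection Errors with Semaphore" entry above does not say how the fix is implemented. As a rough illustration only, one common pattern is to cap the number of in-flight requests with `asyncio.Semaphore`. In the sketch below, `fake_llm_call`, `bounded_gather` and `max_concurrency` are hypothetical names, and the sleeping stub stands in for a real request such as `litellm.acompletion`; none of this is taken from the package's code.

```python
import asyncio


async def fake_llm_call(prompt: str) -> str:
    # Hypothetical stand-in for a real network request to an LLM API.
    await asyncio.sleep(0.01)
    return f"echo: {prompt}"


async def bounded_gather(prompts, max_concurrency=2):
    # At most `max_concurrency` calls may run at the same time.
    semaphore = asyncio.Semaphore(max_concurrency)
    in_flight = 0
    peak = 0  # highest number of simultaneous calls observed

    async def worker(prompt):
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fake_llm_call(prompt)
            finally:
                in_flight -= 1

    # gather preserves the order of the input prompts in its results.
    results = await asyncio.gather(*(worker(p) for p in prompts))
    return results, peak


results, peak = asyncio.run(bounded_gather([f"q{i}" for i in range(5)]))
```

Keeping the limit small reduces the chance of exhausting connections or hitting provider rate limits, at the cost of lower throughput.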