v0.10.0 release
What's Changed
Feat: Add EvalScope dashboard by @Yunnglin in #277
- Covers single-model evaluation results and multi-model comparison; see 📖 Visualizing Evaluation Results for details
Others
- Add model-id to arguments by @Yunnglin in #274
- Add ifeval benchmark and unify report format by @Yunnglin in #275
- Add iquiz benchmark; for multi-metric datasets, use the first metric by default when displaying results, by @Yunnglin in #288
- Support specifying system prompt by @Yunnglin in #283
- Fix bug in multi-metric datasets by @Yunnglin in #282
- Fix mmlu reading of local data by @Yunnglin in #273
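As a rough illustration of the "use first metric by default" behavior from #288, the sketch below picks the first metric out of a multi-metric result. It is not EvalScope's code: the `primary_metric` helper, the metric names, and the report layout (a dict mapping metric names to scores, in the order they were computed) are all assumptions made for this example.

```python
from typing import Dict, Tuple


def primary_metric(scores: Dict[str, float]) -> Tuple[str, float]:
    """Return the first (name, score) pair of a multi-metric result.

    Hypothetical helper, not part of EvalScope. It relies on dicts
    preserving insertion order, which Python guarantees since 3.7.
    """
    if not scores:
        raise ValueError("no metrics to choose from")
    # next(iter(...)) yields the first inserted item without copying the dict.
    return next(iter(scores.items()))


# Made-up metric names and values, for illustration only.
example = {"metric_a": 0.5, "metric_b": 0.25}
name, value = primary_metric(example)
print(f"{name}: {value}")
```

Treating the first entry as the default keeps summary tables to one number per dataset while the remaining metrics stay available in the full result.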
Full Changelog: v0.9.0...v0.10.0