v0.10.0 release

Yunnglin released this 20 Jan 11:05

73e6b5c

What's Changed

Feat: Add EvalScope dashboard by @Yunnglin in #277

Includes single-model evaluation results and multi-model comparison; see 📖 Visualizing Evaluation Results for more details.

Others

Add model-id in arguments by @Yunnglin in #274
Add ifeval and unify report format by @Yunnglin in #275
Add iquiz and use the first metric by default for multi-metric datasets by @Yunnglin in #288
Support specifying a system prompt by @Yunnglin in #283
Fix a bug in multi-metric datasets by @Yunnglin in #282
Fix mmlu reading of local data by @Yunnglin in #273
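
As a rough illustration of the "first metric by default" behavior from #288, here is a minimal sketch in plain Python. It is not EvalScope's actual code: the function name `pick_display_metric`, the report layout, and the scores are all hypothetical and exist only to show the idea.

```python
from typing import Dict, Optional, Tuple


def pick_display_metric(
    scores: Dict[str, float], metric: Optional[str] = None
) -> Tuple[str, float]:
    """Return the (metric name, score) pair to show for one dataset.

    Hypothetical helper: if no metric is requested, fall back to the
    first metric in the report.
    """
    if not scores:
        raise ValueError("no metrics reported for this dataset")
    if metric is None:
        # Python dicts preserve insertion order, so this is the first
        # metric that was added to the report.
        metric = next(iter(scores))
    return metric, scores[metric]


# Made-up scores for a dataset that reports two metrics.
report = {"accuracy": 0.75, "f1": 0.6}

default_choice = pick_display_metric(report)
explicit_choice = pick_display_metric(report, "f1")
print(default_choice, explicit_choice)
```

Falling back to the first metric keeps summary tables to one number per dataset, while an explicit metric name can still select any other reported value.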

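Feature Updates

Main Updates

Add evaluation report visualization by @Yunnglin in #277
- Includes single-model evaluation results and multi-model comparison; see 📖 Visualizing Evaluation Results for more details

Others

Add model-id in arguments by @Yunnglin in #274
Add the ifeval benchmark and unify the report format by @Yunnglin in #275
Add the iquiz benchmark; datasets with multiple metrics now display the first metric's result by default, by @Yunnglin in #288
Support specifying a system prompt by @Yunnglin in #283
Fix a bug in multi-metric datasets by @Yunnglin in #282
Fix mmlu reading of local data by @Yunnglin in #273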

Full Changelog: v0.9.0...v0.10.0

Contributors

Yunnglin