Addition of SummEval Metric to evaluate Library #599

Open · wants to merge 5 commits into main
3 changes: 3 additions & 0 deletions .gitmodules
@@ -0,0 +1,3 @@
[submodule "metrics/summeval"]
path = metrics/summeval
url = https://github.com/penguinwang96825/evaluate
35 changes: 35 additions & 0 deletions metrics/summeval/.gitattributes
@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
53 changes: 53 additions & 0 deletions metrics/summeval/README.md
@@ -0,0 +1,53 @@
---
title: Summeval
emoji: 🌍
colorFrom: indigo
colorTo: gray
sdk: gradio
sdk_version: 4.36.0
app_file: app.py
pinned: false
tags:
- evaluate
- metric
description: >-
The SummEval dataset is a resource developed by the Yale LILY Lab and Salesforce Research for evaluating text summarization models.
It was created as part of a project to address shortcomings in summarization evaluation methods.
---

# Metric Card for SummEval Score

## Metric Description

The SummEval dataset is a resource developed by the Yale LILY Lab and Salesforce Research for evaluating text summarization models.
It was created as part of a project to address shortcomings in summarization evaluation methods.

## How to Use

1. **Loading the relevant SummEval metric**: the available SummEval subsets are `rouge`, `rouge-we`, `mover-score`, `bert-score`, `summa-qa`, `blanc`, `supert`, `meteor`, `s3`, `data-stats`, `cider`, `chrf`, `bleu`, and `syntactic`.

2. **Calculating the metric**: the metric takes two inputs: a list of model predictions to score and a list of references, as shown below.

```python
from evaluate import load

summeval_metric = load('summeval', 'rouge')
predictions = ["hello there", "general kenobi"]
references = ["hello there", "general kenobi"]
results = summeval_metric.compute(predictions=predictions, references=references)
```
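
The other subsets are loaded the same way by passing their name as the second argument. Below is a minimal sketch, assuming the other subsets (here `bleu`) accept the same `predictions`/`references` inputs; the structure of the returned `results` depends on the chosen subset.

```python
from evaluate import load

# Load another SummEval subset by name; this sketch assumes `bleu` follows
# the same compute() signature as `rouge` above.
summeval_bleu = load('summeval', 'bleu')
predictions = ["hello there", "general kenobi"]
references = ["hello there", "general kenobi"]
results = summeval_bleu.compute(predictions=predictions, references=references)
print(results)  # result keys depend on the chosen subset
```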

## Limitations and Bias

Like other evaluation frameworks, SummEval relies on reference summaries that may themselves be biased, which can affect the accuracy of its assessments. Its automated metrics predominantly measure surface-level similarity and often miss deeper qualities such as coherence and factual accuracy. Human evaluations introduce subjective biases and inconsistencies, and the framework's focus on specific languages and domains further limits its scope. Finally, its reliance on resource-intensive human evaluation makes it difficult to scale, particularly for teams with limited resources.

## Citation

```bibtex
@article{fabbri2020summeval,
title={SummEval: Re-evaluating Summarization Evaluation},
author={Fabbri, Alexander R and Kry{\'s}ci{\'n}ski, Wojciech and McCann, Bryan and Xiong, Caiming and Socher, Richard and Radev, Dragomir},
journal={arXiv preprint arXiv:2007.12626},
year={2020}
}
```
6 changes: 6 additions & 0 deletions metrics/summeval/app.py
@@ -0,0 +1,6 @@
import evaluate
from evaluate.utils import launch_gradio_widget


module = evaluate.load("summeval")
launch_gradio_widget(module)
1 change: 1 addition & 0 deletions metrics/summeval/requirements.txt
@@ -0,0 +1 @@
summ-eval==0.892