[wwb] typo from README.md (#1072)
isanghao authored Oct 25, 2024
1 parent 6846052 commit 836145a
Showing 1 changed file with 4 additions and 4 deletions.
8 changes: 4 additions & 4 deletions tools/who_what_benchmark/README.md
@@ -107,10 +107,10 @@ wwb --target-model sd-lcm-int8 --gt-data lcm_test/sd_xl.json --model-type text-t
### Supported metrics

* `similarity` - averaged similarity measured by neural network trained for sentence embeddings. The best is 1.0, the minimum is 0.0, higher-better.
-* `FDT` - Average position of the first divergent token between sentences generated by differnrt LLMs. The worst is 0, higher-better. [Paper.](https://arxiv.org/abs/2311.01544)
-* `FDT norm` - Average share of matched tokens until first divergent one between sentences generated by differnrt LLMs. The best is 1, higher-better.[Paper.](https://arxiv.org/abs/2311.01544)
-* `SDT` - Average number of divergent tokens in the evaluated outputs between sentences generated by differnrt LLMs. The best is 0, lower-better. [Paper.](https://arxiv.org/abs/2311.01544)
-* `SDT norm` - Average share of divergent tokens in the evaluated outputs between sentences generated by differnrt LLMs. The best is 0, the maximum is 1, lower-better. [Paper.](https://arxiv.org/abs/2311.01544)
+* `FDT` - Average position of the first divergent token between sentences generated by different LLMs. The worst is 0, higher-better. [Paper.](https://arxiv.org/abs/2311.01544)
+* `FDT norm` - Average share of matched tokens until first divergent one between sentences generated by different LLMs. The best is 1, higher-better.[Paper.](https://arxiv.org/abs/2311.01544)
+* `SDT` - Average number of divergent tokens in the evaluated outputs between sentences generated by different LLMs. The best is 0, lower-better. [Paper.](https://arxiv.org/abs/2311.01544)
+* `SDT norm` - Average share of divergent tokens in the evaluated outputs between sentences generated by different LLMs. The best is 0, the maximum is 1, lower-better. [Paper.](https://arxiv.org/abs/2311.01544)

### Notes

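As a rough illustration of the FDT/SDT metrics listed in the diff above (not part of the changed README or this commit), here is a minimal per-pair sketch in Python. The function name and signature are hypothetical, the exact divergence definitions follow the cited paper, and wwb reports these values averaged over the whole evaluation set:

```python
# Illustrative sketch only: per-pair FDT/SDT-style metrics for two token
# sequences (e.g. reference LLM output vs. optimized LLM output).
# wwb reports such metrics averaged over a dataset; exact definitions
# follow https://arxiv.org/abs/2311.01544 and the wwb implementation.
from typing import Dict, List


def divergence_metrics(ref_tokens: List[int], target_tokens: List[int]) -> Dict[str, float]:
    # FDT: position of the first token at which the two outputs diverge.
    fdt = 0
    for ref_tok, tgt_tok in zip(ref_tokens, target_tokens):
        if ref_tok != tgt_tok:
            break
        fdt += 1

    ref_len = len(ref_tokens)
    return {
        "FDT": fdt,
        # FDT norm: share of tokens matched before the first divergence.
        "FDT norm": fdt / ref_len if ref_len else 1.0,
        # SDT: number of reference tokens from the divergence point onward.
        "SDT": max(ref_len - fdt, 0),
        # SDT norm: share of divergent tokens in the reference output.
        "SDT norm": (ref_len - fdt) / ref_len if ref_len else 0.0,
    }


print(divergence_metrics([1, 2, 3, 4], [1, 2, 9, 4]))
# -> {'FDT': 2, 'FDT norm': 0.5, 'SDT': 2, 'SDT norm': 0.5}
```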
