Add Frechet Inception Distance (FID) Score #556

Open · wants to merge 1 commit into base: main
117 changes: 117 additions & 0 deletions metrics/fid_score/README.md
@@ -0,0 +1,117 @@
---
title: fid_score
emoji: 🤗
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 3.19.1
app_file: app.py
pinned: false
tags:
- evaluate
- metric
description: >-
The Frechet Inception Distance (FID) is a metric used to evaluate the quality of generated images from generative adversarial networks (GANs). It measures the similarity between the feature representations of real and generated images.

FID is calculated by first extracting feature vectors from a pre-trained Inception-v3 network for both the real and generated images. Then, it computes the mean and covariance matrix of these feature vectors for each set. Finally, it calculates the Fréchet distance between these multivariate Gaussian distributions.

A lower FID score indicates a higher similarity between the real and generated images, suggesting better performance of the GAN in generating realistic images.

For further details, please refer to the paper:
"GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium"
by Heusel et al., presented at the Advances in Neural Information Processing Systems (NeurIPS) conference in 2017.
---

# Metric Card for fid_score

## Metric description

The Frechet Inception Distance (FID) is a metric used to evaluate the quality of generated images from generative adversarial networks (GANs). It measures the similarity between the feature representations of real and generated images.

FID is calculated by first extracting feature vectors from a pre-trained Inception-v3 network for both the real and generated images. Then, it computes the mean and covariance matrix of these feature vectors for each set. Finally, it calculates the Fréchet distance between these multivariate Gaussian distributions.
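
Concretely, with $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$ denoting the mean and covariance of the real and generated feature sets, the score is the standard closed-form Fréchet distance between two Gaussians (stated here for reference; it is not specific to this implementation):

$$
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
$$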

A lower FID score indicates a higher similarity between the real and generated images, suggesting better performance of the GAN in generating realistic images.


## How to use

The metric takes two inputs: `real_images` (a NumPy array containing the real image data) and `generated_images` (a NumPy array containing the generated image data).

```python
from evaluate import load
fid = load("fid_score")
fid_score = fid.compute(real_images=real_images, generated_images=generated_images)
```
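
In this implementation, `compute` operates directly on the image arrays passed to it: each input is expected to be an `H × W × C` NumPy array, and every pixel's channel values are treated as one feature vector. A minimal sketch of preparing such inputs (file names are placeholders; assumes OpenCV is installed):

```python
import cv2

# Placeholder file paths for illustration.
real_images = cv2.imread("real_sample.png")            # NumPy array of shape (H, W, 3)
generated_images = cv2.imread("generated_sample.png")  # NumPy array of shape (H', W', 3)

# The metric flattens each array to (H * W, 3) and fits a Gaussian to the
# per-pixel channel values, so only the channel dimension has to agree.
assert real_images.shape[-1] == generated_images.shape[-1]
```
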
## Output values

This metric outputs a single floating-point number representing the dissimilarity between the distributions of the real and generated images.
```
print(fid_score)
73.65338799020432
```

The **lower** the fid_score value, the **better** the performance of the GAN.

### Values from popular papers

## Examples

Perfect match between two identical GAN-generated images:

```python
import cv2
from evaluate import load

fid = load("fid_score")
im1 = cv2.imread("gans1.png")
im2 = cv2.imread("gans1.png")
fid_score = fid.compute(real_images=im1, generated_images=im2)
print(fid_score)
5.020410753786564e-10
```

Partial match between two different GAN-generated images:

```python
import cv2
from evaluate import load

fid = load("fid_score")
im1 = cv2.imread("gans1.png")
im2 = cv2.imread("gans2.png")
fid_score = fid.compute(real_images=im1, generated_images=im2)
print(fid_score)
73.65338799020432
```

No match between a GAN-generated image and an unrelated image:

```python
import cv2
from evaluate import load

fid = load("fid_score")
im1 = cv2.imread("gans1.png")
im2 = cv2.imread("random.png")
fid_score = fid.compute(real_images=im1, generated_images=im2)
print(fid_score)
10991.947362765806
```




## Limitations and bias

The Fréchet Inception Distance (FID), while widely used for assessing the quality of generated images, has limitations and biases. It relies on features extracted from a pretrained Inception network, so it inherits that network's biases and is sensitive to changes in its architecture or training procedure. FID may not capture all aspects of image quality, especially semantic differences or specific image features, and a single score offers little insight into why two sets of images differ. It is also relatively expensive to compute and has no universally agreed-upon threshold for an acceptable score, so results require careful interpretation alongside qualitative evaluation. Despite these limitations, FID remains valuable for comparing generative models, particularly GANs, but should be used judiciously alongside other evaluation metrics.


## Citation


```bibtex
@inproceedings{heusel2017gans,
title={GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium},
author={Heusel, Martin and Ramsauer, Hubert and Unterthiner, Thomas and Nessler, Bernhard and Hochreiter, Sepp},
booktitle={Advances in Neural Information Processing Systems},
pages={6626--6637},
year={2017}
}
```

## Further References

- [Fréchet inception distance -- Wikipedia](https://en.wikipedia.org/wiki/Fr%C3%A9chet_inception_distance)
6 changes: 6 additions & 0 deletions metrics/fid_score/app.py
@@ -0,0 +1,6 @@
import evaluate
from evaluate.utils import launch_gradio_widget


module = evaluate.load("fid_score")
launch_gradio_widget(module)
85 changes: 85 additions & 0 deletions metrics/fid_score/fid_score.py
@@ -0,0 +1,85 @@
# Copyright 2024 The HuggingFace Datasets Authors and the current dataset script contributor.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""fid score metric."""
import numpy as np
from scipy.linalg import sqrtm

import evaluate


_CITATION = """\
@inproceedings{heusel2017gans,
title={GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium},
author={Heusel, Martin and Ramsauer, Hubert and Unterthiner, Thomas and Nessler, Bernhard and Hochreiter, Sepp},
booktitle={Advances in Neural Information Processing Systems},
pages={6626--6637},
year={2017}
}
"""

_DESCRIPTION = """\
The Frechet Inception Distance (FID) is a metric used to evaluate the quality of generated images from generative adversarial networks (GANs). It measures the similarity between the feature representations of real and generated images.

FID is calculated by first extracting feature vectors from a pre-trained Inception-v3 network for both the real and generated images. Then, it computes the mean and covariance matrix of these feature vectors for each set. Finally, it calculates the Fréchet distance between these multivariate Gaussian distributions.

A lower FID score indicates a higher similarity between the real and generated images, suggesting better performance of the GAN in generating realistic images.

For further details, please refer to the paper:
"GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium"
by Heusel et al., presented at the Advances in Neural Information Processing Systems (NeurIPS) conference in 2017.
"""

_KWARGS_DESCRIPTION = """
Computes the FID score between a set of real images and a set of generated images.
Args:
    real_images: NumPy array containing the real image data.
    generated_images: NumPy array containing the generated image data.
Returns:
    (float): the Frechet Inception Distance (FID) score.
"""


@evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
class FID(evaluate.Metric):
    def _info(self):
        return evaluate.MetricInfo(
            description=_DESCRIPTION,
            citation=_CITATION,
            inputs_description=_KWARGS_DESCRIPTION,
            features=None,
            reference_urls=[],
        )

    def _compute(self, real_images, generated_images):
        # Flatten each (H, W, C) image array to shape (H * W, C) so that every
        # pixel's channel values are treated as one feature vector.
        real_images = real_images.reshape(real_images.shape[0] * real_images.shape[1], real_images.shape[2])
        generated_images = generated_images.reshape(
            generated_images.shape[0] * generated_images.shape[1], generated_images.shape[2]
        )

        # Fit a multivariate Gaussian (mean and covariance) to each set of vectors.
        mu_real = np.mean(real_images, axis=0)
        sigma_real = np.cov(real_images, rowvar=False)

        mu_generated = np.mean(generated_images, axis=0)
        sigma_generated = np.cov(generated_images, rowvar=False)

        # Squared Euclidean distance between the two means.
        mean_diff = mu_real - mu_generated
        mean_diff_squared = np.dot(mean_diff, mean_diff)

        # Matrix square root of the product of the covariances; sqrtm can return
        # a complex result due to numerical error, so keep only the real part.
        cov_mean = sqrtm(sigma_real.dot(sigma_generated))

        if np.iscomplexobj(cov_mean):
            cov_mean = cov_mean.real

        # Fréchet distance between the two fitted Gaussians.
        fid = mean_diff_squared + np.trace(sigma_real + sigma_generated - 2 * cov_mean)
        return fid
2 changes: 2 additions & 0 deletions metrics/fid_score/requirements.txt
@@ -0,0 +1,2 @@
git+https://github.com/huggingface/evaluate@{COMMIT_PLACEHOLDER}
scipy