Add Frechet Inception Distance (FID) Score #556

Open · wants to merge 1 commit into base: main
117 changes: 117 additions & 0 deletions metrics/fid_score/README.md
@@ -0,0 +1,117 @@
---
title: fid_score
emoji: 🤗
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 3.19.1
app_file: app.py
pinned: false
tags:
- evaluate
- metric
description: >-
The Frechet Inception Distance (FID) is a metric used to evaluate the quality of generated images from generative adversarial networks (GANs). It measures the similarity between the feature representations of real and generated images.

FID is calculated by first extracting feature vectors from a pre-trained Inception-v3 network for both the real and generated images. Then, it computes the mean and covariance matrix of these feature vectors for each set. Finally, it calculates the Fréchet distance between these multivariate Gaussian distributions.

A lower FID score indicates a higher similarity between the real and generated images, suggesting better performance of the GAN in generating realistic images.

For further details, please refer to the paper:
"GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium"
by Heusel et al., presented at the Advances in Neural Information Processing Systems (NeurIPS) conference in 2017.
---

# Metric Card for fid_score

## Metric description

The Frechet Inception Distance (FID) is a metric used to evaluate the quality of generated images from generative adversarial networks (GANs). It measures the similarity between the feature representations of real and generated images.

FID is calculated by first extracting feature vectors from a pre-trained Inception-v3 network for both the real and generated images. Then, it computes the mean and covariance matrix of these feature vectors for each set. Finally, it calculates the Fréchet distance between these multivariate Gaussian distributions.
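
Concretely, with $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$ denoting the mean and covariance of the real and generated feature sets, the score is the standard closed-form Fréchet distance between two Gaussians (stated here for reference; it is not specific to this implementation):

$$
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
$$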

A lower FID score indicates a higher similarity between the real and generated images, suggesting better performance of the GAN in generating realistic images.


## How to use

The metric takes two inputs: `real_images` (a NumPy array containing the real image data) and `generated_images` (a NumPy array containing the generated image data).

```python
from evaluate import load
fid = load("fid_score")
fid_score = fid.compute(real_images=real_images, generated_images=generated_images)
```
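
In this implementation, `compute` operates directly on the image arrays passed to it: each input is expected to be an `H × W × C` NumPy array, and every pixel's channel values are treated as one feature vector. A minimal sketch of preparing such inputs (file names are placeholders; assumes OpenCV is installed):

```python
import cv2

# Placeholder file paths for illustration.
real_images = cv2.imread("real_sample.png")            # NumPy array of shape (H, W, 3)
generated_images = cv2.imread("generated_sample.png")  # NumPy array of shape (H', W', 3)

# The metric flattens each array to (H * W, 3) and fits a Gaussian to the
# per-pixel channel values, so only the channel dimension has to agree.
assert real_images.shape[-1] == generated_images.shape[-1]
```
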
## Output values

This metric outputs a single floating-point number representing the dissimilarity between the distributions of the real and generated images.
```
print(fid_score)
73.65338799020432
```

The **lower** the fid_score value, the **better** the performance of the GAN.

### Values from popular papers

## Examples

Perfect match between two identical GAN-generated images:

```python
import cv2
from evaluate import load

fid = load("fid_score")
im1 = cv2.imread("gans1.png")
im2 = cv2.imread("gans1.png")
fid_score = fid.compute(real_images=im1, generated_images=im2)
print(fid_score)
5.020410753786564e-10
```

Partial match between two different GAN-generated images:

```python
import cv2
from evaluate import load

fid = load("fid_score")
im1 = cv2.imread("gans1.png")
im2 = cv2.imread("gans2.png")
fid_score = fid.compute(real_images=im1, generated_images=im2)
print(fid_score)
73.65338799020432
```

No match between a GAN-generated image and an unrelated image:

```python
import cv2
from evaluate import load

fid = load("fid_score")
im1 = cv2.imread("gans1.png")
im2 = cv2.imread("random.png")
fid_score = fid.compute(real_images=im1, generated_images=im2)
print(fid_score)
10991.947362765806
```




## Limitations and bias

The Fréchet Inception Distance (FID), while widely used for assessing the quality of generated images, has limitations and biases. It relies on features extracted from a pretrained Inception network, so it inherits that network's biases and is sensitive to changes in its architecture or training procedure. FID may not capture all aspects of image quality, especially semantic differences or specific image features, and a single score offers little insight into why two sets of images differ. It is also relatively expensive to compute and has no universally agreed-upon threshold for an acceptable score, so results require careful interpretation alongside qualitative evaluation. Despite these limitations, FID remains valuable for comparing generative models, particularly GANs, but should be used judiciously alongside other evaluation metrics.


## Citation


```bibtex
@inproceedings{heusel2017gans,
title={GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium},
author={Heusel, Martin and Ramsauer, Hubert and Unterthiner, Thomas and Nessler, Bernhard and Hochreiter, Sepp},
booktitle={Advances in Neural Information Processing Systems},
pages={6626--6637},
year={2017}
}
```

## Further References

- [Fréchet inception distance -- Wikipedia](https://en.wikipedia.org/wiki/Fr%C3%A9chet_inception_distance)
6 changes: 6 additions & 0 deletions metrics/fid_score/app.py
@@ -0,0 +1,6 @@
import evaluate
from evaluate.utils import launch_gradio_widget


module = evaluate.load("fid_score")
launch_gradio_widget(module)
85 changes: 85 additions & 0 deletions metrics/fid_score/fid_score.py
@@ -0,0 +1,85 @@
# Copyright 2024 The HuggingFace Datasets Authors and the current dataset script contributor.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""fid score metric."""
import numpy as np
from scipy.linalg import sqrtm

import evaluate


_CITATION = """\
@inproceedings{heusel2017gans,
title={GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium},
author={Heusel, Martin and Ramsauer, Hubert and Unterthiner, Thomas and Nessler, Bernhard and Hochreiter, Sepp},
booktitle={Advances in Neural Information Processing Systems},
pages={6626--6637},
year={2017}
}
"""

_DESCRIPTION = """\
The Frechet Inception Distance (FID) is a metric used to evaluate the quality of generated images from generative adversarial networks (GANs). It measures the similarity between the feature representations of real and generated images.

FID is calculated by first extracting feature vectors from a pre-trained Inception-v3 network for both the real and generated images. Then, it computes the mean and covariance matrix of these feature vectors for each set. Finally, it calculates the Fréchet distance between these multivariate Gaussian distributions.

A lower FID score indicates a higher similarity between the real and generated images, suggesting better performance of the GAN in generating realistic images.

For further details, please refer to the paper:
"GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium"
by Heusel et al., presented at the Advances in Neural Information Processing Systems (NeurIPS) conference in 2017.
"""

_KWARGS_DESCRIPTION = """
Computes the FID score between a set of real images and a set of generated images.
Args:
    real_images: NumPy array containing the real image data.
    generated_images: NumPy array containing the generated image data.
Returns:
    (float): the Frechet Inception Distance (FID) score.
"""


@evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
class FID(evaluate.Metric):
    def _info(self):
        return evaluate.MetricInfo(
            description=_DESCRIPTION,
            citation=_CITATION,
            inputs_description=_KWARGS_DESCRIPTION,
            features=None,
            reference_urls=[],
        )

    def _compute(self, real_images, generated_images):
        # Flatten each (H, W, C) image array to shape (H * W, C) so that every
        # pixel's channel values are treated as one feature vector.
        real_images = real_images.reshape(real_images.shape[0] * real_images.shape[1], real_images.shape[2])
        generated_images = generated_images.reshape(
            generated_images.shape[0] * generated_images.shape[1], generated_images.shape[2]
        )

        # Fit a multivariate Gaussian (mean and covariance) to each set of vectors.
        mu_real = np.mean(real_images, axis=0)
        sigma_real = np.cov(real_images, rowvar=False)

        mu_generated = np.mean(generated_images, axis=0)
        sigma_generated = np.cov(generated_images, rowvar=False)

        # Squared Euclidean distance between the two means.
        mean_diff = mu_real - mu_generated
        mean_diff_squared = np.dot(mean_diff, mean_diff)

        # Matrix square root of the product of the covariances; sqrtm can return
        # a complex result due to numerical error, so keep only the real part.
        cov_mean = sqrtm(sigma_real.dot(sigma_generated))

        if np.iscomplexobj(cov_mean):
            cov_mean = cov_mean.real

        # Fréchet distance between the two fitted Gaussians.
        fid = mean_diff_squared + np.trace(sigma_real + sigma_generated - 2 * cov_mean)
        return fid
2 changes: 2 additions & 0 deletions metrics/fid_score/requirements.txt
@@ -0,0 +1,2 @@
git+https://github.com/huggingface/evaluate@{COMMIT_PLACEHOLDER}
scipy