Merge pull request #21 from Baukebrenninkmeijer/develop

Baukebrenninkmeijer · Dec 3, 2021 · 7896d1b
2 parents 81b250e + be93859
commit 7896d1b
Showing 4 changed files with 144 additions and 90 deletions.
diff --git a/README.md b/README.md
@@ -3,8 +3,7 @@
 [![Supported versions](https://img.shields.io/pypi/pyversions/table_evaluator.svg)](https://pypi.python.org/pypi/table_evaluator)
 ![Package deployment](https://github.com/Baukebrenninkmeijer/table-evaluator/actions/workflows/python-publish.yml/badge.svg?branch=master)
 [![PyPI - Downloads](https://img.shields.io/pypi/dm/table_evaluator)](https://pypistats.org/packages/table_evaluator)
-
-[Official documentation](https://baukebrenninkmeijer.github.io/table-evaluator/)
+[![Documentation](https://img.shields.io/badge/Documentation-%20-blue)](https://baukebrenninkmeijer.github.io/table-evaluator/)
 
 TableEvaluator is a library to evaluate how similar a synthesized dataset is to a real data. In other words, it tries to give an indication into how real your fake data is. With the rise of GANs, specifically designed for tabular data, many applications are becoming possibilities. For industries like finance, healthcare and goverments, having the capacity to create high quality synthetic data that does **not** have the privacy constraints of normal data is extremely valuable. Since this field is this quite young and developing, I created this library to have a consistent evaluation method for your models.
 
@@ -19,9 +18,15 @@ The test can be run by cloning the repo and running:
 ```
 pytest tests
 ```
+if this does not work, the package might not currently be findable. In that case, please install it locally with:
+
+```
+pip install -e .
+```
 
 ## Usage
-**Please see the example notebook for the most up-to-date examples. The README example is just that notebook, but sometimes a bit outdated.**
+**Please see the [example notebook](https://github.com/Baukebrenninkmeijer/table-evaluator/blob/master/example_table_evaluator.ipynb) for the most up-to-date examples. The README example is just that notebook as markdown.**
+
 Start by importing the class
 ```Python
 from table_evaluator import load_data, TableEvaluator
@@ -142,6 +147,6 @@ table_evaluator.evaluate(target_col='trans_type')
 Please see the full documentation on [https://baukebrenninkmeijer.github.io/table-evaluator/](https://baukebrenninkmeijer.github.io/table-evaluator/).
 
 ## Motivation
-To see the motivation for my decisions, please have a look at my master thesis, found at [https://www.ru.nl/publish/pages/769526/z04_master_thesis_brenninkmeijer.pdf](https://www.ru.nl/publish/pages/769526/z04_master_thesis_brenninkmeijer.pdf)
+To see the motivation for my decisions, please have a look at my master thesis, found at the [Radboud University](https://www.ru.nl/publish/pages/769526/z04_master_thesis_brenninkmeijer.pdf)
 
 If you have any tips or suggestions, please contact send me on email.
diff --git a/example_table_evaluator.ipynb b/example_table_evaluator.ipynb
@@ -10,17 +10,26 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 1,
+   "execution_count": 7,
    "metadata": {},
-   "outputs": [],
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "The autoreload extension is already loaded. To reload it, use:\n",
+      "  %reload_ext autoreload\n"
+     ]
+    }
+   ],
    "source": [
     "%load_ext autoreload\n",
     "%autoreload 2"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 2,
+   "execution_count": 8,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -29,7 +38,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 3,
+   "execution_count": 9,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -38,7 +47,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": 10,
    "metadata": {},
    "outputs": [
     {
@@ -148,7 +157,7 @@
        "4          WITHDRAWAL_IN_CASH            UNKNOWN         654  "
       ]
      },
-     "execution_count": 4,
+     "execution_count": 10,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -159,7 +168,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": 11,
    "metadata": {},
    "outputs": [
     {
@@ -269,7 +278,7 @@
        "4    REMITTANCE_TO_OTHER_BANK      HOUSEHOLD        1211  "
       ]
      },
-     "execution_count": 5,
+     "execution_count": 11,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -280,7 +289,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": 12,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -289,7 +298,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 44,
+   "execution_count": 13,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -305,49 +314,38 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 45,
+   "execution_count": 24,
    "metadata": {},
    "outputs": [
     {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "\n",
-      "Classifier F1-scores and their Jaccard similarities::\n",
-      "                                     f1_real  f1_fake  jaccard_similarity\n",
-      "index                                                                    \n",
-      "LogisticRegression_real_testset       0.7800   0.7750              0.9704\n",
-      "LogisticRegression_fake_testset       0.7550   0.7450              0.9048\n",
-      "RandomForestClassifier_real_testset   0.9850   0.9850              1.0000\n",
-      "RandomForestClassifier_fake_testset   0.9650   0.9650              1.0000\n",
-      "DecisionTreeClassifier_real_testset   0.9800   0.9650              0.9512\n",
-      "DecisionTreeClassifier_fake_testset   0.9600   0.9150              0.9139\n",
-      "MLPClassifier_real_testset            0.4000   0.5000              0.5326\n",
-      "MLPClassifier_fake_testset            0.4300   0.5450              0.4925\n",
-      "\n",
-      "Privacy results:\n",
-      "                                         result\n",
-      "Duplicate rows between sets (real/fake)  (0, 0)\n",
-      "nearest neighbor mean                    0.5655\n",
-      "nearest neighbor std                     0.3726\n",
-      "\n",
-      "Miscellaneous results:\n",
-      "                                  Result\n",
-      "Column Correlation Distance RMSE  0.0399\n",
-      "Column Correlation distance MAE   0.0296\n",
-      "\n",
-      "Results:\n",
-      "                                                result\n",
-      "Basic statistics                                0.9940\n",
-      "Correlation column correlations                 0.9904\n",
-      "Mean Correlation between fake and real columns  0.9566\n",
-      "1 - MAPE Estimator results                      0.9251\n",
-      "Similarity Score                                0.9665\n"
-     ]
+     "data": {
+      "text/html": [
+       "<h1 style=\"text-align: center\">Synthetic Data Report</h1>"
+      ],
+      "text/plain": [
+       "<IPython.core.display.HTML object>"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "5674247a319f428a96b21a6d4b2dc626",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "Tab(children=(VBox(children=(Output(),)), VBox(children=(Output(),)), VBox(children=(Output(),)), VBox(childre…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
     }
    ],
    "source": [
-    "evaluator.evaluate(target_col='trans_type')"
+    "evaluator.evaluate(target_col='trans_type', notebook=True)"
    ]
   },
   {
@@ -437,7 +435,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.8.8"
+   "version": "3.7.3"
   }
  },
  "nbformat": 4,