Add analytics table for binary model performance analysis across multiple scores/targets #110

MahmoodEtedadi · 2024-11-21T17:08:34Z

Overview

Add model comparison tools to compare model performance across multiple binary model scores/targets.
Closes #62

Description of changes

We create an analytics table using the great_tables package to present model performance data across multiple model scores/targets, centered on specified values of a performance metric ('Flagged', 'Sensitivity', 'Specificity', or 'Threshold'). Users can also choose the two values of that metric to consider, which columns to display, and whether the columns should be grouped by scores or by targets.

For instance, if Sensitivity values of [0.7, 0.8] are specified, the performance metrics ('PPV', 'Flagged', 'Specificity', 'Threshold', etc.) across model scores/targets are provided for Sensitivity=0.7 and Sensitivity=0.8.
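
To make the Sensitivity example concrete, here is a minimal sketch of the lookup idea. It is not the code in this PR: the rows in `PERF_ROWS` are made-up numbers, `rows_near_metric` is a hypothetical helper, and picking the nearest row (rather than interpolating) is an assumption.

```python
from typing import Dict, Iterable, List

# Hypothetical per-threshold performance rows for a single score/target pair.
PERF_ROWS: List[Dict[str, float]] = [
    {"Threshold": 0.9, "Sensitivity": 0.40, "Specificity": 0.98, "PPV": 0.80},
    {"Threshold": 0.7, "Sensitivity": 0.62, "Specificity": 0.93, "PPV": 0.66},
    {"Threshold": 0.5, "Sensitivity": 0.71, "Specificity": 0.85, "PPV": 0.52},
    {"Threshold": 0.3, "Sensitivity": 0.81, "Specificity": 0.70, "PPV": 0.39},
    {"Threshold": 0.1, "Sensitivity": 0.95, "Specificity": 0.35, "PPV": 0.22},
]


def rows_near_metric(
    rows: List[Dict[str, float]], metric: str, metric_values: Iterable[float]
) -> Dict[float, Dict[str, float]]:
    """For each requested value, return the row whose `metric` is closest to it."""
    return {
        value: min(rows, key=lambda row: abs(row[metric] - value))
        for value in metric_values
    }


# One row of metrics per requested Sensitivity value.
selected = rows_near_metric(PERF_ROWS, "Sensitivity", [0.7, 0.8])
for value, row in selected.items():
    print(value, row)
```

Each requested value ends up with one row of metrics, which is roughly what one column group of the table would present for a given score/target.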

Author Checklist

Linting passes; run early with pre-commit hook.
Tests added for new code and issue being fixed.
Added type annotations and full numpy-style docstrings for new methods.
Draft your news fragment in new changelog/ISSUE.TYPE.rst files; see changelog/README.md.

src/seismometer/plot/mpl/color_manipulation.py

src/seismometer/data/metric_to_threshold.py

src/seismometer/table/analytics_table_config.py

src/seismometer/table/analytics_table.py

tests/plot/test_color_manipulations.py

tests/table/test_analytics_table.py

src/seismometer/data/binary_performance.py

gbowlin · 2024-12-06T22:37:42Z

Mostly a UX thing: we now have two tables, and their styling (indentation, some edges) looks a bit different. Could you bring the two tables into closer alignment with each other?

src/seismometer/table/analytics_table.py

gbowlin · 2025-01-06T19:31:10Z

src/seismometer/table/analytics_table.py

+        self._metric_values_slider = FloatRangeSlider(
+            min=0.01,
+            max=1.00,
+            step=0.01,
+            value=metric_values or [0.2, 0.8],
+            description="Metric Values",
+            style=WIDE_LABEL_STYLE,
+        )


This crashes if both ends of the range are set to the same value. It should display only one column when they are the same.

gbowlin · 2025-01-06T22:38:41Z

src/seismometer/table/analytics_table.py

+
+        self.decimals = table_config.decimals
+        self.metric = metric
+        self.metric_values = metric_values


Suggested change

-        self.metric_values = metric_values
+        self.metric_values = list(set(metric_values))

This fixes the crash that occurs when both ends of the range slider are set to the same value; see https://github.com/epic-open-source/seismometer/pull/110/files#r1904553861
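
One caveat worth considering: `list(set(...))` removes the duplicate but does not guarantee the order of the remaining values. A possible variant, shown here as a sketch only (`unique_metric_values` is a hypothetical helper, not something in this PR), sorts the result so the columns would come out in ascending order:

```python
from typing import Iterable, List


def unique_metric_values(metric_values: Iterable[float]) -> List[float]:
    """Drop duplicate values and return the rest in ascending order."""
    return sorted(set(metric_values))


# Slider with both handles on the same value: a single value remains.
same = unique_metric_values((0.5, 0.5))
# Distinct values: both are kept, smallest first.
distinct = unique_metric_values([0.8, 0.2])
print(same, distinct)
```

Whether ascending order is the right choice for the table columns is an assumption; the point is only that the order becomes deterministic.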

gbowlin · 2025-01-06T22:40:47Z

src/seismometer/table/analytics_table.py

+        # If polars package is not installed, overwrite is_na function in great_tables package to treat Agnostic
+        # as pandas dataframe.
+        try:
+            import polars as pl
+
+            # Use 'pl' to avoid the F401 error
+            _ = pl.DataFrame()
+        except ImportError:
+            from typing import Any
+
+            from great_tables._tbl_data import Agnostic, PdDataFrame, is_na
+
+            @is_na.register(Agnostic)
+            def _(df: PdDataFrame, x: Any) -> bool:
+                return pd.isna(x)


Can we move this out of this module? It's fiddly, and it is probably needed by both this class and others.
Also, this will get called multiple times in the notebook, when we only need it once, right?
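
A rough sketch of the "register once" idea, using only the standard library. `Agnostic` and `is_na` below are local stand-ins for the great_tables objects in the diff, and `ensure_is_na_patch` and `REGISTRATIONS` are hypothetical names used for illustration:

```python
from functools import singledispatch
from typing import Any


class Agnostic:
    """Local stand-in for the placeholder frame type the library dispatches on."""


@singledispatch
def is_na(df: Any, x: Any) -> bool:
    """Local stand-in for a library-provided single-dispatch function."""
    raise NotImplementedError(f"unsupported frame type: {type(df)!r}")


_PATCHED = False
REGISTRATIONS = 0  # counts how many times the fallback was actually registered


def ensure_is_na_patch() -> None:
    """Register the fallback implementation once; later calls are no-ops."""
    global _PATCHED, REGISTRATIONS
    if _PATCHED:
        return

    @is_na.register(Agnostic)
    def _(df: Agnostic, x: Any) -> bool:
        # Treat None and NaN as missing (NaN is the only value not equal to itself).
        return x is None or x != x

    _PATCHED = True
    REGISTRATIONS += 1


# Calling it from several places (for example, every table constructor) is safe.
ensure_is_na_patch()
ensure_is_na_patch()
print(REGISTRATIONS, is_na(Agnostic(), float("nan")))
```

Placing the registration at the top level of a small shared module would have a similar effect, since Python runs a module's body only on first import.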

gbowlin · 2025-01-06T22:59:22Z

src/seismometer/table/analytics_table.py

+        gt = (
+            gt.tab_stub(rowname_col=self._get_second_level[self.top_level], groupname_col=self.top_level)
+            .fmt_number(
+                columns=[
+                    col
+                    for col in data.columns
+                    if is_numeric_dtype(data[col].dtype) and not is_integer_dtype(data[col].dtype)
+                ],
+                decimals=self.decimals,
+            )


You should make sure the sublevel names don't include _Value
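
For example, something along these lines could clean the labels before they are displayed. This is a sketch only: `clean_sublevel_names` is a hypothetical helper, and the sample names are made up.

```python
from typing import List

SUFFIX = "_Value"  # suffix named in the comment above


def clean_sublevel_names(names: List[str]) -> List[str]:
    """Strip a trailing '_Value' from each name; leave other names unchanged."""
    return [
        name[: -len(SUFFIX)] if name.endswith(SUFFIX) else name
        for name in names
    ]


cleaned = clean_sublevel_names(["Score1_Value", "Target_A"])
print(cleaned)
```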

gbowlin

some minor visual things

MahmoodEtedadi requested review from gbowlin and diehlbw November 27, 2024 15:25

MahmoodEtedadi changed the title ~~Add analytics table for model performance analysis across multiple scores/targets~~ Add analytics table for binary model performance analysis across multiple scores/targets Nov 27, 2024

gbowlin requested changes Nov 27, 2024

andli28 suggested changes Nov 29, 2024

andli28 approved these changes Dec 3, 2024

MahmoodEtedadi added 13 commits December 5, 2024 22:09

initial commit

f56c1e4

analytics table + pre-cmmit changes

935676f

adding changelog + changes to the structure

f5c48ef

add some unit tests

0514f8f

Add ExploreAnalyticsTable

772f24f

fix unit tests

2f33915

add some unit tests

8782c68

minor unit test fix

3425fc4

remove coloring option from analytics_table + correcting zebra striping

8be0954

Remove second style of color_Bar + PQA comments + Update notebook

9822843

add per_context to analytics table + remove zebra striping

196cb87

changes according to reviewer comments

29cd288

update unit tests

d40d9fd

MahmoodEtedadi force-pushed the add-analytics-table branch from 4c1305c to d40d9fd Compare December 5, 2024 22:23

MahmoodEtedadi added 2 commits December 5, 2024 22:38

fix unit tests

0415371

update analytics table api

71f435b

gbowlin reviewed Dec 6, 2024

src/seismometer/data/binary_performance.py (outdated)

gbowlin reviewed Dec 6, 2024

src/seismometer/table/analytics_table.py (outdated)

gbowlin reviewed Dec 6, 2024

src/seismometer/table/analytics_table.py (outdated)

gbowlin reviewed Dec 6, 2024

src/seismometer/table/analytics_table.py

MahmoodEtedadi added 3 commits December 9, 2024 22:50

fix table alignments

a01b9a9

add group scores checkbox

a97112f

move fairness to table subpackage + flagrate instead of flagged

bea10ed

remove df as a param from

b951efa

MahmoodEtedadi requested a review from gbowlin December 11, 2024 23:07

MahmoodEtedadi added 4 commits December 11, 2024 23:21

treat threshold as int + change to AnalyticsTable class

17a1c1c

move calculations out of analytics_table.py + other minor changes

0afe4b9

Add more unit tests

e5f3af5

update user guide

abfec5f

gbowlin reviewed Jan 6, 2025

MahmoodEtedadi added 2 commits January 6, 2025 20:20

update docstrings

6adb562

fix unit tests

9d661eb

gbowlin reviewed Jan 6, 2025

gbowlin requested changes Jan 6, 2025
