Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add analytics table for binary model performance analysis across multiple scores/targets #110

Open
wants to merge 25 commits into
base: main
Choose a base branch
from

Conversation

MahmoodEtedadi
Copy link
Contributor

@MahmoodEtedadi MahmoodEtedadi commented Nov 21, 2024

Overview

Add model comparison tools to compare model performance across multiple binary model scores/targets.
Closes #62

Description of changes

We create an analytics table using great_tables package to present model performance data across multiple model scores/targets centered around specified values for a performance metric ('Flagged', 'Sensitivity', 'Specificity', 'Threshold'). Users can also choose two values for the specified metric to be considered, which columns should be displayed and if the columns should grouped by scores or targets.

image

For instance, if Sensitivity of [0.7,0.8] are specified, the performance metrics ('PPV', 'Flagged', 'Specificity', 'Threshold', etc.) across model scores/targets for Sensitivity=0.7 and Sensitivity =0.8 are provided.

image

Author Checklist

  • Linting passes; run early with pre-commit hook.
  • Tests added for new code and issue being fixed.
  • Added type annotations and full numpy-style docstrings for new methods.
  • Draft your news fragment in new changelog/ISSUE.TYPE.rst files; see changelog/README.md.

@MahmoodEtedadi MahmoodEtedadi changed the title Add analytics table for model performance analysis across multiple scores/targets Add analytics table for binary model performance analysis across multiple scores/targets Nov 27, 2024
src/seismometer/plot/mpl/color_manipulation.py Outdated Show resolved Hide resolved
src/seismometer/data/metric_to_threshold.py Outdated Show resolved Hide resolved
src/seismometer/data/metric_to_threshold.py Outdated Show resolved Hide resolved
src/seismometer/data/metric_to_threshold.py Outdated Show resolved Hide resolved
src/seismometer/table/analytics_table_config.py Outdated Show resolved Hide resolved
src/seismometer/table/analytics_table.py Outdated Show resolved Hide resolved
src/seismometer/table/analytics_table.py Outdated Show resolved Hide resolved
src/seismometer/table/analytics_table.py Outdated Show resolved Hide resolved
src/seismometer/table/analytics_table.py Outdated Show resolved Hide resolved
src/seismometer/table/analytics_table.py Outdated Show resolved Hide resolved
src/seismometer/plot/mpl/color_manipulation.py Outdated Show resolved Hide resolved
src/seismometer/plot/mpl/color_manipulation.py Outdated Show resolved Hide resolved
src/seismometer/plot/mpl/color_manipulation.py Outdated Show resolved Hide resolved
src/seismometer/data/metric_to_threshold.py Outdated Show resolved Hide resolved
src/seismometer/data/metric_to_threshold.py Outdated Show resolved Hide resolved
src/seismometer/data/metric_to_threshold.py Outdated Show resolved Hide resolved
tests/plot/test_color_manipulations.py Outdated Show resolved Hide resolved
tests/table/test_analytics_table.py Outdated Show resolved Hide resolved
tests/table/test_analytics_table.py Outdated Show resolved Hide resolved
@gbowlin
Copy link
Contributor

gbowlin commented Dec 6, 2024

Mostly a UX thing -

image

We now have two tables, and the styling (indent, some edges) look a bit different, could you bring the two tables more in alignment with each other?

Comment on lines +542 to +549
self._metric_values_slider = FloatRangeSlider(
min=0.01,
max=1.00,
step=0.01,
value=metric_values or [0.2, 0.8],
description="Metric Values",
style=WIDE_LABEL_STYLE,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Crashes if the ranges are set to be the same value, should only display one column if they are the same

image


self.decimals = table_config.decimals
self.metric = metric
self.metric_values = metric_values
Copy link
Contributor

@gbowlin gbowlin Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.metric_values = metric_values
self.metric_values = list(set(metric_values))

This fixes the issue where the range slider matches itself - see https://github.com/epic-open-source/seismometer/pull/110/files#r1904553861

Comment on lines +123 to +137
# If polars package is not installed, overwrite is_na function in great_tables package to treat Agnostic
# as pandas dataframe.
try:
import polars as pl

# Use 'pl' to avoid the F401 error
_ = pl.DataFrame()
except ImportError:
from typing import Any

from great_tables._tbl_data import Agnostic, PdDataFrame, is_na

@is_na.register(Agnostic)
def _(df: PdDataFrame, x: Any) -> bool:
return pd.isna(x)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we put this outside of this module as its fiddly and probably needed by both this class and others.
Also this will get called multiple times in the notebook, rather than us only needing it once right?

Comment on lines +246 to +255
gt = (
gt.tab_stub(rowname_col=self._get_second_level[self.top_level], groupname_col=self.top_level)
.fmt_number(
columns=[
col
for col in data.columns
if is_numeric_dtype(data[col].dtype) and not is_integer_dtype(data[col].dtype)
],
decimals=self.decimals,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image
You should make sure the sublevel names don't include _Value

Copy link
Contributor

@gbowlin gbowlin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some minor visual things

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add model comparison tools: compare two model scores.
3 participants