Fix binary metrics so they properly return NaN when appropriate #113

Merged
merged 6 commits into main from eph/fix-binary-metrics on Dec 13, 2024

Conversation

ericphanson
Member

fixes #99

@ericphanson ericphanson requested a review from kendal-s December 12, 2024 13:40
ericphanson and others added 4 commits December 12, 2024 14:42
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@ericphanson ericphanson changed the title Fx binary metrics so they properly return NaN when appropriate Fix binary metrics so they properly return NaN when appropriate Dec 12, 2024

@kendal-s kendal-s left a comment


I left some questions: just wanted to confirm a few things before merge!

# keep only indices where neither x nor y is NaN
non_nan = (!).(isnan.(x) .| isnan.(y))
x = x[non_nan]
y = y[non_nan]
isempty(x) && return NaN


do we need to check if y is empty as well?


Nvm, I was confusing myself initially. Since x and y are filtered by the same mask, if x is empty then y must be empty as well, so the single check is sufficient.
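
As a minimal sketch (not part of the PR diff), the same mask indexes both vectors, so they always end up with the same length:

x = [1.0, NaN, 3.0]
y = [NaN, 2.0, 4.0]
non_nan = (!).(isnan.(x) .| isnan.(y))   # Bool[false, false, true]
x[non_nan], y[non_nan]                   # ([3.0], [4.0]) -- same length by construction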

Comment on lines -76 to -87
true_positive_rate = (true_positives == 0 && actual_positives == 0) ?
(one(true_positives) / one(actual_positives)) :
(true_positives / actual_positives)
true_negative_rate = (true_negatives == 0 && actual_negatives == 0) ?
(one(true_negatives) / one(actual_negatives)) :
(true_negatives / actual_negatives)
false_positive_rate = (false_positives == 0 && actual_negatives == 0) ?
(zero(false_positives) / one(actual_negatives)) :
(false_positives / actual_negatives)
false_negative_rate = (false_negatives == 0 && actual_positives == 0) ?
(zero(false_negatives) / one(actual_positives)) :
(false_negatives / actual_positives)


Do you remember whether there was a rationale behind these quantities being 1 or 0 in the edge case? I agree these should return NaN, but I'm just curious.

Member Author


I traced this code back to https://github.com/beacon-biosignals/OldLighthouse.jl/pull/36 (private repo), so I believe it was to support ROC curves better, where NaNs could contaminate the whole AUC even if it's just some 0.0 threshold or something. But IMO that is much better handled in the AUC computation itself rather than here.
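
To illustrate the contamination concern (a rough sketch, not Lighthouse's actual AUC code), a single NaN point is enough to make a trapezoidal AUC NaN:

# hypothetical trapezoidal integration over an ROC curve
function trapezoidal_auc(xs, ys)
    auc = 0.0
    for i in 2:length(xs)
        auc += (xs[i] - xs[i-1]) * (ys[i] + ys[i-1]) / 2
    end
    return auc
end

fpr = [0.0, 0.5, 1.0]
tpr = [0.0, NaN, 1.0]        # one degenerate threshold
trapezoidal_auc(fpr, tpr)    # NaN -- the whole area is lost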


Do you know the rationale behind Lighthouse returning missing instead of NaN? I'm worried that there may be code that depends on this behavior rather than on returning NaN.

Member Author

@ericphanson ericphanson Dec 13, 2024


No, and I agree this is a dangerous change. However, the most common usage of area_under_curve is in constructing TradeoffMetricsV1:

return TradeoffMetricsV1(; class_index, class_labels, roc_curve,

which does not support missing, since the field is declared as:

roc_auc::Float64

So I think there is a bug somewhere: either TradeoffMetricsV1 needs to support missing, or area_under_curve needs to not return missing. I believe this change is the less disruptive option. It does contravene the docstring for area_under_curve, so it is technically a breaking change. We could do a breaking version bump here, but that would cause a lot of compat-updating work; I don't think this will break any users in practice, and it is essentially a bugfix, so I think it's OK. But it is a bit dicey.
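
As a toy illustration of the mismatch (the struct below is hypothetical; only the roc_auc::Float64 field mirrors the real TradeoffMetricsV1), a Float64 field accepts NaN but errors on missing:

struct ToyTradeoffMetrics
    roc_auc::Float64
end

ToyTradeoffMetrics(NaN)      # fine: NaN is a valid Float64
ToyTradeoffMetrics(missing)  # MethodError: Missing cannot be converted to Float64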


@kendal-s kendal-s left a comment


I took another look at this and answered one of my own questions, so I'll approve! I'd still like info on the other questions just for context if you have it, but they're less important for the merge.

@ericphanson ericphanson merged commit 710925a into main Dec 13, 2024
10 checks passed
@ericphanson ericphanson deleted the eph/fix-binary-metrics branch December 13, 2024 18:17

Successfully merging this pull request may close these issues.

binary_statistics reports sensitivities as 1 instead of NaN when no actual positives (#99)