Group metrics by labels #245
base: master
Conversation
Overall looks pretty clean to me - all comments are nits. My biggest concern with this change is that I don't want to overcomplicate the case where a user doesn't care about class-specific results and just wants to aggregate results. As such, I wonder if it would be beneficial to change the interface of the `aggregate_results` method to return a scalar and to have all metric classes implement it?
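A minimal sketch of that suggested interface, assuming hypothetical `Metric`/`ScalarResult` shapes (the exact signatures are my assumptions, not the final API):

```python
from abc import ABC, abstractmethod
from typing import List


class ScalarResult:
    def __init__(self, value: float, weight: float = 1.0):
        self.value = value
        self.weight = weight


class Metric(ABC):
    @abstractmethod
    def aggregate_results(self, results: List[ScalarResult]) -> ScalarResult:
        """Collapse per-item results into a single scalar for the evaluation."""


class F1Metric(Metric):
    def aggregate_results(self, results: List[ScalarResult]) -> ScalarResult:
        # Weighted mean: users who don't care about class-specific results
        # still get a single aggregate number with no extra work.
        total_weight = sum(r.weight for r in results)
        value = sum(r.value * r.weight for r in results) / total_weight
        return ScalarResult(value, total_weight)
```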
```diff
-        value = f1_score(gt, predicted, average=self.f1_method)
-        return ScalarResult(value)
+        results = {}
+        results["macro"] = f1_score(gt, predicted, average="macro")
```
:nit: avoid hardcoded strings and instead make them constants
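For instance, a sketch of the constant approach, assuming sklearn's `f1_score`; the constant name and example labels are illustrative:

```python
from sklearn.metrics import f1_score

# Module-level constant instead of a hardcoded string.
MACRO_AVERAGE = "macro"

gt = [0, 1, 1, 2]          # example ground-truth labels
predicted = [0, 1, 2, 2]   # example predicted labels

results = {}
results[MACRO_AVERAGE] = f1_score(gt, predicted, average=MACRO_AVERAGE)
```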
```diff
@@ -80,7 +82,7 @@ def eval(

     def __init__(
         self,
-        enforce_label_match: bool = False,
+        enforce_label_match: bool = True,
```
:nit: please adjust comment below
```python
            ScalarResult(109.0 / 300, 3),
            {"enforce_label_match": False},
        ),
        # (
```
remove commented code before merging
Of course 🙂
Yeah, that is a valid concern. It definitely warrants taking a closer look at the interface of how and where we choose to `group_by` and aggregate. I'll try to figure out a suggestion today.
```python
# TODO(gunnar): Enforce label match -> Why is that a parameter? Should we generally allow IOU matches
# between different labels?!?
```
In general we should have an option to allow this. E.g. you need to compute matches across the classes for the confusion matrix.
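A minimal sketch of why label-agnostic matching is needed for a confusion matrix, with an assumed box format and a simple greedy IOU matcher (none of this is the PR's actual matching code):

```python
from collections import Counter
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0


def confusion_counts(
    gt: List[Tuple[str, Box]],
    pred: List[Tuple[str, Box]],
    iou_threshold: float = 0.5,
) -> Dict[Tuple[str, str], int]:
    """Tally (gt_label, pred_label) pairs; matching must ignore labels,
    otherwise off-diagonal confusion entries can never be populated."""
    counts: Counter = Counter()
    used = set()
    for gt_label, gt_box in gt:
        best, best_iou = None, iou_threshold
        for i, (_, pred_box) in enumerate(pred):
            if i in used:
                continue
            score = iou(gt_box, pred_box)
            if score >= best_iou:
                best, best_iou = i, score
        if best is not None:
            used.add(best)
            counts[(gt_label, pred[best][0])] += 1
    return dict(counts)
```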
Context:
We want to support class-specific results from our metrics. As a part of that we're essentially opening up the interface from the metrics to return multiple values with a label per value. For our standard functions we will return a value per label for the polygon metrics.
Before:
We'd get a single number per `DatasetItem` and then a single `aggregate_score` per evaluation.

After:
We get a `{"key_1": number, ..., "key_N": number}` per `DatasetItem` and then another `{"another_key_1": number, ..., "another_key_N": number}` from `aggregate_score`.

As a part of this we also add `extra_info`, a `string -> string` dictionary which you can add to each dataset item in the metric, and an `error` field which we use if evaluation of a single `DatasetItem` fails. That way we don't fail the whole evaluation.

This PR:
This changes the metrics to allow returning multiple values per metric and changes the default metrics to return results grouped by label.
It also adds an extra method on results that allows us to pass more data than floats to the frontend via `extra_info`; for example, we currently send the weight of each dataset item with the `ScalarResult`.
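A rough sketch of the payload shapes described above; everything beyond the `extra_info` and `error` fields named in this PR (labels, values, and surrounding keys) is illustrative, not the exact schema:

```python
# Hypothetical per-DatasetItem payload under this PR.
item_result = {
    "results": {"car": 0.81, "pedestrian": 0.64},  # one value per label
    "extra_info": {"weight": "3"},                 # string -> string metadata
    "error": None,  # set on failure instead of failing the whole evaluation
}

# Hypothetical aggregate, again keyed by label.
aggregate_result = {"car": 0.78, "pedestrian": 0.59}
```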