Group metrics by labels #245
base: master
Conversation
Overall looks pretty clean to me - all comments are nits. My biggest concern with this change is that I don't want to overcomplicate the case where a user doesn't care about class-specific results and just wants to aggregate results. As such, I wonder if it would be beneficial to change the interface of the `aggregate_results` method to return a scalar and to have all metric classes implement it?
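A minimal sketch of that suggested interface, assuming hypothetical `Metric`/`ScalarResult` shapes (the exact signatures are my assumptions, not the final API):

```python
from abc import ABC, abstractmethod
from typing import List


class ScalarResult:
    def __init__(self, value: float, weight: float = 1.0):
        self.value = value
        self.weight = weight


class Metric(ABC):
    @abstractmethod
    def aggregate_results(self, results: List[ScalarResult]) -> ScalarResult:
        """Collapse per-item results into a single scalar for the evaluation."""


class F1Metric(Metric):
    def aggregate_results(self, results: List[ScalarResult]) -> ScalarResult:
        # Weighted mean: users who don't care about class-specific results
        # still get a single aggregate number with no extra work.
        total_weight = sum(r.weight for r in results)
        value = sum(r.value * r.weight for r in results) / total_weight
        return ScalarResult(value, total_weight)
```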
```diff
-        value = f1_score(gt, predicted, average=self.f1_method)
-        return ScalarResult(value)
+        results = {}
+        results["macro"] = f1_score(gt, predicted, average="macro")
```
:nit: avoid hardcoded strings and instead make them constants
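For instance, a sketch of the constant approach, assuming sklearn's `f1_score`; the constant name and example labels are illustrative:

```python
from sklearn.metrics import f1_score

# Module-level constant instead of a hardcoded string.
MACRO_AVERAGE = "macro"

gt = [0, 1, 1, 2]          # example ground-truth labels
predicted = [0, 1, 2, 2]   # example predicted labels

results = {}
results[MACRO_AVERAGE] = f1_score(gt, predicted, average=MACRO_AVERAGE)
```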
```diff
@@ -80,7 +82,7 @@ def eval(

     def __init__(
         self,
-        enforce_label_match: bool = False,
+        enforce_label_match: bool = True,
```
:nit: please adjust comment below
```python
            ScalarResult(109.0 / 300, 3),
            {"enforce_label_match": False},
        ),
        # (
```
remove commented code before merging
Of course 🙂
Yeah, that is a valid concern. It definitely warrants taking a closer look at the interface of how and where we choose to `group_by` and aggregate. I'll try to figure out a suggestion today.
```python
# TODO(gunnar): Enforce label match -> Why is that a parameter? Should we generally allow IOU matches
# between different labels?!?
```
In general we should have an option to allow this. E.g. you need to compute matches across the classes for the confusion matrix.
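A minimal sketch of why label-agnostic matching is needed for a confusion matrix, with an assumed box format and a simple greedy IOU matcher (none of this is the PR's actual matching code):

```python
from collections import Counter
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0


def confusion_counts(
    gt: List[Tuple[str, Box]],
    pred: List[Tuple[str, Box]],
    iou_threshold: float = 0.5,
) -> Dict[Tuple[str, str], int]:
    """Tally (gt_label, pred_label) pairs; matching must ignore labels,
    otherwise off-diagonal confusion entries can never be populated."""
    counts: Counter = Counter()
    used = set()
    for gt_label, gt_box in gt:
        best, best_iou = None, iou_threshold
        for i, (_, pred_box) in enumerate(pred):
            if i in used:
                continue
            score = iou(gt_box, pred_box)
            if score >= best_iou:
                best, best_iou = i, score
        if best is not None:
            used.add(best)
            counts[(gt_label, pred[best][0])] += 1
    return dict(counts)
```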
Context:
We want to support class-specific results from our metrics. As a part of that we're essentially opening up the interface from the metrics to return multiple values with a label per value. For our standard functions we will return a value per label for the polygon metrics.
Before:
We'd get a single number per `DatasetItem` and then a single `aggregate_score` per evaluation.

After:
We get a `{"key_1": number, ..., "key_N": number}` per `DatasetItem` and then another `{"another_key_1": number, ..., "another_key_N": number}` from `aggregate_score`.

As a part of this we also add `extra_info`, a `string -> string` dictionary which you can add to each dataset item in the metric, and an `error` field which we use if evaluation of a single `DatasetItem` fails. That way we don't fail the whole evaluation.

This PR:
This changes the metrics to allow returning multiple values per metric and changes the default metrics to return results grouped by label.
It also adds an extra method on results that allows us to pass more data than floats to the frontend via `extra_info`; for example, we currently send the weight of each dataset item with the `ScalarResult`.
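A rough sketch of the payload shapes described above; everything beyond the `extra_info` and `error` fields named in this PR (labels, values, and surrounding keys) is illustrative, not the exact schema:

```python
# Hypothetical per-DatasetItem payload under this PR.
item_result = {
    "results": {"car": 0.81, "pedestrian": 0.64},  # one value per label
    "extra_info": {"weight": "3"},                 # string -> string metadata
    "error": None,  # set on failure instead of failing the whole evaluation
}

# Hypothetical aggregate, again keyed by label.
aggregate_result = {"car": 0.78, "pedestrian": 0.59}
```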