
Group metrics by labels #245

Open · gatli wants to merge 6 commits into master

Conversation

@gatli (Contributor) commented Mar 4, 2022

Context:

We want to support class-specific results from our metrics. As part of that, we're opening up the metric interface to return multiple values, with a label per value. For our standard polygon metrics we will return one value per label.

Before:

We'd get a single number per DatasetItem and then a single aggregate_score per evaluation.

After:

We get a {"key_1": number, ..., "key_N": number} per DatasetItem and then another {"another_key_1": number, ..., "another_key_N": number} from aggregate_score.

As part of this we also add extra_info, a string -> string dictionary that can be attached to each dataset item in the metric, and an error field that we set when evaluation of a single DatasetItem fails. That way we don't fail the whole evaluation.
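
To make the new shape concrete, here is a minimal sketch of what a per-item result could carry under this change (the class, field, and label names are illustrative, not the actual definitions in this PR):

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Illustrative sketch only; the PR's real result classes may differ.
@dataclass
class PerItemResult:
    # One value per label instead of a single float.
    values: Dict[str, float] = field(default_factory=dict)
    # Free-form string -> string metadata attached to the dataset item.
    extra_info: Dict[str, str] = field(default_factory=dict)
    # Set when evaluation of this single DatasetItem fails, so the
    # evaluation as a whole can keep going.
    error: Optional[str] = None

item_result = PerItemResult(
    values={"car": 0.81, "pedestrian": 0.64},
    extra_info={"scene": "night"},
)
```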

This PR

This changes the metrics to allow returning multiple values per metric and changes the default metrics to return results grouped by label.

This also adds an extra method on results that allows us to pass more than just floats to the frontend via extra_info; for example, we currently send the weight of each dataset item along with the ScalarResult.
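
As a rough sketch of that: the tests further down (ScalarResult(109.0 / 300, 3)) suggest ScalarResult takes a value and a weight; the helper below for shipping the weight via extra_info is purely hypothetical.

```python
from dataclasses import dataclass
from typing import Dict

# Sketch: ScalarResult(109.0 / 300, 3) in the tests below suggests a
# (value, weight) pair. The real class definition lives in the PR.
@dataclass
class ScalarResult:
    value: float
    weight: float = 1.0

def extra_info_for(result: ScalarResult) -> Dict[str, str]:
    # Hypothetical illustration: string -> string, matching the
    # extra_info contract described above.
    return {"weight": str(result.weight)}
```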

@gatli gatli self-assigned this Mar 4, 2022
@gatli gatli requested a review from pfmark March 11, 2022 13:32
@sasha-scale sasha-scale self-requested a review March 11, 2022 16:47

@sasha-scale (Contributor) left a comment

Overall looks pretty clean to me - all comments are nits. My biggest concern with this change is that I don't want to overcomplicate the case where a user doesn't care about class specific results and just wants to aggregate results. As such, I wonder if it would be beneficial to change the interface of the aggregate_results method to return a scalar and to have all metric classes implement it?

```diff
-        value = f1_score(gt, predicted, average=self.f1_method)
-        return ScalarResult(value)
+        results = {}
+        results["macro"] = f1_score(gt, predicted, average="macro")
```

Contributor:

:nit: avoid hardcoded strings and instead make them constants
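
For example (constant name hypothetical, using scikit-learn's f1_score as in the diff above):

```python
from sklearn.metrics import f1_score

# Hypothetical module-level constant per the nit above.
MACRO = "macro"

gt = [0, 1, 1, 0]         # toy ground-truth labels
predicted = [0, 1, 0, 0]  # toy predictions

results = {}
results[MACRO] = f1_score(gt, predicted, average=MACRO)
```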

```diff
@@ -80,7 +82,7 @@ def eval(
     def __init__(
         self,
-        enforce_label_match: bool = False,
+        enforce_label_match: bool = True,
```

Contributor:

:nit: please adjust comment below

```python
ScalarResult(109.0 / 300, 3),
{"enforce_label_match": False},
),
# (
```

Contributor:

remove commented code before merging

Contributor Author:

Of course 🙂

@gatli (Contributor Author) commented Mar 14, 2022

> Overall looks pretty clean to me - all comments are nits. My biggest concern with this change is that I don't want to overcomplicate the case where a user doesn't care about class specific results and just wants to aggregate results. As such, I wonder if it would be beneficial to change the interface of the aggregate_results method to return a scalar and to have all metric classes implement it?

Yeah, that is a valid concern. It definitely warrants a closer look at the interface for how and where we choose to group_by and aggregate. I'll try to come up with a suggestion today.
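
One possible shape for that (purely a sketch; none of these names are taken from the PR): metrics keep returning per-label dictionaries, while a required aggregate method reduces them to a single scalar for users who don't care about class-specific results.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

# Hypothetical interface sketch for the discussion above.
class Metric(ABC):
    @abstractmethod
    def eval(self, gt, predicted) -> Dict[str, float]:
        """Per-label results for a single DatasetItem."""

    @abstractmethod
    def aggregate_score(self, results: List[Dict[str, float]]) -> float:
        """Reduce grouped results to one scalar for the simple case."""
```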

Comment on lines +107 to +108
```python
# TODO(gunnar): Enforce label match -> Why is that a parameter? Should we generally allow IOU matches
# between different labels?!?
```

Contributor:

In general we should have an option to allow this. E.g. you need to compute matches across the classes for the confusion matrix.
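
To illustrate the point (hypothetical helper, not code from this PR): a confusion matrix needs matches computed by IOU alone, keeping pairs whose labels differ, which a hard label-match requirement would discard.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical illustration: tally (gt_label, pred_label) pairs produced
# by IOU-only matching. Off-diagonal entries like ("car", "truck") only
# exist if matching is allowed across labels.
def confusion_counts(matches: Iterable[Tuple[str, str]]) -> Counter:
    return Counter(matches)

print(confusion_counts([("car", "car"), ("car", "truck"), ("bus", "bus")]))
# Counter({('car', 'car'): 1, ('car', 'truck'): 1, ('bus', 'bus'): 1})
```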
