[Validate] add warning for unspecified arguments, and change default values #277

Open · wants to merge 3 commits into master

Conversation

@sasha-scale (Contributor) commented Apr 19, 2022

Fixes some problems with the 3D evaluation metrics and adds unit-test coverage of the numerical correctness of the metric functions.

Specific issues found:

  • Label-match enforcement was not properly taken into account by the 3D metrics.
  • The default IoU threshold of 0.0 caused hidden, unexpected behavior.
  • Incorrect weight/denominator for IoU matches: the sum should be divided by the number of valid IoU matches, rather than the number of predictions or annotations (see the sketch after this list).
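
A minimal sketch of the corrected averaging (the numbers and the iou_matches name are hypothetical, for illustration only):

import numpy as np

iou_matches = np.array([0.8, 0.6, 0.9])  # IoU values of the valid matches only
num_predictions = 10                     # total predictions, matched or not

# Before: dividing by the number of predictions dilutes the average.
avg_iou_wrong = iou_matches.sum() / num_predictions   # 0.23

# After: divide by the number of valid IoU matches.
avg_iou = iou_matches.sum() / len(iou_matches)        # ~0.77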

@sasha-scale sasha-scale changed the title add warning for unspecified arguments, and change default values [Validate] add warning for unspecified arguments, and change default values Apr 19, 2022
@sasha-scale sasha-scale requested a review from a team April 19, 2022 22:58
@jean-lucas (Contributor) left a comment

Nice tests. One question about the division below.
Also, is it worth updating the package version for this?

I tried running this (not sure if I did it right), but the sfn gave an "exc_message":["float division by zero"] error 🤔

I ran it on this: http://localhost:3003/nucleus/ut_c8qwyvbf1jkhegj18ajg?spoof=sasha.harrison%40scale.com

Update: I did not encounter errors when running it here: http://localhost:3003/nucleus/ut_c8x5dgybw8rg0ehnwx3g?spoof=sasha.harrison%40scale.com

Comment on lines 168 to 173
if self.iou_2d:
    weight = len(iou_2d_metric)
    avg_iou = iou_2d_metric.sum() / max(weight, sys.float_info.epsilon)
else:
    weight = len(iou_3d_metric)
    avg_iou = iou_3d_metric.sum() / max(weight, sys.float_info.epsilon)
Contributor:

If the weight is zero by some chance (is that even possible?), won't that make avg_iou absurdly large, since we would be dividing by sys.float_info.epsilon?

Perhaps if weight == 0 then avg_iou should be 0 too?

Not sure what is correct here.

Contributor Author:

Good catch. I agree the boundary condition is incorrect here; the IoU should be zero when the weight is zero.

I will also add unit test coverage for this case.
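
A minimal sketch of that fix (the helper name and the numpy-array input are assumptions, shaped after the snippet above):

import numpy as np

def average_iou(iou_metric: np.ndarray) -> float:
    """Average IoU over the valid matches; 0.0 when there are none."""
    weight = len(iou_metric)
    if weight == 0:
        # Agreed boundary condition: no valid matches means zero IoU,
        # instead of dividing by sys.float_info.epsilon and blowing up.
        return 0.0
    return float(iou_metric.sum()) / weight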

@@ -28,7 +32,7 @@ class CuboidMetric(Metric):
    def __init__(
        self,
        enforce_label_match: bool = False,
-       confidence_threshold: float = 0.0,
+       confidence_threshold: Optional[float] = None,
Contributor:

Doesn't it make sense to also add the IoU threshold to the base class? How would a metric make sense without computing the IoU for the matches?

Contributor Author:

I also noticed that the iou_threshold wasn't used in the base class; I can add it.
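
A sketch of what that could look like (the exact signature is an assumption; Metric is the existing base class):

from typing import Optional

class CuboidMetric(Metric):  # Metric: the package's existing base class
    def __init__(
        self,
        enforce_label_match: bool = False,
        iou_threshold: Optional[float] = None,  # hypothetical new parameter
        confidence_threshold: Optional[float] = None,
    ):
        self.enforce_label_match = enforce_label_match
        self.iou_threshold = iou_threshold
        self.confidence_threshold = confidence_threshold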

Comment on lines 110 to 113
    ]
    processed_item = {}
    processed_item["xyz"] = np.array(
-       [[ann.position.x, ann.position.y, ann.position.z] for ann in dataitem]
+       [[ann.position.x, ann.position.y, ann.position.z] for ann in item_list]
Contributor:

Why do we actually have a dict here for the processed item? Wouldn't a dataclass or namedtuple be better? I'm not a huge fan of dicts, interface-wise.
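
As a sketch of what I mean (the field name is guessed from the snippet above; the real processed item likely carries more fields):

from dataclasses import dataclass
import numpy as np

@dataclass
class ProcessedItem:
    """Hypothetical typed replacement for the processed_item dict."""
    xyz: np.ndarray  # (N, 3) array of cuboid center positions

# processed.xyz then replaces processed_item["xyz"], and a typo in a
# field name fails loudly instead of silently creating a new dict key.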

Comment on lines 110 to 113
label=CAR_LABEL,
position=Point3D(250, 250, 250),
dimensions=Point3D(2, 2, 2),
yaw=0.0,
Contributor:

To complete the test set, let's also add another prediction/annotation combination with the same size but a yaw-angle difference (e.g. 45 deg); otherwise all the tests use axis-aligned bounding boxes and only take translation into account.
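
A sketch of such a pair (the class names and reference_id are assumed from the surrounding test snippets):

import math

# Same center and dimensions, but a 45-degree yaw difference, so the
# IoU computation can no longer treat the boxes as axis-aligned.
rotated_annotation = CuboidAnnotation(
    label=CAR_LABEL,
    position=Point3D(250, 250, 250),
    dimensions=Point3D(2, 2, 2),
    yaw=0.0,
    reference_id="item_A",
)
rotated_prediction = CuboidPrediction(
    label=CAR_LABEL,
    position=Point3D(250, 250, 250),
    dimensions=Point3D(2, 2, 2),
    yaw=math.pi / 4,  # 45 degrees
    reference_id="item_A",
)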

@gatli (Contributor) left a comment

Thanks for taking the time to implement these! They definitely should have been a part of the initial merge... I guess that one is on me 🙉

One thing that has to be addressed is changing the defaults in the nucleus/validate/eval_functions/available_eval_functions.py *Config classes. In the future I'm hoping we can introduce tests that make sure the defaults match between the configs and the implementations. Maybe we should try to merge the abstractions, since it isn't exactly DRY.

A couple of thoughts on the unit-testing style; no need to address them now, but something to keep in mind when writing new ones.

@@ -10,6 +11,9 @@
from .filtering import ListOfAndFilters, ListOfOrAndFilters
from .filters import confidence_filter

DEFAULT_IOU_THRESHOLD = 0.1
Contributor:

Why are we changing the default from 0.0 to 0.1? I'm not against it; it would just be nice to have a comment explaining the choice of default.

Contributor Author:

The default of 0.0 is really confusing in my opinion, in particular because it makes predictions with zero overlap count as true positives. 0.1 seemed reasonable for 3D IoU, though I wouldn't be opposed to 0.5 (to match the 2D default threshold). What do you think?
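
Something like this could document the choice (the comment wording is only a suggestion):

# 0.0 would let predictions with zero overlap count as true positives;
# 0.1 keeps 3D matching permissive while excluding non-overlapping pairs.
DEFAULT_IOU_THRESHOLD = 0.1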

Comment on lines +251 to +252
iou_threshold: Optional[float] = None,
confidence_threshold: Optional[float] = None,
Contributor:

Changes to default params need to be reflected in the nucleus/validate/eval_functions/available_eval_functions.py configurations as well, since that is the client-facing code. The defaults there override the defaults provided here if they aren't kept in sync.
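
To illustrate the concern (the class below is a hypothetical stand-in, not the actual contents of available_eval_functions.py):

from dataclasses import dataclass
from typing import Optional

@dataclass
class CuboidIOUConfig:  # hypothetical stand-in for a client-facing *Config
    # These defaults override the metric's own defaults, so they must be
    # kept in sync with the implementation's signature.
    iou_threshold: Optional[float] = None
    confidence_threshold: Optional[float] = None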

@@ -101,18 +101,25 @@ def wrapper(
    return wrapper


-def process_dataitem(dataitem):
+def process_dataitem(item_list, confidence_threshold=None):
Contributor:

This is a pretty generic name that could be more concrete. If you have the context, it would be nice to rename it or add a docstring to explain what this "processing" is doing.

Contributor Author:

I don't really have much context, but I agree this is a terrible name... I will try to come up with something better.
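
One possible direction, purely as a sketch (the new name and docstring are hypothetical):

def stack_valid_cuboids(item_list, confidence_threshold=None):
    """Hypothetical rename of process_dataitem.

    Drops predictions below confidence_threshold (when given) and stacks
    the remaining cuboids' positions, dimensions, and yaws into numpy
    arrays for the 3D metric functions.
    """
    ...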

dimensions=Point3D(10, 10, 10),
yaw=0.0,
reference_id="item_A",
), # false negative
Contributor:

This comment is very easy to overlook (but super helpful!). I'd do this in the future with an actual variable.

I'd essentially split annotations and predictions into:

expected_matched_anns = [...]
expected_unmatched_anns = [...]
annotations = expected_matched_anns + expected_unmatched_anns

matching_predictions = [...]
false_positive_predictions = [...]
predictions = matching_predictions + false_positive_predictions

This matches how I think most people read code; we're more observant of variable names than comments.

Contributor Author:

Makes sense! Agreed that readability is important here. I appreciate the detailed suggestions; please keep them coming 😁


def test_cuboid_metrics_multi_item():
    # single item, perfect precision
    annotations = [
Contributor:

Same as the other comment: I'd split this collection into named sub-collections. Otherwise this just becomes a wall of annotations and you can't see the bricks. 🧱

Comment on lines 58 to 66
assert CuboidIOU().eval(annotations, predictions) == ScalarResult(
    1.0, len(annotations)
), "Unexpected Cuboid IoU result"
assert CuboidPrecision().eval(annotations, predictions) == ScalarResult(
    1.0, len(annotations)
), "Unexpected Cuboid Precision result"
assert CuboidRecall().eval(annotations, predictions) == ScalarResult(
    1.0, len(annotations)
), "Unexpected Cuboid Recall result"
Contributor:

This doesn't match the calling pattern in the celery service: the filtering is implemented in the base class, so calling eval directly skirts it and introduces a possibility for a mismatch between expectations and behaviour.

I've actually introduced a bug myself by testing only the eval; the base-implemented functionality broke my expectations.

Suggested change
-assert CuboidIOU().eval(annotations, predictions) == ScalarResult(
-    1.0, len(annotations)
-), "Unexpected Cuboid IoU result"
-assert CuboidPrecision().eval(annotations, predictions) == ScalarResult(
-    1.0, len(annotations)
-), "Unexpected Cuboid Precision result"
-assert CuboidRecall().eval(annotations, predictions) == ScalarResult(
-    1.0, len(annotations)
-), "Unexpected Cuboid Recall result"
+assert CuboidIOU()(annotations, predictions) == ScalarResult(
+    1.0, len(annotations)
+), "Unexpected Cuboid IoU result"
+assert CuboidPrecision()(annotations, predictions) == ScalarResult(
+    1.0, len(annotations)
+), "Unexpected Cuboid Precision result"
+assert CuboidRecall()(annotations, predictions) == ScalarResult(
+    1.0, len(annotations)
+), "Unexpected Cuboid Recall result"

Contributor Author:

Thanks for pointing this out; I just realized that I had overlooked the label matching when I started this PR. I will keep the original implementation (although it is kind of hard for other developers to follow).

Comment on lines 179 to 181
cuboid_iou_result1 = CuboidIOU().eval(annotations, predictions)
cuboid_precision_result1 = CuboidPrecision().eval(annotations, predictions)
cuboid_recall_result1 = CuboidRecall().eval(annotations, predictions)
Contributor:

Suggested change
-cuboid_iou_result1 = CuboidIOU().eval(annotations, predictions)
-cuboid_precision_result1 = CuboidPrecision().eval(annotations, predictions)
-cuboid_recall_result1 = CuboidRecall().eval(annotations, predictions)
+cuboid_iou_result1 = CuboidIOU()(annotations, predictions)
+cuboid_precision_result1 = CuboidPrecision()(annotations, predictions)
+cuboid_recall_result1 = CuboidRecall()(annotations, predictions)

Comment on lines 192 to 200
cuboid_iou_result2 = CuboidIOU(enforce_label_match=False).eval(
    annotations, predictions
)
cuboid_precision_result2 = CuboidPrecision(enforce_label_match=False).eval(
    annotations, predictions
)
cuboid_recall_result2 = CuboidRecall(enforce_label_match=False).eval(
    annotations, predictions
)
Contributor:

Suggested change
-cuboid_iou_result2 = CuboidIOU(enforce_label_match=False).eval(
-    annotations, predictions
-)
-cuboid_precision_result2 = CuboidPrecision(enforce_label_match=False).eval(
-    annotations, predictions
-)
-cuboid_recall_result2 = CuboidRecall(enforce_label_match=False).eval(
-    annotations, predictions
-)
+cuboid_iou_result2 = CuboidIOU(enforce_label_match=False)(
+    annotations, predictions
+)
+cuboid_precision_result2 = CuboidPrecision(enforce_label_match=False)(
+    annotations, predictions
+)
+cuboid_recall_result2 = CuboidRecall(enforce_label_match=False)(
+    annotations, predictions
+)

Comment on lines 286 to 288
cuboid_iou_result1 = CuboidIOU().eval(annotations, predictions)
cuboid_precision_result1 = CuboidPrecision().eval(annotations, predictions)
cuboid_recall_result1 = CuboidRecall().eval(annotations, predictions)
Contributor:

Suggested change
-cuboid_iou_result1 = CuboidIOU().eval(annotations, predictions)
-cuboid_precision_result1 = CuboidPrecision().eval(annotations, predictions)
-cuboid_recall_result1 = CuboidRecall().eval(annotations, predictions)
+cuboid_iou_result1 = CuboidIOU()(annotations, predictions)
+cuboid_precision_result1 = CuboidPrecision()(annotations, predictions)
+cuboid_recall_result1 = CuboidRecall()(annotations, predictions)

Comment on lines 299 to 307
cuboid_iou_result2 = CuboidIOU(enforce_label_match=False).eval(
    annotations, predictions
)
cuboid_precision_result2 = CuboidPrecision(enforce_label_match=False).eval(
    annotations, predictions
)
cuboid_recall_result2 = CuboidRecall(enforce_label_match=False).eval(
    annotations, predictions
)
Contributor:

Suggested change
-cuboid_iou_result2 = CuboidIOU(enforce_label_match=False).eval(
-    annotations, predictions
-)
-cuboid_precision_result2 = CuboidPrecision(enforce_label_match=False).eval(
-    annotations, predictions
-)
-cuboid_recall_result2 = CuboidRecall(enforce_label_match=False).eval(
-    annotations, predictions
-)
+cuboid_iou_result2 = CuboidIOU(enforce_label_match=False)(
+    annotations, predictions
+)
+cuboid_precision_result2 = CuboidPrecision(enforce_label_match=False)(
+    annotations, predictions
+)
+cuboid_recall_result2 = CuboidRecall(enforce_label_match=False)(
+    annotations, predictions
+)
