[Validate] add warning for unspecified arguments, and change default values #277

Open · wants to merge 3 commits into master

Conversation

@sasha-scale (Contributor) commented Apr 19, 2022

Fixes some problems with the 3D evaluation metrics and adds unit-test coverage of the numerical correctness of the metric functions.

Specific issues found:

  • Label-match enforcement was not properly taken into account by the 3D metrics.
  • The default IoU threshold of 0.0 caused hidden, unexpected behavior.
  • Incorrect weight/denominator for IoU matches: the sum should be divided by the number of valid IoU matches, rather than the number of predictions or annotations (see the sketch after this list).
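
A minimal sketch of the corrected averaging (the numbers and the iou_matches name are hypothetical, for illustration only):

import numpy as np

iou_matches = np.array([0.8, 0.6, 0.9])  # IoU values of the valid matches only
num_predictions = 10                     # total predictions, matched or not

# Before: dividing by the number of predictions dilutes the average.
avg_iou_wrong = iou_matches.sum() / num_predictions   # 0.23

# After: divide by the number of valid IoU matches.
avg_iou = iou_matches.sum() / len(iou_matches)        # ~0.77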

@sasha-scale sasha-scale changed the title add warning for unspecified arguments, and change default values [Validate] add warning for unspecified arguments, and change default values Apr 19, 2022
@sasha-scale sasha-scale requested a review from a team April 19, 2022 22:58
@jean-lucas (Contributor) left a comment

Nice tests. One question about the division below.
Also, is it worth updating the package version for this?

I tried running this (not sure if I did it right), but the sfn gave an "exc_message":["float division by zero"] error 🤔

I ran it on this: http://localhost:3003/nucleus/ut_c8qwyvbf1jkhegj18ajg?spoof=sasha.harrison%40scale.com

Update: I did not encounter errors when running it here: http://localhost:3003/nucleus/ut_c8x5dgybw8rg0ehnwx3g?spoof=sasha.harrison%40scale.com

Comment on lines 168 to 173
if self.iou_2d:
    weight = len(iou_2d_metric)
    avg_iou = iou_2d_metric.sum() / max(weight, sys.float_info.epsilon)
else:
    weight = len(iou_3d_metric)
    avg_iou = iou_3d_metric.sum() / max(weight, sys.float_info.epsilon)
Contributor:

If the weight is zero by some chance (is that even possible?), won't that make avg_iou absurdly large, since we would be dividing by sys.float_info.epsilon?

Perhaps if weight == 0 then avg_iou should be 0 too?

Not sure what is correct here.

Contributor Author:

Good catch. I agree the boundary condition is incorrect here; the IoU should be zero when the weight is zero.

I will also add unit test coverage for this case.
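
A minimal sketch of that fix (the helper name and the numpy-array input are assumptions, shaped after the snippet above):

import numpy as np

def average_iou(iou_metric: np.ndarray) -> float:
    """Average IoU over the valid matches; 0.0 when there are none."""
    weight = len(iou_metric)
    if weight == 0:
        # Agreed boundary condition: no valid matches means zero IoU,
        # instead of dividing by sys.float_info.epsilon and blowing up.
        return 0.0
    return float(iou_metric.sum()) / weight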

@@ -28,7 +32,7 @@ class CuboidMetric(Metric):
    def __init__(
        self,
        enforce_label_match: bool = False,
-       confidence_threshold: float = 0.0,
+       confidence_threshold: Optional[float] = None,
Contributor:

Doesn't it make sense to also add the IoU threshold to the base class? How would a metric make sense without computing the IoU for the matches?

Contributor Author:

I also noticed that the iou_threshold wasn't used in the base class; I can add it.
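
A sketch of what that could look like (the exact signature is an assumption; Metric is the existing base class):

from typing import Optional

class CuboidMetric(Metric):  # Metric: the package's existing base class
    def __init__(
        self,
        enforce_label_match: bool = False,
        iou_threshold: Optional[float] = None,  # hypothetical new parameter
        confidence_threshold: Optional[float] = None,
    ):
        self.enforce_label_match = enforce_label_match
        self.iou_threshold = iou_threshold
        self.confidence_threshold = confidence_threshold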

Comment on lines 110 to 113
    ]
    processed_item = {}
    processed_item["xyz"] = np.array(
-       [[ann.position.x, ann.position.y, ann.position.z] for ann in dataitem]
+       [[ann.position.x, ann.position.y, ann.position.z] for ann in item_list]
Contributor:

Why do we actually have a dict here for the processed item? Wouldn't a dataclass or namedtuple be better? I'm not a huge fan of dicts, interface-wise.
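
As a sketch of what I mean (the field name is guessed from the snippet above; the real processed item likely carries more fields):

from dataclasses import dataclass
import numpy as np

@dataclass
class ProcessedItem:
    """Hypothetical typed replacement for the processed_item dict."""
    xyz: np.ndarray  # (N, 3) array of cuboid center positions

# processed.xyz then replaces processed_item["xyz"], and a typo in a
# field name fails loudly instead of silently creating a new dict key.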

Comment on lines 110 to 113
label=CAR_LABEL,
position=Point3D(250, 250, 250),
dimensions=Point3D(2, 2, 2),
yaw=0.0,
Contributor:

To complete the test set, let's also add another prediction/annotation combination with the same size but a yaw-angle difference (e.g. 45 deg); otherwise all the tests use axis-aligned bounding boxes and only take translation into account.
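
A sketch of such a pair (the class names and reference_id are assumed from the surrounding test snippets):

import math

# Same center and dimensions, but a 45-degree yaw difference, so the
# IoU computation can no longer treat the boxes as axis-aligned.
rotated_annotation = CuboidAnnotation(
    label=CAR_LABEL,
    position=Point3D(250, 250, 250),
    dimensions=Point3D(2, 2, 2),
    yaw=0.0,
    reference_id="item_A",
)
rotated_prediction = CuboidPrediction(
    label=CAR_LABEL,
    position=Point3D(250, 250, 250),
    dimensions=Point3D(2, 2, 2),
    yaw=math.pi / 4,  # 45 degrees
    reference_id="item_A",
)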

@gatli (Contributor) left a comment

Thanks for taking the time to implement these! They definitely should have been a part of the initial merge... I guess that one is on me 🙉

One thing that has to be addressed is changing the defaults in the nucleus/validate/eval_functions/available_eval_functions.py *Config classes. In the future I'm hoping we can introduce tests that make sure the defaults match between the configs and the implementations. Maybe we should try to merge the abstractions, since it isn't exactly DRY.

A couple of thoughts on the unit-testing style; no need to address them now, but something to keep in mind when writing new ones.

@@ -10,6 +11,9 @@
from .filtering import ListOfAndFilters, ListOfOrAndFilters
from .filters import confidence_filter

DEFAULT_IOU_THRESHOLD = 0.1
Contributor:

Why are we changing the default from 0.0 to 0.1? I'm not against it; it would just be nice to have a comment explaining the choice of default.

Contributor Author:

The default of 0.0 is really confusing in my opinion, in particular because it makes predictions with zero overlap count as true positives. 0.1 seemed reasonable for 3D IoU, though I wouldn't be opposed to 0.5 (to match the 2D default threshold). What do you think?
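
Something like this could document the choice (the comment wording is only a suggestion):

# 0.0 would let predictions with zero overlap count as true positives;
# 0.1 keeps 3D matching permissive while excluding non-overlapping pairs.
DEFAULT_IOU_THRESHOLD = 0.1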

Comment on lines +251 to +252
iou_threshold: Optional[float] = None,
confidence_threshold: Optional[float] = None,
Contributor:

Changes to default params need to be reflected in the nucleus/validate/eval_functions/available_eval_functions.py configurations as well, since that is the client-facing code. The defaults there override the defaults provided here if they aren't kept in sync.
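
To illustrate the concern (the class below is a hypothetical stand-in, not the actual contents of available_eval_functions.py):

from dataclasses import dataclass
from typing import Optional

@dataclass
class CuboidIOUConfig:  # hypothetical stand-in for a client-facing *Config
    # These defaults override the metric's own defaults, so they must be
    # kept in sync with the implementation's signature.
    iou_threshold: Optional[float] = None
    confidence_threshold: Optional[float] = None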

@@ -101,18 +101,25 @@ def wrapper(
    return wrapper


-def process_dataitem(dataitem):
+def process_dataitem(item_list, confidence_threshold=None):
Contributor:

This is a pretty generic name that could be more concrete. If you have the context, it would be nice to rename it or add a docstring to explain what this "processing" is doing.

Contributor Author:

I don't really have much context, but I agree this is a terrible name... I will try to come up with something better.
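
One possible direction, purely as a sketch (the new name and docstring are hypothetical):

def stack_valid_cuboids(item_list, confidence_threshold=None):
    """Hypothetical rename of process_dataitem.

    Drops predictions below confidence_threshold (when given) and stacks
    the remaining cuboids' positions, dimensions, and yaws into numpy
    arrays for the 3D metric functions.
    """
    ...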

dimensions=Point3D(10, 10, 10),
yaw=0.0,
reference_id="item_A",
), # false negative
Contributor:

This comment is very easy to overlook (but super helpful!). I'd do this in the future with an actual variable.

I'd essentially split annotations and predictions into:

expected_matched_anns = [...]
expected_unmatched_anns = [...]
annotations = expected_matched_anns + expected_unmatched_anns

matching_predictions = [...]
false_positive_predictions = [...]
predictions = matching_predictions + false_positive_predictions

This matches how I think most people read code; we're more observant of variable names than comments.

Contributor Author:

Makes sense! Agreed that readability is important here. I appreciate the detailed suggestions; please keep them coming 😁


def test_cuboid_metrics_multi_item():
    # single item, perfect precision
    annotations = [
Contributor:

Same as the other comment: I'd split this collection into named sub-collections. Otherwise this just becomes a wall of annotations and you can't see the bricks. 🧱

Comment on lines 58 to 66
assert CuboidIOU().eval(annotations, predictions) == ScalarResult(
    1.0, len(annotations)
), "Unexpected Cuboid IoU result"
assert CuboidPrecision().eval(annotations, predictions) == ScalarResult(
    1.0, len(annotations)
), "Unexpected Cuboid Precision result"
assert CuboidRecall().eval(annotations, predictions) == ScalarResult(
    1.0, len(annotations)
), "Unexpected Cuboid Recall result"
Contributor:

This doesn't match the calling pattern in the celery service: the filtering is implemented in the base class, so calling eval directly skirts it and introduces a possibility for a mismatch between expectations and behaviour.

I've actually introduced a bug myself by testing only the eval; the base-implemented functionality broke my expectations.

Suggested change
-assert CuboidIOU().eval(annotations, predictions) == ScalarResult(
-    1.0, len(annotations)
-), "Unexpected Cuboid IoU result"
-assert CuboidPrecision().eval(annotations, predictions) == ScalarResult(
-    1.0, len(annotations)
-), "Unexpected Cuboid Precision result"
-assert CuboidRecall().eval(annotations, predictions) == ScalarResult(
-    1.0, len(annotations)
-), "Unexpected Cuboid Recall result"
+assert CuboidIOU()(annotations, predictions) == ScalarResult(
+    1.0, len(annotations)
+), "Unexpected Cuboid IoU result"
+assert CuboidPrecision()(annotations, predictions) == ScalarResult(
+    1.0, len(annotations)
+), "Unexpected Cuboid Precision result"
+assert CuboidRecall()(annotations, predictions) == ScalarResult(
+    1.0, len(annotations)
+), "Unexpected Cuboid Recall result"

Contributor Author:

Thanks for pointing this out; I just realized that I had overlooked the label matching when I started this PR. I will keep the original implementation (although it is kind of hard for other developers to follow).

Comment on lines 179 to 181
cuboid_iou_result1 = CuboidIOU().eval(annotations, predictions)
cuboid_precision_result1 = CuboidPrecision().eval(annotations, predictions)
cuboid_recall_result1 = CuboidRecall().eval(annotations, predictions)
Contributor:

Suggested change
-cuboid_iou_result1 = CuboidIOU().eval(annotations, predictions)
-cuboid_precision_result1 = CuboidPrecision().eval(annotations, predictions)
-cuboid_recall_result1 = CuboidRecall().eval(annotations, predictions)
+cuboid_iou_result1 = CuboidIOU()(annotations, predictions)
+cuboid_precision_result1 = CuboidPrecision()(annotations, predictions)
+cuboid_recall_result1 = CuboidRecall()(annotations, predictions)

Comment on lines 192 to 200
cuboid_iou_result2 = CuboidIOU(enforce_label_match=False).eval(
    annotations, predictions
)
cuboid_precision_result2 = CuboidPrecision(enforce_label_match=False).eval(
    annotations, predictions
)
cuboid_recall_result2 = CuboidRecall(enforce_label_match=False).eval(
    annotations, predictions
)
Contributor:

Suggested change
-cuboid_iou_result2 = CuboidIOU(enforce_label_match=False).eval(
-    annotations, predictions
-)
-cuboid_precision_result2 = CuboidPrecision(enforce_label_match=False).eval(
-    annotations, predictions
-)
-cuboid_recall_result2 = CuboidRecall(enforce_label_match=False).eval(
-    annotations, predictions
-)
+cuboid_iou_result2 = CuboidIOU(enforce_label_match=False)(
+    annotations, predictions
+)
+cuboid_precision_result2 = CuboidPrecision(enforce_label_match=False)(
+    annotations, predictions
+)
+cuboid_recall_result2 = CuboidRecall(enforce_label_match=False)(
+    annotations, predictions
+)

Comment on lines 286 to 288
cuboid_iou_result1 = CuboidIOU().eval(annotations, predictions)
cuboid_precision_result1 = CuboidPrecision().eval(annotations, predictions)
cuboid_recall_result1 = CuboidRecall().eval(annotations, predictions)
Contributor:

Suggested change
-cuboid_iou_result1 = CuboidIOU().eval(annotations, predictions)
-cuboid_precision_result1 = CuboidPrecision().eval(annotations, predictions)
-cuboid_recall_result1 = CuboidRecall().eval(annotations, predictions)
+cuboid_iou_result1 = CuboidIOU()(annotations, predictions)
+cuboid_precision_result1 = CuboidPrecision()(annotations, predictions)
+cuboid_recall_result1 = CuboidRecall()(annotations, predictions)

Comment on lines 299 to 307
cuboid_iou_result2 = CuboidIOU(enforce_label_match=False).eval(
    annotations, predictions
)
cuboid_precision_result2 = CuboidPrecision(enforce_label_match=False).eval(
    annotations, predictions
)
cuboid_recall_result2 = CuboidRecall(enforce_label_match=False).eval(
    annotations, predictions
)
Contributor:

Suggested change
-cuboid_iou_result2 = CuboidIOU(enforce_label_match=False).eval(
-    annotations, predictions
-)
-cuboid_precision_result2 = CuboidPrecision(enforce_label_match=False).eval(
-    annotations, predictions
-)
-cuboid_recall_result2 = CuboidRecall(enforce_label_match=False).eval(
-    annotations, predictions
-)
+cuboid_iou_result2 = CuboidIOU(enforce_label_match=False)(
+    annotations, predictions
+)
+cuboid_precision_result2 = CuboidPrecision(enforce_label_match=False)(
+    annotations, predictions
+)
+cuboid_recall_result2 = CuboidRecall(enforce_label_match=False)(
+    annotations, predictions
+)
