[Validate] add warning for unspecified arguments, and change default values #277
base: master
Conversation
Nice tests. One question about the division below.
Also, is it worth updating the package version for this?
I tried running this (not sure if I did it right), but the SFN gave an "exc_message":["float division by zero"] error 🤔
I ran it on this: http://localhost:3003/nucleus/ut_c8qwyvbf1jkhegj18ajg?spoof=sasha.harrison%40scale.com
Update: I did not encounter errors when running it here: http://localhost:3003/nucleus/ut_c8x5dgybw8rg0ehnwx3g?spoof=sasha.harrison%40scale.com
nucleus/metrics/cuboid_metrics.py
Outdated
if self.iou_2d:
    weight = len(iou_2d_metric)
    avg_iou = iou_2d_metric.sum() / max(weight, sys.float_info.epsilon)
else:
    weight = len(iou_3d_metric)
    avg_iou = iou_3d_metric.sum() / max(weight, sys.float_info.epsilon)
If the weight is zero by some chance (is that even possible?), won't that make avg_iou absurdly large, since we would be dividing by sys.float_info.epsilon? Perhaps if weight == 0 then avg_iou should be 0 too? Not sure what is correct here.
Good catch - I agree with you that the boundary condition is incorrect here; the IoU should be zero when the weight is zero.
I will also add unit test coverage for this case.
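For reference, a minimal sketch of the guarded average this thread converges on, reusing the names from the snippet above (the surrounding code in cuboid_metrics.py handles both the 2D and 3D branches, which are omitted here):

from typing import Tuple

import numpy as np


def safe_average_iou(iou_metric: np.ndarray) -> Tuple[float, int]:
    """Average IoU with an explicit zero-weight guard: an empty match set yields
    0.0 instead of dividing by sys.float_info.epsilon and producing a huge value."""
    weight = len(iou_metric)
    if weight == 0:
        return 0.0, 0
    return float(iou_metric.sum()) / weight, weight


print(safe_average_iou(np.array([0.5, 0.7])))  # roughly (0.6, 2)
print(safe_average_iou(np.array([])))          # (0.0, 0), no "float division by zero"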
@@ -28,7 +32,7 @@ class CuboidMetric(Metric):
     def __init__(
         self,
         enforce_label_match: bool = False,
-        confidence_threshold: float = 0.0,
+        confidence_threshold: Optional[float] = None,
Doesn't it make sense to also add the IoU threshold to the base class? How would a metric make sense without computing the IoU for the matches?
I also noticed that the iou_threshold wasn't used in the base class; I can add it.
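A rough sketch of what threading the IoU threshold through the base class could look like; the class name, fallback values, and attribute handling here are assumptions for illustration, not the actual CuboidMetric implementation:

from typing import Optional

DEFAULT_IOU_THRESHOLD = 0.1  # assumed module-level default, mirroring the DEFAULT_IOU_THRESHOLD added elsewhere in this PR
DEFAULT_CONFIDENCE_THRESHOLD = 0.0  # assumed fallback matching the previous default


class CuboidMetricSketch:
    """Illustrative stand-in for the CuboidMetric base class."""

    def __init__(
        self,
        enforce_label_match: bool = False,
        iou_threshold: Optional[float] = None,
        confidence_threshold: Optional[float] = None,
    ):
        # Unspecified arguments fall back to module-level defaults (and could
        # trigger the warning this PR adds).
        self.enforce_label_match = enforce_label_match
        self.iou_threshold = DEFAULT_IOU_THRESHOLD if iou_threshold is None else iou_threshold
        self.confidence_threshold = (
            DEFAULT_CONFIDENCE_THRESHOLD if confidence_threshold is None else confidence_threshold
        )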
nucleus/metrics/cuboid_utils.py
Outdated
 ]
 processed_item = {}
 processed_item["xyz"] = np.array(
-    [[ann.position.x, ann.position.y, ann.position.z] for ann in dataitem]
+    [[ann.position.x, ann.position.y, ann.position.z] for ann in item_list]
Why do we actually have a dict here for the processed item? Wouldn't a dataclass or namedtuple be better? I'm not a huge fan of dicts interface-wise.
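Purely for illustration, a dataclass along the lines of this suggestion might look like the sketch below; only the "xyz" field is visible in the snippet above, so any other fields of the real processed item are omitted:

from dataclasses import dataclass

import numpy as np


@dataclass
class ProcessedCuboids:
    """Typed replacement for the processed_item dict."""

    xyz: np.ndarray  # (N, 3) cuboid centers built from ann.position

    def __len__(self) -> int:
        return len(self.xyz)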
tests/metrics/test_cuboid_metrics.py
Outdated
label=CAR_LABEL,
position=Point3D(250, 250, 250),
dimensions=Point3D(2, 2, 2),
yaw=0.0,
To complete the test set, let's also add another prediction/annotation combination with the same size but a yaw angle difference (e.g. 45 deg); otherwise all the tests use axis-aligned bounding boxes and only take translation into account.
Thanks for taking the time to implement these! They definitely should have been a part of the initial merge... I guess that one is on me 🙉
One thing that has to be addressed is changing the defaults in the nucleus/validate/eval_functions/available_eval_functions.py *Config classes. In the future I'm hoping we can introduce tests that make sure the defaults match between the configs and the implementations. Maybe we should try to merge the abstractions, since it isn't exactly DRY.
A couple of thoughts on the unit testing style below; no need to address them right now, but something to keep in mind when writing new tests.
@@ -10,6 +11,9 @@
 from .filtering import ListOfAndFilters, ListOfOrAndFilters
 from .filters import confidence_filter

+DEFAULT_IOU_THRESHOLD = 0.1
Why are we changing the default from 0.0 to 0.1? I'm not against it, it would just be nice to have a comment explaining the choice of the default.
The default of 0.0 is really confusing in my opinion, in particular because it makes predictions with zero overlap count as true positives. 0.1 seemed reasonable for 3D IoU, though I wouldn't be opposed to 0.5 (to match the 2D default threshold) - what do you think?
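A toy illustration of the concern with 0.0, assuming the matcher accepts a pair when its IoU clears the threshold (the actual matching code is not shown in this diff):

def is_true_positive(iou: float, threshold: float) -> bool:
    # Assumed matching rule: a matched pair counts once its IoU reaches the threshold.
    return iou >= threshold


print(is_true_positive(0.0, threshold=0.0))  # True: a zero-overlap pair counts as a TP
print(is_true_positive(0.0, threshold=0.1))  # False with the proposed 0.1 default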
iou_threshold: Optional[float] = None,
confidence_threshold: Optional[float] = None,
Changes to default params need to be reflected in the nucleus/validate/eval_functions/available_eval_functions.py configurations as well, since that is the client-facing code. The defaults there override the defaults provided here if they are not kept in sync.
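As a hypothetical sketch of keeping the client-facing side in sync (the real *Config classes in available_eval_functions.py are not shown in this diff, so the name and fields below are assumptions):

from dataclasses import dataclass
from typing import Optional


@dataclass
class CuboidIOUConfigSketch:
    """Client-facing config stand-in: its defaults must mirror the metric
    constructor, otherwise these values silently override the ones set in
    cuboid_metrics.py."""

    enforce_label_match: bool = False
    iou_threshold: Optional[float] = None
    confidence_threshold: Optional[float] = None

A test comparing these defaults against the metric constructor's signature (e.g. via inspect.signature) would be one way to get the default-matching check mentioned above.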
nucleus/metrics/cuboid_utils.py
Outdated
@@ -101,18 +101,25 @@ def wrapper(
     return wrapper


-def process_dataitem(dataitem):
+def process_dataitem(item_list, confidence_threshold=None):
This is a pretty generic name that could be more concrete. If you have the context, it would be nice to rename it or add a docstring explaining what this "processing" is doing.
I don't really have much context, but I agree with you that this is a terrible name... I will try to come up with something better.
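One possible direction, purely as a sketch of the rename-plus-docstring idea (the hypothetical name and the confidence handling below are guesses; the real process_dataitem builds more than the xyz array):

from typing import Optional

import numpy as np


def cuboid_list_to_arrays(item_list, confidence_threshold: Optional[float] = None) -> dict:
    """Convert a list of cuboid annotations/predictions into column arrays.

    If confidence_threshold is given, items below it are dropped first
    (annotations without a confidence attribute are kept).
    """
    if confidence_threshold is not None:
        item_list = [
            item for item in item_list
            if getattr(item, "confidence", 1.0) >= confidence_threshold
        ]
    return {
        "xyz": np.array(
            [[item.position.x, item.position.y, item.position.z] for item in item_list]
        )
    }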
tests/metrics/test_cuboid_metrics.py
Outdated
dimensions=Point3D(10, 10, 10),
yaw=0.0,
reference_id="item_A",
),  # false negative
This comment is very easy to overlook (but super helpful!). I'd do this in the future with an actual variable.
I'd essentially split annotations and predictions into:
expected_matched_anns = [...]
expected_unmatched_anns = [...]
annotations = expected_matched_anns + expected_unmatched_anns
matching_predictions = [...]
false_positive_predictions = [...]
predictions = matching_predictions + false_positive_predictions
This matches how I think most people read code; we're more observant of variable names than of comments.
Makes sense! Agreed that readability is important here - I appreciate the detailed suggestions, please keep them coming 😁
tests/metrics/test_cuboid_metrics.py
Outdated
def test_cuboid_metrics_multi_item():
    # single item, perfect precision
    annotations = [
Same as the other comment: I'd split this collection into named sub-collections. Otherwise it just becomes a wall of annotations and you can't see the bricks. 🧱
tests/metrics/test_cuboid_metrics.py
Outdated
assert CuboidIOU().eval(annotations, predictions) == ScalarResult(
    1.0, len(annotations)
), "Unexpected Cuboid IoU result"
assert CuboidPrecision().eval(annotations, predictions) == ScalarResult(
    1.0, len(annotations)
), "Unexpected Cuboid Precision result"
assert CuboidRecall().eval(annotations, predictions) == ScalarResult(
    1.0, len(annotations)
), "Unexpected Cuboid Recall result"
This doesn't match the calling pattern in the celery service: the filtering is implemented in the base class, so calling eval directly skirts it and introduces the possibility of a mismatch between expectations and behaviour.
I've actually introduced a bug myself by testing only eval, because the base-implemented functionality broke my expectations.
Suggested change:
-assert CuboidIOU().eval(annotations, predictions) == ScalarResult(
-    1.0, len(annotations)
-), "Unexpected Cuboid IoU result"
-assert CuboidPrecision().eval(annotations, predictions) == ScalarResult(
-    1.0, len(annotations)
-), "Unexpected Cuboid Precision result"
-assert CuboidRecall().eval(annotations, predictions) == ScalarResult(
-    1.0, len(annotations)
-), "Unexpected Cuboid Recall result"
+assert CuboidIOU()(annotations, predictions) == ScalarResult(
+    1.0, len(annotations)
+), "Unexpected Cuboid IoU result"
+assert CuboidPrecision()(annotations, predictions) == ScalarResult(
+    1.0, len(annotations)
+), "Unexpected Cuboid Precision result"
+assert CuboidRecall()(annotations, predictions) == ScalarResult(
+    1.0, len(annotations)
+), "Unexpected Cuboid Recall result"
Thanks for pointing this out - I just realized that I had overlooked the label matching when I started this PR. I will keep the original implementation (although it is kind of hard to follow for other developers).
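For context, a stripped-down sketch of why calling the metric object rather than eval directly matters, assuming the base class applies label matching (and other filtering) in __call__ before delegating to eval; the real base class is more involved than this:

class MetricSketch:
    """Illustrative base class: __call__ applies shared preprocessing, eval does the math."""

    def __init__(self, enforce_label_match: bool = True):
        self.enforce_label_match = enforce_label_match

    def __call__(self, annotations, predictions):
        if self.enforce_label_match:
            # Hypothetical shared step: keep only predictions whose label appears
            # in the annotations before scoring.
            labels = {ann.label for ann in annotations}
            predictions = [pred for pred in predictions if pred.label in labels]
        return self.eval(annotations, predictions)

    def eval(self, annotations, predictions):
        raise NotImplementedError

Testing metric.eval(annotations, predictions) directly skips the shared step, so a test can pass while metric(annotations, predictions) - the path the celery service uses - behaves differently.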
tests/metrics/test_cuboid_metrics.py
Outdated
cuboid_iou_result1 = CuboidIOU().eval(annotations, predictions)
cuboid_precision_result1 = CuboidPrecision().eval(annotations, predictions)
cuboid_recall_result1 = CuboidRecall().eval(annotations, predictions)
Suggested change:
-cuboid_iou_result1 = CuboidIOU().eval(annotations, predictions)
-cuboid_precision_result1 = CuboidPrecision().eval(annotations, predictions)
-cuboid_recall_result1 = CuboidRecall().eval(annotations, predictions)
+cuboid_iou_result1 = CuboidIOU()(annotations, predictions)
+cuboid_precision_result1 = CuboidPrecision()(annotations, predictions)
+cuboid_recall_result1 = CuboidRecall()(annotations, predictions)
tests/metrics/test_cuboid_metrics.py
Outdated
cuboid_iou_result2 = CuboidIOU(enforce_label_match=False).eval(
    annotations, predictions
)
cuboid_precision_result2 = CuboidPrecision(enforce_label_match=False).eval(
    annotations, predictions
)
cuboid_recall_result2 = CuboidRecall(enforce_label_match=False).eval(
    annotations, predictions
)
Suggested change:
-cuboid_iou_result2 = CuboidIOU(enforce_label_match=False).eval(
-    annotations, predictions
-)
-cuboid_precision_result2 = CuboidPrecision(enforce_label_match=False).eval(
-    annotations, predictions
-)
-cuboid_recall_result2 = CuboidRecall(enforce_label_match=False).eval(
-    annotations, predictions
-)
+cuboid_iou_result2 = CuboidIOU(enforce_label_match=False)(
+    annotations, predictions
+)
+cuboid_precision_result2 = CuboidPrecision(enforce_label_match=False)(
+    annotations, predictions
+)
+cuboid_recall_result2 = CuboidRecall(enforce_label_match=False)(
+    annotations, predictions
+)
tests/metrics/test_cuboid_metrics.py
Outdated
cuboid_iou_result1 = CuboidIOU().eval(annotations, predictions)
cuboid_precision_result1 = CuboidPrecision().eval(annotations, predictions)
cuboid_recall_result1 = CuboidRecall().eval(annotations, predictions)
Suggested change:
-cuboid_iou_result1 = CuboidIOU().eval(annotations, predictions)
-cuboid_precision_result1 = CuboidPrecision().eval(annotations, predictions)
-cuboid_recall_result1 = CuboidRecall().eval(annotations, predictions)
+cuboid_iou_result1 = CuboidIOU()(annotations, predictions)
+cuboid_precision_result1 = CuboidPrecision()(annotations, predictions)
+cuboid_recall_result1 = CuboidRecall()(annotations, predictions)
tests/metrics/test_cuboid_metrics.py
Outdated
cuboid_iou_result2 = CuboidIOU(enforce_label_match=False).eval(
    annotations, predictions
)
cuboid_precision_result2 = CuboidPrecision(enforce_label_match=False).eval(
    annotations, predictions
)
cuboid_recall_result2 = CuboidRecall(enforce_label_match=False).eval(
    annotations, predictions
)
Suggested change:
-cuboid_iou_result2 = CuboidIOU(enforce_label_match=False).eval(
-    annotations, predictions
-)
-cuboid_precision_result2 = CuboidPrecision(enforce_label_match=False).eval(
-    annotations, predictions
-)
-cuboid_recall_result2 = CuboidRecall(enforce_label_match=False).eval(
-    annotations, predictions
-)
+cuboid_iou_result2 = CuboidIOU(enforce_label_match=False)(
+    annotations, predictions
+)
+cuboid_precision_result2 = CuboidPrecision(enforce_label_match=False)(
+    annotations, predictions
+)
+cuboid_recall_result2 = CuboidRecall(enforce_label_match=False)(
+    annotations, predictions
+)
Fixes some problems with 3D evaluation metrics and adds unit test coverage on the numerical correctness of metric functions.
Specific issues found: