-
Notifications
You must be signed in to change notification settings - Fork 511
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support NMS op lowering #3871
Support NMS op lowering #3871
Conversation
c0602c7
to
d2869ce
Compare
d2869ce
to
fe96416
Compare
Like we were talking about on a call, I definitely think we should tensorize some of this. In the outer loop, we pick a single box to compare to the other boxes. // outside all loops:
Value x1Slice /*=slice boxes to get x1 values for each box*/;
Value x2Slice ..;
Value y1Slice ..;
Value y2Slice ..;
Value xDistance = x2Slice - x1Slice;
Value yDistance = y2Slice - y1Slice;
Value boxAreas = xDistance * yDistance;
// inside loop over sorted boxes:
Value currBox /*=the box with the highest score among available boxes*/;
// The elementwise tensor arithmetic below allows broadcasting
Value innerX1 = max(x1Slice, currBox[0]);
Value innerX2 = min(x2Slice, currBox[2]);
Value innerY1 ..;
Value innerY2 ..;
Value intersectionDistanceX = innerX2 - innerX1;
Value intersectionDistanceY = innerY2 - innerY1;
Value intersectionArea = intersectionDistanceX * intersectionDistanceY;
Value currArea = boxAreas[currBoxIdx];
Value unionArea = boxAreas + currArea - intersectionArea;
Value IOU = intersectionArea / unionArea; Actually, based on the ordering of
Although, I'm not sure how we can go about skipping the IOU calculations that are redundant. I'm actually curious if it is worth it to skip redundant computations if it requires us to extract individual elements and do the arithmetic one at a time. At the very least, computing each of the box areas outside the loops is going to be an improvement. |
f2997d3
to
e07ec39
Compare
bbbfe87
to
e1c19ec
Compare
Three e2e tests pass with #3892 and without flag
|
e1c19ec
to
e0be6e4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation looks good to me, and if we can pass e2e tests, I think this is good.
My only lingering concern is related to the signature of the onnx op when one of the other optional args is missing. It might be good to add some test cases to check what happens there and make sure we won't mis-compile those examples.
Another point of concern is that this whole implementation could likely be done in the onnx-to-torch lowering and might allow support for the batched version of NMS that onnx uses (we have some unsqueezes in this conversion that cannot reasonably be assumed to succeed). I don't think we need to make that change now, but it's something to keep in mind if we end up seeing this op in models we care about with non-trivial batching.
e0be6e4
to
749f21c
Compare
TODO: support multiple batches and classes