-
Notifications
You must be signed in to change notification settings - Fork 487
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor nms
into TorchVision variant.
#6814
Conversation
I'm having a bit of trouble identifying the cause of the CI failure. Basically, if we set RuntimeError: Bad StatusOr access: INVALID_ARGUMENT: Executable expected parameter 0 of size 4000
but got buffer with incompatible size 4004 After looking into where this error was coming from, I found out that it was due to PyTorch/XLA trying to figure out the actual size of the output. My guess is that, since I call Given all this information, I'm not sure where the "buffer with imcompatible size 4004" is coming from. Another thing I'm puzzled about is why is this test failing only when @JackCaoG Have you ever seen an error like this? I believe other lowerings also call this same function (e.g. |
|
Right. So, what is puzzling is why would it fail only when |
not really but feel free to skip the test when |
a4e90c8
to
be21831
Compare
const xla::Shape& boxes_shape = ShapeHelper::ShapeOfXlaOp(boxes); | ||
XLA_CHECK_EQ(boxes_shape.rank(), 2); | ||
XLA_CHECK_EQ(boxes_shape.dimensions(1), COORDINATES); | ||
int64_t num_boxes = boxes_shape.dimensions(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if the implementation here is a copy with some modification of the deleted nms_op.cpp, or if it is a brand new implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say that it is a new implementation heavily based on the previous one. So, the problems that I saw with the previous implementation were:
- Almost no comments: couldn't easily tell what some parts of the code were doing
- Copied from another source: it was copied from an old version of tensorflow
- Different signature and semantics: it returned the best
output_size
box indices, even though some of them might have been suppressed
I believe this implementation checks all of the above-mentioned boxes. I added plenty of comments + a few changes that resulted in a more maintainable code (IMHO).
@JackCaoG @vanbasten23 I think this PR is ready. Could you take a look at it when you have some time? |
init_values, "BoxSelectionLoop", builder)); | ||
} | ||
|
||
xla::XlaOp BuildNms(xla::XlaOp boxes, xla::XlaOp scores, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any resource that you referenced when coming up with the implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I based this implementation in two other implementations:
- The old PyTorch/XLA
nms
implementation - TorchVision CUDA
nms
implementation
return boxes, scores | ||
|
||
@skipOnEagerDebug | ||
def test_nms_ref(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ysiraichi can you please add a dynamic shape input test scenario for nms
? Because this op is dynamic, it's been falling back to CPU. Of course, there is a lot of interest to bring the op to the xla device; though, this requires correct functionality for dynamism.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more clarity: number of boxes is set to 1000
at the moment. we want that number to be dynamic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to be clear the output of this test appears to be dynamic, though the input number of boxes can also be dynamic for nms
. This test, currently, covers one dynamism scenario (i.e. the output dynamism).
This PR refactors the existing
nms
lowering implementation, changing the appropriate parts for complying with TorchVision semantics. In summary:nms
lowering, based on the old implementationnms
torchvision::nms
torchvision.ops.nms(...)
directlycc @miladm @JackCaoG