
Refactor nms into TorchVision variant. #6814

Merged
merged 7 commits into master from ysiraichi/refactor-into-torchvision-nms on Apr 1, 2024

Conversation

ysiraichi
Collaborator

This PR refactors the existing nms lowering implementation, changing the relevant parts so that it complies with TorchVision semantics. In summary:

  • Implement a new nms lowering, based on the old implementation
    • More comments
    • TorchVision semantics
  • Adapt the TorchVision tests for nms
  • Move the lowering implementation to torch_xla/csrc/xla_lower_util.cpp
    • Alongside the other lowering implementations
  • Register the new kernel as the XLA dispatch for torchvision::nms
    • Makes it possible to call the kernel by calling torchvision.ops.nms(...) directly (see the sketch below)
  • Remove the old implementation
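For context, a minimal usage sketch of what the new dispatch registration enables (illustrative only, not code from this PR), assuming an environment with torch, torchvision, and torch_xla available:

import torch
import torchvision
import torch_xla.core.xla_model as xm

device = xm.xla_device()

# Boxes in (x1, y1, x2, y2) format, plus one confidence score per box.
boxes = torch.tensor(
    [[0.0, 0.0, 10.0, 10.0],
     [1.0, 1.0, 11.0, 11.0],
     [20.0, 20.0, 30.0, 30.0]],
    device=device)
scores = torch.tensor([0.9, 0.8, 0.7], device=device)

# With the torchvision::nms kernel registered for XLA, this call dispatches to
# the new lowering instead of requiring a CPU fallback.
keep = torchvision.ops.nms(boxes, scores, iou_threshold=0.5)
print(keep.cpu())  # indices of the boxes that survive suppression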

cc @miladm @JackCaoG

@ysiraichi ysiraichi requested a review from JackCaoG March 23, 2024 23:39
@ysiraichi
Collaborator Author

I'm having a bit of trouble identifying the cause of the CI failure. Basically, if we set XLA_USE_EAGER_DEBUG_MODE=1, the test TestNMS::test_nms_ref starts failing with the following error:

RuntimeError: Bad StatusOr access: INVALID_ARGUMENT: Executable expected parameter 0 of size 4000
but got buffer with incompatible size 4004

After looking into where this error was coming from, I found out that it was due to PyTorch/XLA trying to figure out the actual size of the output.

My guess is that, since I call xla::SetDimensionSize in the lowering, one of the dimensions is marked as dynamic, which makes XLATensorImpl::sym_sizes_ be populated with a c10::SymInt. When we try to retrieve the actual size (triggered by a to_functional_tensor(t) call on the result of nms), PyTorch/XLA figures it out by actually running the computation.


Given all this information, I'm not sure where the "buffer with incompatible size 4004" is coming from. test_nms_ref does create a tensor of size 4000 (1000x4 elements), but where the extra 4 comes from is unclear to me.

Another thing I'm puzzled about is why this test fails only when XLA_USE_EAGER_DEBUG_MODE is set. Why wouldn't a normal test run trigger the same error?

@JackCaoG Have you ever seen an error like this? I believe other lowerings also call this same function (e.g. BuildMaskedSelect), so it doesn't seem to be a problem with xla::SetDimensionSize itself.
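For comparison, a small sketch of what I mean (illustrative only; the exact behavior under eager debug mode is the open question): masked_select also produces a data-dependent output size on XLA, since its lowering uses the same mechanism.

import torch
import torch_xla.core.xla_model as xm

t = torch.randn(8, device=xm.xla_device())
# The number of selected elements depends on the data, so the output has a
# dynamic dimension, just like the result of the new nms lowering.
out = torch.masked_select(t, t > 0)
print(out.cpu())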

@JackCaoG
Collaborator

XLA_USE_EAGER_DEBUG_MODE is a debug flag that tries to execute the value of each node upon construction. You can think of it as calling mark_step every time an IR node is created.
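In other words (a paraphrase of the above, not an exact description of the flag's implementation), the effect is roughly:

import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
x = torch.randn(4, 4, device=device)

# Normal lazy mode: ops accumulate in the IR graph and are compiled/executed
# together at the next sync point.
y = x + 1
z = y * 2
xm.mark_step()

# Under XLA_USE_EAGER_DEBUG_MODE=1, each op is materialized as soon as its IR
# node is created, roughly as if mark_step() ran after every line above.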

@ysiraichi
Collaborator Author

Right. So, what is puzzling is why it would fail only when XLA_USE_EAGER_DEBUG_MODE is set. Do you have any guesses?

@JackCaoG
Collaborator

Not really, but feel free to skip the test when XLA_USE_EAGER_DEBUG_MODE is enabled. nms is a rarely used op that isn't even an ATen op, and XLA_USE_EAGER_DEBUG_MODE is a debug-only flag, so I think that is OK.
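A minimal sketch of such a skip (the condition shown here is an assumption; the test file ends up using a @skipOnEagerDebug decorator, visible in the test snippet quoted later in this conversation):

import os
import unittest

# Hypothetical helper: skip a test whenever the debug-only flag is enabled.
skipOnEagerDebug = unittest.skipIf(
    os.environ.get("XLA_USE_EAGER_DEBUG_MODE", "0") == "1",
    "skipped under XLA_USE_EAGER_DEBUG_MODE")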

@ysiraichi ysiraichi force-pushed the ysiraichi/refactor-into-torchvision-nms branch from a4e90c8 to be21831 on March 26, 2024 19:53
const xla::Shape& boxes_shape = ShapeHelper::ShapeOfXlaOp(boxes);
XLA_CHECK_EQ(boxes_shape.rank(), 2);
XLA_CHECK_EQ(boxes_shape.dimensions(1), COORDINATES);
int64_t num_boxes = boxes_shape.dimensions(0);
Collaborator

I wonder whether the implementation here is a copy (with some modifications) of the deleted nms_op.cpp, or a brand-new implementation.

Collaborator Author

I would say that it is a new implementation, heavily based on the previous one. The problems I saw with the previous implementation were:

  • Almost no comments: it was hard to tell what some parts of the code were doing
  • Copied from another source: it was copied from an old version of TensorFlow
  • Different signature and semantics: it returned the best output_size box indices, even though some of them might have been suppressed

I believe this implementation checks all of the above-mentioned boxes. I added plenty of comments, plus a few changes that resulted in more maintainable code (IMHO).
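For reference, a pure-PyTorch sketch of the greedy algorithm with TorchVision semantics (indices returned in decreasing score order, and only for boxes that were actually kept). This is an illustrative restatement of the semantics, not the XLA lowering itself:

import torch

def nms_reference(boxes, scores, iou_threshold):
    # boxes: [N, 4] in (x1, y1, x2, y2) format; scores: [N].
    order = scores.argsort(descending=True)
    keep = []
    while order.numel() > 0:
        i = order[0]
        keep.append(i.item())
        if order.numel() == 1:
            break
        rest = order[1:]
        # IoU of the highest-scoring remaining box with all the others.
        lt = torch.maximum(boxes[i, :2], boxes[rest, :2])
        rb = torch.minimum(boxes[i, 2:], boxes[rest, 2:])
        inter = (rb - lt).clamp(min=0).prod(dim=1)
        area_i = (boxes[i, 2:] - boxes[i, :2]).prod()
        area_rest = (boxes[rest, 2:] - boxes[rest, :2]).prod(dim=1)
        iou = inter / (area_i + area_rest - inter)
        # Suppress boxes that overlap the kept box too much; keep the rest.
        order = rest[iou <= iou_threshold]
    return torch.tensor(keep, dtype=torch.int64)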

@ysiraichi
Collaborator Author

@JackCaoG @vanbasten23 I think this PR is ready. Could you take a look at it when you have some time?

init_values, "BoxSelectionLoop", builder));
}

xla::XlaOp BuildNms(xla::XlaOp boxes, xla::XlaOp scores,
Collaborator

Is there any resource that you referenced when coming up with the implementation?

Collaborator Author

I based this implementation on two other implementations:

  • The old PyTorch/XLA nms implementation
  • The TorchVision CUDA nms implementation

@ysiraichi ysiraichi merged commit 6cf9b91 into master Apr 1, 2024
18 checks passed
return boxes, scores

@skipOnEagerDebug
def test_nms_ref(self):
Collaborator

@ysiraichi can you please add a dynamic shape input test scenario for nms? Because this op is dynamic, it has been falling back to CPU. Of course, there is a lot of interest in bringing the op to the XLA device, though this requires correct handling of dynamism.

Collaborator

For more clarity: the number of boxes is set to 1000 at the moment; we want that number to be dynamic.

Collaborator

To be clear, the output of this test appears to be dynamic, though the input number of boxes can also be dynamic for nms. This test currently covers one dynamism scenario (i.e., output dynamism).
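One way such a dynamic-input scenario might look (a sketch under my own assumptions about how input dynamism would be introduced, not a committed test): derive a data-dependent number of boxes first, then feed them to nms.

import torch
import torchvision
import torch_xla.core.xla_model as xm

device = xm.xla_device()

# Build 1000 well-formed boxes (x2 >= x1, y2 >= y1) and random scores.
xy1 = torch.rand(1000, 2, device=device) * 100
wh = torch.rand(1000, 2, device=device) * 10
boxes = torch.cat([xy1, xy1 + wh], dim=1)
scores = torch.rand(1000, device=device)

# Filter by score: how many boxes survive is data-dependent, so the input to
# nms now has a dynamic first dimension (in addition to the dynamic output).
mask = scores > 0.5
dyn_boxes = boxes[mask]
dyn_scores = scores[mask]

keep = torchvision.ops.nms(dyn_boxes, dyn_scores, iou_threshold=0.5)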
