Improved grid_sample #544

Open
AlexanderLutsenko opened this issue Nov 20, 2023 · 1 comment
Labels
TODO

@AlexanderLutsenko
Issue Type

Others

OS

Linux

onnx2tf version number

1.18.14

onnx version number

onnxruntime version number

onnxsim (onnx_simplifier) version number

tensorflow version number

2.14

Download URL for ONNX

Parameter Replacement JSON

-

Description

TL;DR: I believe grid_sample is being converted incorrectly, plus it can be made much smaller in size and ~5x faster.

While searching for a better TensorFlow substitute for grid_sample, I found some interesting stuff here: #426

The bug

The problem occurs with padding_mode='zeros' when a pixel index goes out of the image bounds by less than a whole pixel. Consider this one-dimensional example:

Let x = -0.4, so x0 = floor(x) = -1 and x1 = 0.
O[x] = I[x0]*0.4 + I[x1]*0.6, where I[-1] = 0 as an out-of-bounds pixel, so O[x] = 0.6*I[0].
Instead, the current code sets the entire O[x] to 0.

I think the best way to do this right is to zero-pad the input image by one pixel on each side and add 1 to all pixel indices.
The expensive post-processing phase then becomes unnecessary.
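
The pad-and-shift idea can be sketched in one dimension. This is a minimal pure-Python illustration, not the converter's code: `sample_1d_zeros` is a hypothetical helper name, and clamping the shifted indices into the padded range is an assumption added here so that far out-of-bounds coordinates also read the zero border.

```python
import math

def sample_1d_zeros(image, x):
    """Linearly interpolate `image` at fractional index `x` with zero padding."""
    # Pad one zero on each side so out-of-bounds neighbours read as 0.
    padded = [0.0] + list(image) + [0.0]
    last = len(padded) - 1

    x0 = math.floor(x)   # left neighbour
    x1 = x0 + 1          # right neighbour
    w1 = x - x0          # weight of the right neighbour
    w0 = 1.0 - w1        # weight of the left neighbour

    # Shift by 1 to account for the padding, then clamp into the padded
    # range (assumption: anything further out lands on the zero border).
    i0 = min(max(x0 + 1, 0), last)
    i1 = min(max(x1 + 1, 0), last)
    return padded[i0] * w0 + padded[i1] * w1
```

With `image = [10.0, 20.0, 30.0]` and `x = -0.4`, the left neighbour is the padded zero and the right neighbour is `image[0]` with weight 0.6, so no separate masking step is needed.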

Broken TFLiteConverter

The way TensorFlow converts gather_nd to TFLite is completely broken. Not only is it offensively slow, it also adds this suspicious Concatenation op with a massive tensor of zeros inside.

[image: cat]

But 1D gather seems to be alright, so that's what I ended up using.

import tensorflow as tf

def gather(input, y, x, b, h, w, c, padding_mode):
    # Slow!
    # return tf.gather_nd(params=input, indices=tf.cast(tf.concat([y, x], axis=-1), dtype=tf.int32), batch_dims=1)

    if padding_mode == 'zeros':
        # Assumes `input` is already zero-padded by one pixel per side and
        # that y, x are already shifted by 1 to match the padded image.
        w_padded = w + 2
        h_padded = h + 2
        # Row-major linear index into the flattened padded image.
        linear_coordinates = tf.cast(y * w_padded + x, dtype=tf.int32)
        linear_coordinates = tf.reshape(linear_coordinates, shape=(b, h, w))
        input = tf.reshape(input, shape=(b, h_padded * w_padded, c))
    else:
        # Row-major linear index into the flattened unpadded image.
        linear_coordinates = tf.cast(y * w + x, dtype=tf.int32)
        linear_coordinates = tf.reshape(linear_coordinates, shape=(b, h, w))
        input = tf.reshape(input, shape=(b, h * w, c))

    # A single batched 1D gather over the flattened spatial axis.
    out = tf.gather(params=input, indices=linear_coordinates, batch_dims=1)
    return out
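
The index arithmetic behind the 1D gather can be checked without TensorFlow. The following is a small pure-Python sketch under stated assumptions: `gather_2d` and `gather_flat` are hypothetical names used only for illustration, and the image is a single-channel nested list rather than a batched tensor.

```python
def gather_2d(image, ys, xs):
    """Reference lookup: image[y][x] for each (y, x) pair."""
    return [image[y][x] for y, x in zip(ys, xs)]

def gather_flat(image, ys, xs):
    """Same lookup as one 1D gather over the row-major flattened image."""
    width = len(image[0])
    flat = [pixel for row in image for pixel in row]   # length h * w
    linear = [y * width + x for y, x in zip(ys, xs)]   # row-major index
    return [flat[i] for i in linear]

image = [[0, 1, 2],
         [3, 4, 5]]
ys, xs = [0, 1, 1], [2, 0, 2]
print(gather_2d(image, ys, xs))    # [2, 3, 5]
print(gather_flat(image, ys, xs))  # [2, 3, 5]
```

Both functions return the same pixels, which is why a (y, x) gather_nd can be replaced by a single gather once the spatial axes are flattened.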

Full code:
https://github.com/AlexanderLutsenko/nobuco/blob/aa4745e6abb1124d90f7d3ace6d282f923f08a40/nobuco/node_converters/grid_sampling.py#L38

Correctness tests:
https://github.com/AlexanderLutsenko/nobuco/blob/aa4745e6abb1124d90f7d3ace6d282f923f08a40/examples/grid_samplers.py

Benchmark results, Snapdragon 662:

| name | size | XNNPACK avg |
| --- | --- | --- |
| 1x3x32x32_grid_sampler_new.tflite | 0.0092 Mb | 0.2618 ms |
| 1x3x32x32_grid_sampler_old.tflite | 0.0177 Mb | 1.3888 ms |
| 1x3x64x64_grid_sampler_new.tflite | 0.0093 Mb | 1.0207 ms |
| 1x3x64x64_grid_sampler_old.tflite | 0.0424 Mb | 5.5717 ms |
| 1x3x128x128_grid_sampler_new.tflite | 0.0094 Mb | 4.1274 ms |
| 1x3x128x128_grid_sampler_old.tflite | 0.1407 Mb | 22.2125 ms |
| 4x3x32x32_grid_sampler_new.tflite | 0.0094 Mb | 1.0212 ms |
| 4x3x32x32_grid_sampler_old.tflite | 0.0424 Mb | 5.5643 ms |
| 4x3x64x64_grid_sampler_new.tflite | 0.0094 Mb | 4.1527 ms |
| 4x3x64x64_grid_sampler_old.tflite | 0.1407 Mb | 22.2211 ms |
| 4x3x128x128_grid_sampler_new.tflite | 0.0094 Mb | 17.3625 ms |
| 4x3x128x128_grid_sampler_old.tflite | 0.5340 Mb | 89.6066 ms |
| 8x3x32x32_grid_sampler_new.tflite | 0.0094 Mb | 2.0717 ms |
| 8x3x32x32_grid_sampler_old.tflite | 0.0752 Mb | 11.0984 ms |
| 8x3x64x64_grid_sampler_new.tflite | 0.0094 Mb | 8.4565 ms |
| 8x3x64x64_grid_sampler_old.tflite | 0.2718 Mb | 44.5216 ms |
| 8x3x128x128_grid_sampler_new.tflite | 0.0094 Mb | 36.0839 ms |
| 8x3x128x128_grid_sampler_old.tflite | 1.0583 Mb | 178.9390 ms |
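
The "~5x faster" figure can be checked directly against the timings above. A short sketch that only divides the old average by the new one for each input shape; the numbers are copied from the benchmark, and the variable names are illustrative.

```python
# (shape, new_ms, old_ms) copied from the benchmark results above
timings = [
    ("1x3x32x32",   0.2618,   1.3888),
    ("1x3x64x64",   1.0207,   5.5717),
    ("1x3x128x128", 4.1274,  22.2125),
    ("4x3x32x32",   1.0212,   5.5643),
    ("4x3x64x64",   4.1527,  22.2211),
    ("4x3x128x128", 17.3625, 89.6066),
    ("8x3x32x32",   2.0717,  11.0984),
    ("8x3x64x64",   8.4565,  44.5216),
    ("8x3x128x128", 36.0839, 178.9390),
]

# Speedup of the new converter over the old one, per input shape.
speedups = {shape: old / new for shape, new, old in timings}
for shape, ratio in speedups.items():
    print(f"{shape}: {ratio:.2f}x")
```

Every ratio falls between roughly 5.0x and 5.5x, consistent with the TL;DR.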
@PINTO0309 added the TODO label Nov 20, 2023
@PINTO0309
Owner

Excellent. Thank you.

Right now, I am concentrating on creating my own high-precision model (work unrelated to converters).
I will apply this when I have more free time.
