Putting this here (1) as a TODO, (2) to get input from others, and (3) as a place to keep some notes.
Profiling the texture builtin tests shows the majority of time is spent in `TexelView.fromTexelsAsColors`, making a texture filled with random data from a generator. At a glance, most of the time goes to small typed-array creation (in Chrome), per texel and/or per component, and to small JS object creation (lots and lots of temporaries).
There are at least 2 issues:
The code takes the u32 hash, divides by 0xFFFFFFFF to get a number from 0.0 to 1.0, and scales that into the test range for the texture format (0 to 1 for unorm, -1 to 1 for snorm, 0 to 65535 for 16uint, etc...). It then passes the result to `quantize`, which goes through a fairly deep path of temporaries: it encodes a texel of the given format into binary and then decodes it back out. So, for example, a 2-bit alpha value ends up quantized to one of 4 values.
The quantization is needed because the software renderer references the TexelViews, so it needs to see the same values the GPU will see.
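Roughly, the per-component path looks like this (helper names here are illustrative, not the CTS's actual ones):

```ts
// Sketch of the current per-component value generation (names assumed).
function componentFromHash(hash: number, min: number, max: number): number {
  const t = hash / 0xffffffff;  // u32 hash -> 0.0 to 1.0
  return min + t * (max - min); // scale into the format's test range
}

// The result then goes through quantize, which conceptually round-trips
// the value through the format's binary encoding, allocating temporaries:
//   quantize(v, format) === decode(format, encode(format, v))
// e.g. a 2-bit alpha component lands on one of 4 representable values.
```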
It then calls whatever code is in `fromTexelsAsColors` to convert each texel to the binary format.
These are both slow.
Some experiments and ideas for speeding it up:
I tried optimizing the quantization by adding custom, per-format quantizers. That's easy for snorm/unorm/uint/sint formats, and it made things 40% faster (1000 -> 600). That still leaves it slow for the remaining formats.
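For reference, the direct quantizers I mean look roughly like this (shapes assumed; the CTS's `quantize` instead round-trips through the binary encoding):

```ts
// Direct quantizers for normalized and integer formats (sketch).
function quantizeUnorm(v: number, bits: number): number {
  const max = 2 ** bits - 1; // e.g. 255 for unorm8
  return Math.round(Math.min(Math.max(v, 0), 1) * max) / max;
}

function quantizeSnorm(v: number, bits: number): number {
  const max = 2 ** (bits - 1) - 1; // e.g. 127 for snorm8
  return Math.round(Math.min(Math.max(v, -1), 1) * max) / max;
}

function quantizeUint(v: number, bits: number): number {
  const max = 2 ** bits - 1; // e.g. 65535 for 16uint
  return Math.min(Math.max(Math.round(v), 0), max);
}
```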
Commenting out the quantizing step entirely takes it from 1000 -> 440.
That effectively leaves the rest of the time in `fromTexelsAsColors`.
Ideas:
- For snorm/unorm/sint/uint formats we can just use random binary data. All bit patterns are valid values, so there's no reason to do the quantization. We can use `TexelView.fromTextureDataByReference` to make the TexelViews for the software renderer (see the first sketch below).
- For formats that do need quantization (f16, ufloat, ...), it might be faster to just write the values in and then read them back from the GPU (see the second sketch below)?
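A minimal sketch of the random-binary-data idea; the PRNG and the exact `fromTextureDataByReference` options are assumptions:

```ts
// Fill the texture's backing store from a cheap seeded PRNG. For
// snorm/unorm/sint/uint formats every bit pattern decodes to a valid
// value, so no quantization pass is needed.
function randomTexelBytes(byteLength: number, seed: number): Uint8Array {
  const bytes = new Uint8Array(byteLength);
  let state = seed >>> 0;
  for (let i = 0; i < byteLength; ++i) {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0; // LCG step
    bytes[i] = state >>> 24;
  }
  return bytes;
}

// Upload `bytes` with writeTexture and hand the same bytes to the software
// renderer, something like (options assumed):
//   TexelView.fromTextureDataByReference(format, bytes, { bytesPerRow, rowsPerImage });
```

And a rough sketch of the write-then-read-back idea; `device`, `texture`, `width`, and `height` are assumed to be in scope, and `bytesPerRow` must be a multiple of 256 for texture-to-buffer copies:

```ts
const readback = device.createBuffer({
  size: bytesPerRow * height,
  usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ,
});
const encoder = device.createCommandEncoder();
encoder.copyTextureToBuffer({ texture }, { buffer: readback, bytesPerRow }, [width, height, 1]);
device.queue.submit([encoder.finish()]);
await readback.mapAsync(GPUMapMode.READ);
const quantizedBytes = new Uint8Array(readback.getMappedRange()).slice();
readback.unmap();
// Build the software renderer's TexelView from quantizedBytes so both
// sides see exactly the values the GPU stored.
```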