dsps_dotprod_s16_ae32 benchmark on ESP32 (DSP-153) #97

boborjan2 · 2024-11-22T18:41:57Z

Answers checklist.

I have read the ESP-DSP documentation and the issue is not addressed there.
I have read the documentation ESP-IDF Programming Guide and the issue is not addressed there.
I have updated my ESP-DSP branch (master or release) to the latest version and checked that the issue is present there.
I have searched the issue tracker for a similar issue and not found a similar issue.

General issue report

According to this page https://docs.espressif.com/projects/esp-dsp/en/latest/esp32/esp-dsp-benchmarks.html, dsps_dotprod_s16_ae32() with len=256 completes in 447 CPU cycles. That is, I guess, 1 cycle per MAC plus overhead.
However, when I measure it using the CCOUNT register, I get over 950 cycles.
First I used xthal_get_ccount(), then redid the measurement using inline assembly (asm volatile("rsr %0,ccount":"=a" (ccount));), which gave approximately the same result.
In my setup, the core that runs the test runs nothing besides this task. I run 16000 measurements and then evaluate the results.
What am I missing? How can I reproduce the stock results?

Thanks,
Viktor

boborjan2 · 2024-11-24T11:59:21Z

Hi, the reason for the slow performance is misalignment of the input(s). In a real-world scenario (e.g. a FIR filter), it is common for one of the input vectors to be only 2-byte aligned (new samples arrive and the dot product is computed over the history buffer). The optimized code expects its inputs to be 4-byte aligned. The penalty is quite large:
4-byte-aligned input: ~440 cycles @ 256 taps (as expected from the reference benchmark)
2-byte-aligned input: ~1450 cycles(!) @ 256 taps

In my scenario, every second call to dotprod() uses a misaligned buffer, so the average comes out at ~950 cycles, as measured above.

This might be worth highlighting in the docs or on the benchmark page. I have noticed strangely slow performance before, but it was not a showstopper at the time.

boborjan2 · 2024-11-26T09:32:29Z

Another question regarding the dot product:
The rounding value is 0x7fff >> shift. For shift = 0, this gives 0x7fff, and the sum is then shifted right by 15.
For a right shift by 15, the correct rounding value would be 0x4000. What am I missing?
Also, the shift argument in fact specifies a left shift, so shouldn't the rounding value be (0x4000 >> shift)?

espressif-bot added the Status: Opened label Nov 22, 2024

github-actions bot changed the title from "dsps_dotprod_s16_ae32 benchmark on ESP32" to "dsps_dotprod_s16_ae32 benchmark on ESP32 (DSP-153)" Nov 22, 2024
