Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESP32-C3 task watchdog trigger #18

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Conversation

RaymiiOrg
Copy link
Contributor

For the Olimex ESP32-C3. This was my reference:

https://github.com/esphome/esphome/pull/2096/files

@stas-sl
Copy link
Owner

stas-sl commented Mar 31, 2024

Hi, thanks, I'm currently trying to remember/understand how FreeRTOS/ESPHome executes tasks, which priorities they have and how my code is supposed to work on a single core chip 🤔 And whether your solution is ok.

Do you use arduino/esp-idf? As there could be some differences how WDTs are set up.

If there is only a single core, then you probably have to be more careful choosing task priority. You can configure priority of my component task loop in config. By default it is set to 2, which is lower than all other critical system tasks like WiFi/Bluetooth etc, however if I'm not mistaken ESPHome's main loop runs in a task with priority=1, and Idle task has priority=0. So If you will use these priorities, then your solution will actually help by giving some time to Idle task (1ms), however I'm not sure how it will work if you will be using other components that will require more processing time in the main loop.

@stas-sl
Copy link
Owner

stas-sl commented Mar 31, 2024

The thing is while my component doesn't need 100% CPU to apply all filters/calculations, but it will block in i2s_read, waiting for audio data. So all other tasks with lower priorities wouldn't be executed. One idea is to somehow dynamically adjust delay time depending on audio data available or how much time it is required on average, or maybe just calculate it based on sample rate / buffer size ratio. But it could be tricky. I was hoping that it could be avoided by carefully assigning task priorities and running task on another core. But if you run everything on one core and you have some other potentially blocking components then by choosing lower or higher priority for sound level meter task you might either loose some audio data (if you set its priority to 0 or 1) or there could be issues with your other components running in the main loop (if you keep SLM's priority=2).

@RaymiiOrg
Copy link
Contributor Author

I'm using esp-idf, in #19 I posted the esphome yaml file for reference. The board (https://www.olimex.com/Products/IoT/ESP32-C3/ESP32-C3-DevKit-Lipo/open-source-hardware) specs state the following CPU: RISC-V single core processor runninig on 160 Mhz.

I was trying to run the board with both the sound level meter and a bluetooth proxy (https://esphome.io/components/bluetooth_proxy.html) for home assistant when the error first came up (Task watchdog got triggered. The following tasks did not reset the watchdog in time ). When removing all bluetooth proxy code, the error still came up. For this specific board it seems to help, because with the small delay it does not trigger the log warning. Not sure how it would affect other boards, not C3.

@RaymiiOrg
Copy link
Contributor Author

There is a bit more info in this github issue: esphome/issues#2277 - that's related to the pull request I linked earlier. Might help to put an #ifdef around it just for the ESP32-C3.

@deece
Copy link

deece commented Apr 11, 2024

The thing is while my component doesn't need 100% CPU to apply all filters/calculations, but it will block in i2s_read, waiting for audio data. So all other tasks with lower priorities wouldn't be executed. One idea is to somehow dynamically adjust delay time depending on audio data available or how much time it is required on average, or maybe just calculate it based on sample rate / buffer size ratio. But it could be tricky. I was hoping that it could be avoided by carefully assigning task priorities and running task on another core. But if you run everything on one core and you have some other potentially blocking components then by choosing lower or higher priority for sound level meter task you might either loose some audio data (if you set its priority to 0 or 1) or there could be issues with your other components running in the main loop (if you keep SLM's priority=2).

It looks like if ticks_to_wait is 0, you should have a non-blocking call that you can call in a loop on ESP_ERR_TIMEOUT, and then yield() accordingly?

@stas-sl
Copy link
Owner

stas-sl commented Apr 11, 2024

Well spotted, @deece, I totally forgot about last argument ticks_to_wait in i2s_read function. Though, still I guess, we should call delay(1), not yield(). Then we will receive/process data more often in smaller batches. Currently default buffer size is 1024 and assuming 48000 sample rate, it corresponds to ~20ms of audio data, so currently it blocks for approximately this amount of time to fill the buffer. But if we set ticks_to_wait=0 and add delay(1) on each iteration then i2s_read will return smaller number of samples corresponding to ~1ms of audio data 48000 / 1000 = ~48 samples per iteration. I guess it should be OK, but I need to try and test how it goes.

@stas-sl
Copy link
Owner

stas-sl commented Apr 13, 2024

Hmm, I experimented a bit today with my ESP32 (not ESP32-C3).

ESP32 has 2 CPU cores, however ESPHome's main loop task named loopTask is executed on Core 1 and has priority 1 and by default my sound_level_meter task is also pinned to Core 1 with priority 2 (but you can change these defaults in config). As I understand WDT is configured to watch for loopTask, if it will not receive control for some time, because other tasks with higher priority will keep CPU busy, or for some other reason, the WDT will trigger. As I was using i2s_read with last parameter ticks_to_wait = portMAX_DELAY, one can expect that the loop in sound_level_meter task will keep CPU busy and wouldn't allow lower priority tasks on the same core such as loopTask to be executed. However, seem like this is not the case. First, in my case WDT is not triggered, and everything works OK, second to ensure, I added the following print statement in on_loop handler and also I added few print statements before and after i2s_read:

Pasted Graphic 3

And here is some excerpt from logs:

Pasted Graphic 1

As you can see on_loop is executed many times while i2s_read is waiting for buffer to fill. So, it seems like, while current task is blocked, the scheduler can switch to other tasks with lower priorities. Hence you don't need to actually call delay(1) manually.

However when I set ticks_to_wait = 0 (without adding delay(1)) then I actually get WDT triggered:

Pasted Graphic 2

As you can see from the logs above, on_loop handler is not executed in this case. Which is understandable as we don't have delay(1) in the loop and it runs fast (as i2s_read return immediately if no new data available) keeping CPU always busy.

Sure we can add delay(1) and it will solve the issue. However I don't quite understand why WDT is triggered on ESP32-C3 and not triggered on regular ESP32. As despite ESP32 has 2 cores, still both tasks are executed on a single core (core 1) and i2s_read seems to allow switching to other tasks while waiting for data. And it seems to be more efficient as there will be fewer loop iterations compared to the case if we manually add delay(1) on each iteration, i.e. if we have buffer size =1000 and sample rate = 48000, then we will have 48 loop iterations per second, each taking ~20ms and while waiting these 20ms, other tasks could be executed. However if we set ticks_to_wait = 0 and add delay(1), then there could be up to 1000 iterations/task switches per second.

@deece
Copy link

deece commented Apr 13, 2024

I've got a c3 sitting on my desk at the moment, if you want to push a branch up somewhere, I can test it for you.

@stas-sl
Copy link
Owner

stas-sl commented Apr 13, 2024

I can create some branch, however it will be just a few ESP_LOGD(...) statements before/after i2s_read and on_loop and maybe trying different ticks_to_wait values.

But before that can you just confirm/reproduce the same issue as OP described with current/unmodified version from the main branch with minimal config? And if you can, then what's the detailed error message (what is the name of task for which WDT is triggered)

@stas-sl
Copy link
Owner

stas-sl commented Apr 14, 2024

I created wdt_debug branch. Also you might need to make some modifications in config:

esphome:
# ...
  on_loop:
    then:
      - lambda: 'ESP_LOGD("loop", "on_loop, CPU Core: %u", xPortGetCoreID());' 

external_components:
  - source: github://stas-sl/esphome-sound-level-meter@wdt_debug

sound_level_meter:
# ...
  buffer_size: 10000  # increase buffer size to increase blocking/waiting time

Here are my logs for comparison:
image

The question is whether there will be on_loop print statements between before i2s_read and after i2s_read. I can guess the answer is "no", if WDT is triggered, which means apparently there is a difference how waiting/blocking in i2s_read works on different platforms/esp-idf versions/CPU architectures.

@stas-sl
Copy link
Owner

stas-sl commented Apr 14, 2024

Adding delay(1) probably should work on both platforms, however as I mentioned before, it might be less efficient, as it will cause more context/tasks switches. But maybe it will be ok.

@RaymiiOrg
Copy link
Contributor Author

I created wdt_debug branch. Also you might need to make some modifications in config:

esphome:
# ...
  on_loop:
    then:
      - lambda: 'ESP_LOGD("loop", "on_loop, CPU Core: %u", xPortGetCoreID());' 

external_components:
  - source: github://stas-sl/esphome-sound-level-meter@wdt_debug

sound_level_meter:
# ...
  buffer_size: 10000  # increase buffer size to increase blocking/waiting time

Here are my logs for comparison: image

The question is whether there will be on_loop print statements between before i2s_read and after i2s_read. I can guess the answer is "no", if WDT is triggered, which means apparently there is a difference how waiting/blocking in i2s_read works on different platforms/esp-idf versions/CPU architectures.

If you need me to test again I can do as well. I now also have a few ESP32-S3 I can test, but those are dual core as well (more like the regular ESP32 whereas the ESP32-C3 seems to be more like the esp8266)

@stas-sl
Copy link
Owner

stas-sl commented Apr 14, 2024

@RaymiiOrg, I do have S3 version as well and I tested using it a year ago and AFAIR, it worked pretty well. At least there were no WDT issues.

I also tried today setting ticks_to_wait to lower values (0 or 1) and it resulted in incorrect data/values. I guess you have to give it some time (5ms worked ok in my experiments) to actually get the data.

I checked i2s_read source code, can't say I fully understand what's going on there, but it uses FreeRTOS queues internally (xQueueReceive) and I guess it is supposed to allow switching to others tasks (even with lower priorities) while waiting for new data. So I really don't understand why WDT triggers and for what task.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants