Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WiFi (specifically WifiGeneric) handling of events is subject to race conditions #6947

Open
1 task done
egnor opened this issue Jul 6, 2022 · 3 comments
Open
1 task done
Labels
Area: BT&Wifi BT & Wifi related issues Type: Documentation Issue pertains to Documentation of Arduino ESP32 Type: For reference Common questions & problems

Comments

@egnor
Copy link
Contributor

egnor commented Jul 6, 2022

Board

any board (with networking)

Device Description

any device (with networking)

Hardware Configuration

no special configuration

Version

latest development Release Candidate (RC-X)

IDE Name

any IDE

Operating System

any OS

Flash frequency

any frequency

PSRAM enabled

yes

Upload speed

any speed

Description

It's hard to write a specific repro case for this since it depends on race conditions. (Also, my understanding may be off, please correct!)

The WiFi instance allows user code to register for events with WiFi.onEvent(). This isn't documented, but it is present in most example code, described in various tutorials and copied into many/most network-using sketches. This is implemented the WiFiGeneric class (one of the base classes for WiFiClass, of which WiFi is an instance) using an independently running FreeRTOS task (pinned to core 0(?) by default):

static void _arduino_event_task(void * arg){

That free-running task reads from a queue and invokes WiFiGenericClass::_eventCallback():

esp_err_t WiFiGenericClass::_eventCallback(arduino_event_t *event)

After translating the event, _eventCallback() loops through all registered callbacks and invokes them serially:

for(uint32_t i = 0; i < cbEventList.size(); i++) {

Nowhere in this process is locking or other synchronization done, that I can figure out. I see these implications?

  1. If code adds/removes callbacks at the same time as callbacks are being invoked, arbitrarily bad undefined behavior can result due to conflict over cbEventList. Most code will add callbacks at the start and leave them there forever, but if that's the expectation it seems worth documenting?
  2. User callbacks are invoked on a separate task/thread (the "arduino event thread") from the user's main loop. That means the user should be careful about synchronization between those callbacks and the user's main code. However, this isn't documented, and example code (e.g. WiFiClientEvents.ino, WiFiUDPClient.ino, ETH_LAN8720, etc) make lots of unsynchronized calls and global variable changes in ways that seem unsafe.

(Hopefully you'll tell me why this is all wrong and secretly it's all serialized so the callbacks only happen between calls to loop() or something like that...?)

Sketch

(many of the example sketches, see above)

Debug Message

(no specific debug logs)

Other Steps to Reproduce

No response

I have checked existing issues, online documentation and the Troubleshooting Guide

  • I confirm I have checked existing issues, online documentation and Troubleshooting guide.
@egnor egnor added the Status: Awaiting triage Issue is waiting for triage label Jul 6, 2022
@TD-er
Copy link
Contributor

TD-er commented Jul 7, 2022

(Hopefully you'll tell me why this is all wrong and secretly it's all serialized so the callbacks only happen between calls to loop() or something like that...?)

Nope, there are for sure race condition bugs, both on ESP32 and ESP8266.
But it is really hard to pinpoint where they are.

@VojtechBartoska VojtechBartoska added the Area: BT&Wifi BT & Wifi related issues label Jul 11, 2022
@mrengineer7777 mrengineer7777 added Type: For reference Common questions & problems Type: Documentation Issue pertains to Documentation of Arduino ESP32 and removed Status: Awaiting triage Issue is waiting for triage labels Apr 13, 2023
@mrengineer7777
Copy link
Collaborator

Your analysis of the library seems sound.
You are indeed correct that removing a callback while _eventCallback() is iterating through the loop could cause a crash, unless cbEventList.size() recalculates for every iteration. Probably does recalculate - could be tested.
_eventCallback does indeed run on a separate thread.

The developers are quite busy working on the 3.0.0 release.

Please submit a PR to improve the documentation. These are worth telling others about it.

@egnor
Copy link
Contributor Author

egnor commented Apr 16, 2023

#8081 submitted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: BT&Wifi BT & Wifi related issues Type: Documentation Issue pertains to Documentation of Arduino ESP32 Type: For reference Common questions & problems
Projects
None yet
Development

No branches or pull requests

4 participants