HCI_UART: Hang on high rate of notifications #58236
Comments
Hi @KlemenDEV! We appreciate you submitting your first issue for our open-source project. 🌟 Even though I'm a bot, I can assure you that the whole community is genuinely grateful for your time and effort. 🤖💙
As suggested by @mringwal, here are the PKLG files for scenario 1, where the nRF52833 runs as the peripheral (nRF52833 as controller, STM32 with BTStack as host) and a Python BLE client sends data to the peripheral at a high rate while reading responses on another characteristic.
HCI dump at the time the nRF52833 hangs (complete dump covering setup, connection establishment, and data sending): hci dump at time of crash.zip
HCI dump after resetting the host (Zephyr seems to hang completely at this point): later attempt once zephyr hangs.zip
I have zipped the PKLG files because GitHub does not accept PKLG attachments.
So I have enabled
and also added another thread to main.c that just prints a word every second, to test whether the core hangs. Here are the findings:
is the last log related to BT. Sending new HCI packets from the host to the controller does not produce any further logs, so some BLE-related thread seems to be hanging.
It also seems the nRF52833 is still detected by BLE scanners after the hang, so advertising keeps running (connecting is of course not possible), but no further HCI communication seems to be possible. Resetting the host (STM32) makes the nRF52833 stop advertising. At that point it no longer responds to HCI commands and seems fully stuck (even my custom thread no longer runs).
I have found out that this was a race condition in my code: HCI methods were being called from different threads, probably resulting in wrong HCI commands being sent. This does not change the fact that one can apparently send data via HCI that makes Zephyr hang irreversibly. I am not sure whether this is OK, so I will leave this issue open for now.
@KlemenDEV BTstack is not thread-safe. It's recommended to keep all Bluetooth logic on a dedicated thread and use a thread-safe IPC queue to trigger Bluetooth actions etc. (please contact us directly or ask on the BTstack mailing list for additional details). Anyway, if the Zephyr HCI firmware still hangs, please post updated pklg / Zephyr debug output files to facilitate analysis.
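The dedicated-thread pattern described above can be sketched in a language-neutral way. The Python below is only an illustration of the idea, not BTstack code: `BluetoothWorker` and `send_notification` are hypothetical names, and `send_notification` stands in for whatever host-stack call must only ever run on the Bluetooth thread.

```python
import queue
import threading


class BluetoothWorker:
    """Runs every Bluetooth action on a single dedicated thread."""

    def __init__(self):
        self._actions = queue.Queue()  # thread-safe FIFO of pending actions
        self._thread = threading.Thread(
            target=self._run, name="bt-worker", daemon=True
        )

    def start(self):
        self._thread.start()

    def post(self, action, *args):
        """Safe to call from any thread: only enqueues, never runs the action."""
        self._actions.put((action, args))

    def stop(self):
        self._actions.put(None)  # sentinel: processed after all earlier actions
        self._thread.join()

    def _run(self):
        while True:
            item = self._actions.get()
            if item is None:
                break
            action, args = item
            action(*args)  # executed on the worker thread only


sent = []


def send_notification(payload):
    # Hypothetical stand-in for a call into a non-thread-safe host stack.
    sent.append((threading.current_thread().name, payload))


worker = BluetoothWorker()
worker.start()

# Four application threads request a send; none of them touches the stack.
producers = [
    threading.Thread(target=lambda i=i: worker.post(send_notification, i))
    for i in range(4)
]
for p in producers:
    p.start()
for p in producers:
    p.join()

worker.stop()
```

Because other threads can only enqueue work, calls into the stack are serialized by construction. That removes the kind of cross-thread race described earlier in this thread.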
I was aware of this, but I had a bug/oversight in my proof-of-concept code where I still accessed the stack from another thread. The pklg files above were captured at the time Zephyr hangs and are meant for the Zephyr developers, in case a hang should not happen even with invalid data. If it is normal for the controller to hang on invalid data according to the BT spec, then this is probably not a problem at all.
Thanks for getting back to us.
Describe the bug
I am trying to use HCI_UART (https://docs.zephyrproject.org/latest/samples/bluetooth/hci_uart/README.html) for the controller on nRF52833 DK (https://www.nordicsemi.com/Products/Development-hardware/nrf52833-dk).
For the host, I am using the STM32L476 chip with BTStack (https://github.com/bluekitchen/btstack).
I contacted the author of the project for some guidelines and successfully got it working as per bluekitchen/btstack#475. It all seems to work right: I am able to use the nRF52833 as both host and peripheral.
To Reproduce
I have observed the following scenarios:
Sending from the Python BLE client to the Python BLE server works, so the hang only happens when the nRF52833 is involved, either as peripheral or as client. This rules out issues with my Python BLE test scripts.
Reducing the rate lowers the chance of Zephyr/nRF52833 hanging. At around 10-20 Hz the system seems to operate reliably.
Expected behavior
A 50 Hz notification rate should be possible, and if it is not, the HCI transport layer should handle this. On the host, I use the mechanisms provided by the BTStack library to check whether sending data (a notification) is possible, and I do not send when the library reports that the system is busy.
I have ruled out a bug in BTStack because resetting the host does not help. The Zephyr controller must be reset for the setup to start working again.
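The host-side gating described above (send only when the stack reports it is not busy) can be sketched as follows. This is a simplified model, not BTStack's actual API. It assumes readiness is represented by a count of free controller buffers ("credits") that are handed back as packets complete, and that stale samples may be dropped in favour of the newest one. `CreditGatedSender`, `submit` and `on_packets_completed` are hypothetical names.

```python
from collections import deque


class CreditGatedSender:
    """Sends a payload only while the (modelled) controller has free buffers."""

    def __init__(self, credits, max_pending=1):
        self.credits = credits                     # free buffers, assumed model
        self.pending = deque(maxlen=max_pending)   # oldest samples are dropped
        self.sent = []                             # payloads actually sent

    def submit(self, payload):
        """Called by the application at its own rate (e.g. 50 Hz)."""
        self.pending.append(payload)
        self._drain()

    def on_packets_completed(self, count):
        """Called when the transport reports that buffers were freed."""
        self.credits += count
        self._drain()

    def _drain(self):
        # Never send more than the available credits allow.
        while self.credits > 0 and self.pending:
            self.sent.append(self.pending.popleft())
            self.credits -= 1


sender = CreditGatedSender(credits=2)
for sample in range(5):          # five samples arrive back to back
    sender.submit(sample)
sender.on_packets_completed(1)   # one buffer is freed afterwards
# sender.sent is now [0, 1, 4]: samples 2 and 3 were superseded while busy
```

With gating like this the host never queues more than the controller has room for. If the hang happens anyway, that points to a cause other than the host overrunning the controller.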
Impact
Using Zephyr with HCI_UART at high notification rates does not seem possible.
Logs and console output
I have enabled RTT logs/console and I can see INF data from the HCI_UART main.c being printed, but once the system hangs, no further logs and no errors are printed. I am not familiar enough with debugging Zephyr or nRF5 MCUs to tell whether the core halts or something similar happens.
Environment (please complete the following information):
More context
Some issues I have found that may be related: #30378, but I don't get any errors so this may be a complete miss.