blue_wizard compatibility broken #234

Open
wizarddata opened this issue Mar 29, 2021 · 56 comments

Comments

@wizarddata
Contributor

wizarddata commented Mar 29, 2021

Describe the bug
Compiling and flashing the blue_wizard keyboard firmware causes the keyboard to continually connect to & disconnect from PC.
Board continues to broadcast between disconnect/connect. No errors are produced during compiling.

To Reproduce
Steps to reproduce the behavior:
Working with PR228. Also works with PR185, PR214.

Not working with PR229. Also not working with PR186.

Expected behavior
Expected behavior is for the keyboard to successfully broadcast, connect to the PC, and send keycodes.

Desktop (please complete the following information):

  • OS: Ubuntu
  • Browser: Chrome

Additional context

The keyboard uses an nRF52832 MCU.

During compiling, the Arduino IDE reports 35% program space and 54% dynamic memory used.

Matrix on this keyboard is 15x7. This large size caused memory issues unless MAX_NO_LAYERS is reduced to 5.

It was also discovered that TG(Layer) only seems to function on layers 0 and 1. This may or may not be related.

@wizarddata
Contributor Author

wizarddata commented Mar 30, 2021

I believe I've narrowed it down to between PR228 and PR229. I'm going to verify this in the morning.

@wizarddata
Contributor Author

I've been able to much better describe the behavior this morning with fresh eyes.

@jpconstantineau
Owner

If the issue is related to connecting/disconnecting continuously, can you "forget" the keyboard on your computer and try connecting again? This will make the two renegotiate the pairing keys.

@wizarddata
Contributor Author

Yes, I've been doing a forget and re-pairing as part of my process for testing each pull request in case that was the issue. Just did that again to confirm.

@jpconstantineau
Owner

One last thing: does it still happen if you reformat the nRF52 file system? Run this sketch:
[image: screenshot of the sketch]
and re-flash the firmware.
This will wipe out any keys and settings saved to file on the nRF52 as well.

@wizarddata
Contributor Author

After running the pictured sketch and reflashing firmware built with PR229, the connection resets the same way.

@jpconstantineau
Owner

Ok. What about with 228? I doubt that the changes brought by 229 will impact much, perhaps except for memory usage.

@wizarddata
Contributor Author

After the format, with PR228 the connection is persistent and the keyboard functions correctly.

It is possible that the device is resetting when the connection is lost and I'm misinterpreting the broadcasting as being continuous because the device is booting quickly. I'm using the nRFConnect app on my phone and the display only refreshes about once a second.

@jpconstantineau
Owner

That's good. Let's see... Let's try this commit: ee8c80a
It skips ahead past a few commits I did to add a workflow on GitHub Actions and troubleshoot it.
I expect that one to work the same... Let me know...

@wizarddata
Contributor Author

That commit does work correctly. Stepping to the next in line, f012b88 brings back the disconnection behavior.

@wizarddata
Contributor Author

I've also flashed my board with the default 'Ergotravel' configuration. That produced the same results.

@jpconstantineau
Owner

Ergotravel too? I wonder about the luddite or the 4x4Macropad. I have been looking too much at the 840 these days... I have a 832 macropad I have been using and that one is ok. I'll give a try to a few 832 boards I have around.

@jpconstantineau
Owner

I assume you tried the one in between: 6f21844

@wizarddata
Contributor Author

My apologies, I'm jumping between the git software I use and the github web interface and I grabbed the wrong pr. The one you linked is the PR that brings about the problems.

@jpconstantineau
Owner

6f has issues? And the one before didn't?

@wizarddata
Contributor Author

I've been experiencing some growing pains with the software I'm using, I was looking at the incorrect field. To summarize:

6f21844 is working as intended
f012b88 brings about the connectivity / rebooting issues

@jpconstantineau
Owner

That's what I was afraid of. That's the commit where I did all that clang reformatting. I was hoping a big feature had introduced the issue, but this commit was all about formatting. Needle in a haystack, unfortunately.

@wizarddata
Contributor Author

Using the 4x4 backpack firmware does result in a stable connection on f012b88.

@wizarddata
Contributor Author

For now I'll just create a fork at an earlier commit and add a note to the project where to find it. Maybe we'll have an epiphany someday.

@jpconstantineau
Owner

I just checked with my contra (4x12) on an 832 on the latest release (most up to date) and it compiled, flashed and connected fine. Issue is likely related to memory usage somewhere... Ergotravel memory footprint is much higher since it has to handle 2 BLE connections.
Do you have a test keymap with a single layer? Might be worth giving that a try.

@wizarddata
Contributor Author

I'll create one and give it a try

@jpconstantineau
Owner

Ok. Something as simple as the 4x4tutorial base keymap.

@wizarddata
Contributor Author

Bringing it down to one layer and reducing MAX_NO_LAYERS had the same behavior. Must be something to do with the matrix size? It's odd because the IDE reports plenty of space in memory.

@jpconstantineau
Owner

Yes, it does report plenty of space. However, the matrix is an array of arrays of keys, and each key contains two arrays of arrays, one for keycodes and the other for durations, each covering the different activations and layers. None of these arrays are vectors (dynamically sized); they are all statically sized. That's a whole lot of memory for your 7x15. I would think that these are allocated at compile time.
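
To get a feel for how quickly statically sized per-key arrays add up on a 7x15 matrix, here is a rough sketch. The struct layout, the 16-bit element type, and the count of five activation methods are assumptions made for illustration; they are not taken from the firmware's actual key class.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical sizes: only the 7x15 matrix comes from this thread.
constexpr std::size_t kRows = 7;
constexpr std::size_t kCols = 15;
constexpr std::size_t kActivations = 5;  // assumed, not the firmware's real value

// Hypothetical stand-in for a key: two fixed-size 2D arrays, one for
// keycodes and one for durations, indexed by layer and activation.
template <std::size_t MaxLayers>
struct KeySketch {
    std::uint16_t keycodes[MaxLayers][kActivations];
    std::uint16_t durations[MaxLayers][kActivations];
};

// Bytes needed for the whole matrix of keys at a given layer limit.
template <std::size_t MaxLayers>
constexpr std::size_t matrixBytes() {
    return sizeof(KeySketch<MaxLayers>) * kRows * kCols;
}

// With these assumed sizes, each extra layer costs the same fixed amount.
static_assert(matrixBytes<5>() - matrixBytes<4>() ==
              matrixBytes<4>() - matrixBytes<3>(),
              "per-layer cost is constant");
```

Under these assumptions, `matrixBytes<5>()` comes to 10500 bytes and each layer removed saves 2100 bytes, which would explain why lowering MAX_NO_LAYERS has such a visible effect on a part with 64 KB of RAM.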

I wonder if it crashes in the setup or in the main loop. Perhaps we could add a simple LED turn on when we start the setup and turn it off when finishing the setup...

Since it's a custom board, do you have serial on board? I would like to check on memory space. There are a few commands I could setup to see the data...

@jpconstantineau
Owner

If you have serial, you can get into the debug CLI and send the i command.
On my contra, I get this:


| Name  | Addr 0x2000xxxx | Usage               |
| ----- | --------------- | ------------------- |
| Stack | 0xF800 - 0xFFFF | 808 / 2048 (39%)    |
| Heap  | 0x9A58 - 0xF7FF | 22212 / 23976 (92%) |
| Bss   | 0x3600 - 0x9A57 | 25688               |
| SD    | 0x0000 - 0x35FF | 13824               |

That's not a lot of heap left...

@wizarddata
Contributor Author

The only serial I broke out was for the CP2104, I don't immediately know exactly what I can do with that but I'll dig into it and see what I've got.

@wizarddata
Contributor Author

While I've got the serial monitor up, I get the nice bluemicro ascii art, but the board periodically resets and I'm not able to issue any commands.

@wizarddata
Contributor Author

This is on the working firmware.

@wizarddata
Contributor Author

Here we go, I timed it between resets.

Device ID : 54797D3ECDAB831B

MCU Variant: nRF52832 0x41414530
Memory : Flash = 512 KB, RAM = 64 KB
Keyboard Name : Blue Wizard
Keyboard Model : Blue Wizard
Keyboard Mfg : awells

Device Power : 0.000000
Filter RSSI : -90
| Type | RSSI | Name            |
| ---- | ---- | --------------- |
| cent | 0    |                 |
| prph | -61  | DESKTOP-9ILB3J0 |
| cccd | 0    |                 |

BSP Library : 0.21.0
Bootloader : s132 6.1.1
Serial No : CDAB831B54797D3E


| Name  | Addr 0x2000xxxx | Usage               |
| ----- | --------------- | ------------------- |
| Stack | 0xF800 - 0xFFFF | 736 / 2048 (35%)    |
| Heap  | 0xA638 - 0xF7FF | 18016 / 20936 (86%) |
| Bss   | 0x3600 - 0xA637 | 28728               |
| SD    | 0x0000 - 0x35FF | 13824               |

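| Task    | State | Prio | StackLeft | Num |
| ------- | ----- | ---- | --------- | --- |
| loop    | X     | 1    | 658       | 1   |
| IDLE    | R     | 0    | 26        | 3   |
| Tmr Svc | B     | 2    | 144       | 4   |
| BLE     | B     | 3    | 1038      | 5   |
| Callbac | B     | 2    | 687       | 2   |
| SOC     | B     | 3    | 163       | 6   |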
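
The two memory maps posted so far can be cross-checked with a little arithmetic. The sketch below recomputes the region sizes from the printed address ranges; the helper names (`Region`, `regionBytes`, `percentUsed`) are made up for this example, and the truncating percentage is an assumption about how the CLI rounds.

```cpp
#include <cstdint>

// Inclusive address range, given as offsets within 0x2000xxxx.
struct Region {
    std::uint32_t first;
    std::uint32_t last;
};

// Size of an inclusive range in bytes.
constexpr std::uint32_t regionBytes(Region r) { return r.last - r.first + 1; }

// Integer percentage, truncated (assumed to match the printed figures).
constexpr std::uint32_t percentUsed(std::uint32_t used, std::uint32_t total) {
    return used * 100 / total;
}

// Address ranges copied from the two dumps in this thread.
constexpr Region kContraHeap{0x9A58, 0xF7FF};
constexpr Region kContraBss{0x3600, 0x9A57};
constexpr Region kWizardHeap{0xA638, 0xF7FF};
constexpr Region kWizardBss{0x3600, 0xA637};

// The extra Bss on the Blue Wizard comes straight out of the heap.
static_assert(regionBytes(kWizardBss) - regionBytes(kContraBss) ==
              regionBytes(kContraHeap) - regionBytes(kWizardHeap),
              "Bss growth equals heap shrinkage");
```

The Blue Wizard's Bss is 3040 bytes larger than the contra's and its heap is 3040 bytes smaller, so statically allocated data directly reduces the heap available at run time.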

@jpconstantineau
Owner

It resets when connected and before you have a chance to send a command?
This means it's in the loop; setup has passed.

@jpconstantineau
Owner

Are you still on the adafruit bsp? That could perhaps make a difference... can you switch over to the Community BSP?

@jpconstantineau
Owner

http://bluemicro.jpconstantineau.com/docs/tools
It's the second URL you need to add to the preferences; then you can download the community BSP.

@wizarddata
Contributor Author

That did change behavior, but unfortunately it only resets faster. The community BSP seems to bring the connectivity problem back to 6f21844 as well.

@wizarddata
Contributor Author

There may be a problem with how I'm set up, I'm going to revisit that again in the morning.

@jpconstantineau
Owner

I was able to replicate your issue with the latest on my contra (with the 7x15 config you have). It might not be a properly set-up keyboard, but at least I see the reboot issue you have. I'll see if I can turn off things in a separate branch and make it work.

@jpconstantineau
Owner

I did turn off the scanning timer and that resolved the reboot issue. I'll see if the timer task has enough memory.

@jpconstantineau
Owner

I may have found something... Really not what I was thinking... It seems that it doesn't even make it to the end of setup before it crashes. As such, I'll re-arrange a few things in there to see if that helps. I'll send you a branch you can test with...

@jpconstantineau
Owner

Didn't really find anything. I thought I had identified the problem area, but no success. I'll have to start removing stuff until it starts working. The frequency at which it reboots isn't really indicative of the severity of the problem... I moved all tasks into the same loop and the only thing that did was change the frequency of the crashes. Sometimes moving a couple of things around only moves the last line it runs before it crashes. I still suspect it's memory related, but hunting it down will be the challenge...

@wizarddata
Contributor Author

I'll continue to spend time working with it, I'll update as soon as I find anything useful.

@jpconstantineau
Owner

jpconstantineau commented Mar 31, 2021

I have been slowly going through the commits and looking at memory usage. I was at 99% heap usage with max layers at 5, and that made the board crash while fetching memory usage data. I brought it down to 4, and that helped bring heap usage down to 90%. It's definitely memory related. I'll have to do some research on C++ heap management...

@jpconstantineau
Owner

Have a try at the nrf52832-revert branch. I reverted the problematic commit and changed the number of layers from 5 to 4. This appears to work ok here but really need to have it tested on your board.

@wizarddata
Contributor Author

Thanks for that, I'll give it a go. I've somehow got both of my test boards to a state where they crash-loop regardless of the firmware version I use, so I need to spend a minute to sort out how that happened.

@wizarddata
Contributor Author

Initial tests show that repo to be working correctly with MAX_NO_LAYERS set to 5. I'm going to revisit in the morning to ensure I haven't got something wrong on my end to give a false positive.

@wizarddata
Contributor Author

After a fullerase and flash, both my test boards are working correctly on nrf52832-revert. The board will still enter a crash loop if MAX_NO_LAYERS isn't set to one larger than the number of layers implemented, but that doesn't seem like a real issue. Did the difference end up being just the clang format or did you have to make other changes?
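
Since the crash loop appears when MAX_NO_LAYERS does not leave a spare slot beyond the layers in the keymap, a compile-time check could turn that mistake into a build error. This is only a sketch: `KEYMAP_LAYER_COUNT` and `layersFit` are hypothetical names, the default values are placeholders, and reading the rule as "at least one larger" is an assumption based on the observation above rather than on the firmware source.

```cpp
// Assumed values for illustration; in a real build these would come from
// the keyboard's configuration and keymap.
#ifndef MAX_NO_LAYERS
#define MAX_NO_LAYERS 5
#endif
#ifndef KEYMAP_LAYER_COUNT  // hypothetical macro, not part of the firmware
#define KEYMAP_LAYER_COUNT 4
#endif

// True when the layer limit leaves one spare slot beyond the layers in use.
constexpr bool layersFit(int maxLayers, int layersInKeymap) {
    return maxLayers >= layersInKeymap + 1;
}

// Fails the build instead of producing firmware that crash-loops.
static_assert(layersFit(MAX_NO_LAYERS, KEYMAP_LAYER_COUNT),
              "MAX_NO_LAYERS must be at least one larger than the keymap's layer count");
```

With four layers and a limit of 5 the check passes; with four layers and a limit of 4 it would stop the build.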

@jpconstantineau
Owner

As far as differences, I didn't try to scan and sort out clang-format vs other changes. With so many file changes in that commit, I'll need to see if something else made it; however, I don't think so. It does worry me that a format change caused the issue; not something one would expect.

@wizarddata
Contributor Author

Just in case this helps narrow down the problem, the board will still occasionally crash when it's been asleep for more than 10 or 15 minutes. A reset solves the problem. This is with four layers and MAX_NO_LAYERS set to 5. I'm going to trim that down to 3 and see if that changes.

@wizarddata
Contributor Author

After further testing, I am still having an issue where, occasionally, after working with the nrf52832-revert branch, I'll get stuck in a state where older builds crash in the same way as the newer ones until I perform a fullerase and reflash.

@jpconstantineau
Owner

It gets stuck in a bootloop? Is that with number of layers at 4? or back to 5? I'll be doing a full clang-format from the previous commit and compare with the problematic one to see if something else was added that causes the main issue.
I assume you don't use combos. I'll add some logic to not include these (will free some space) and I'll probably refactor key.cpp to use a struct instead of 2 pairs. That should help free some more memory.

@wizarddata
Contributor Author

It has occurred with MAX_NO_LAYERS set to both 4 and 5. I wish I could describe the symptoms more concisely but the behavior has been inconsistent enough that I'm having trouble nailing it down.

I don't use combos myself, but I can't necessarily speak for the users of the other half dozen of these boards that are out there so far. At the end of the day though, I think most reasonable people would agree that stable operation takes priority over other features like combos.

@jpconstantineau
Owner

I agree that stability is more critical than features that not everyone uses. On nRF52840s, the amount of RAM is so much larger that pretty much everything will fit. I'll probably be putting my 832 back on my luddite to see how much RAM I have left and whether it's stable or not.

With the max layers set at 5, I saw 99% heap usage. That's way too high for comfort...

@jpconstantineau
Owner

I have added an ENABLE_COMBOS define that can be used to enable that functionality. It's now off by default. Let me know if that helps a bit. I'll add a few more enables to make sure that not all functionality is compiled out of the box for 832 boards.
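
A feature switch like this usually works through conditional compilation, so that a disabled feature's state never occupies RAM. The sketch below shows the general pattern only; `ComboState`, its buffer size, and `comboRamBytes` are hypothetical and do not reflect how the firmware actually implements combos.

```cpp
#include <cstddef>

// Uncomment (or pass -DENABLE_COMBOS) to compile combo support in.
// #define ENABLE_COMBOS

#ifdef ENABLE_COMBOS
// Hypothetical combo bookkeeping; the size is made up for illustration.
struct ComboState {
    unsigned char pending[256];
};
static ComboState comboState;  // only exists when the feature is enabled
constexpr std::size_t comboRamBytes() { return sizeof(ComboState); }
#else
// Feature compiled out: no state object, no RAM cost.
constexpr std::size_t comboRamBytes() { return 0; }
#endif
```

With the define left commented out, which mirrors the "off by default" setting, the state object is never declared and `comboRamBytes()` evaluates to 0.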

@wizarddata
Contributor Author

Initial results are promising. There is still intermittent crashing while the debug cli is open but everything seems fine during normal use.

@jpconstantineau
Owner

Debug CLI: that's good. Let me see if it would get "stuck" in there and if the watchdog timer would reset it.
I'll add some more feature enable defines to turn off stuff.

@wizarddata
Contributor Author

More testing over the last couple days hasn't revealed any more instabilities.

@jpconstantineau
Owner

I just added a watchdog timer reset in the CLI but I think the issues were again related to memory as opposed to WDT timing out.
I'll be turning off audio by default too. I was looking into other methods to instantiate a few classes but that needed C++17 which isn't enabled and caused too many headaches.

I'll be looking to see if a refactor of the key class might help too.

@wizarddata
Contributor Author

Just for another data point, I've started occasionally seeing an issue where, after the keyboard either goes to sleep or is powered off, it will enter a connect-reset loop until it is powered off, 'forgotten' from the PC, and reconnected. This happens with either 2 or 4 layers.

@github-staff github-staff deleted a comment from adigunsherif Apr 26, 2024