Routing table error after long uptime or 3+ nodes #38

2E0PGS · 2020-01-12T15:34:05Z

Great firmware, I just successfully range tested two modules with impressive results on the 868MHz band.

A few improvement ideas

Can we get some kind of status page? I wasn't sure if the other node was connected or in range. The only way I knew was by getting a friend to ping messages back and forth with me on the web chat while I walked around outside.
Can we drive the OLED display that's present on most premade compatible boards from China? Even a simple status and the IP address would show it's working.

Cheers

samuk · 2020-01-12T17:31:08Z

Yes, both are good ideas. The status page has been discussed in the issue below.

#35

Writing scrolling last-sent messages to the OLED has been mentioned on the mailing list.

paidforby · 2020-01-14T20:31:58Z

@samuk is correct, both of these issues have been on my mind.

For what it is worth, the latest firmware (which I will release as 0.1.1 soon) includes a console interface, accessed through serial, that allows you to print the routing table by typing lr -r. See the testing firmware section of the readme for the current abilities of the console interface. The routing table will show "connected" nodes along with the "quality" of each connection in the form of the metric.

I'm planning on figuring out a way to display the routing table in the web app.

paidforby · 2020-01-16T05:46:27Z

@2E0PGS check out the new "Active Nodes" list feature mentioned in the related issue, #35 (comment).

It should work if you build both the firmware and the web app from latest; otherwise, I will be compiling a pre-built binary for 0.1.1 soon.

No progress on utilizing the OLED screen yet.

2E0PGS · 2020-01-19T18:03:29Z

ok cool thanks!

2E0PGS · 2020-01-25T15:57:03Z

Sorry it took me a while to try v0.1.1. I just flashed it on my two boards. Working great!
I can now see the hops and metric.

I presume node 000000000000 is the local node's address, with hops 00 and metric 00?

2E0PGS · 2020-01-26T12:48:10Z

It looks like there may be a bug with the beacon length when the device is left on for a long time.

[Screenshot: 2020-01-25 17:20:31]

[Screenshot: 2020-01-26 00:46:40]

I didn't change any settings in GQRX.

The only changes I can think of are room temperature, the laptop warming up, the LoRa board warming up, and the HackRF warming up.

Version 0.1.1 from the binary release.

I had two nodes running there. Oddly enough unplugging and replugging didn't reset it.

However, back this morning with a cold room and cold devices (switched off overnight), they're back to how they were in the first screenshot.

2E0PGS · 2020-01-26T12:54:42Z

I will try to replicate it by artificially heating up my board, or by leaving one on and one off and comparing after a few hours.

2E0PGS · 2020-01-26T13:01:16Z

No sudden changes from artificially heating my SDR or my LoRa board. I am using TTGO.

I will try leaving one running for now. Then I can turn the other on later and compare.

2E0PGS · 2020-01-26T13:06:58Z

The only code references I see are these two: https://github.com/search?q=org%3Asudomesh+beaconInterval&type=Code

2E0PGS · 2020-01-26T13:09:16Z

Or the glitch relates to the route message getting longer. Android phone on the WiFi slowing it down?

2E0PGS · 2020-01-26T22:54:10Z

I did some testing today. Here are the results.

During all of this testing, the GQRX settings were not modified. I am running v0.1.1 firmware from the prebuilt binaries on two TTGO boards.

Ignore the extra harmonics; they are due to one board being powered via a grounded mains-to-5V PSU and the second via a battery power bank. The second, lagging signal is the power-bank TTGO, which we shall call node 2.

[Screenshot: 2020-01-26 17:28:28, "Receiver Options"]

This is the beginning of the test, and I show a few settings windows.

[Screenshot: 2020-01-26 17:28:38, "FFT Settings"]

[Screenshot: 2020-01-26 17:28:42, "Input controls"]

[Screenshot: 2020-01-26 22:03:16]

Several hours into testing, I notice an increase in the signal TX length.

[Screenshot: 2020-01-26 22:33:48]

I power cycle one of the boards to see if this changes its TX length. It makes no difference.

[Screenshot: 2020-01-26 22:35:50]

I decide to try power cycling both boards to see if the issue is related to packets exchanged between the two, maybe routing information. This resolves the problem.

2E0PGS · 2020-01-27T12:51:14Z

Running one node on its own for hours with no neighbours didn't produce this behavior. This makes me think it's route message related.

samuk · 2020-01-27T14:49:24Z

Interesting stuff, wonder if it's worth testing with the latest code? Realise not that much has changed, but might be worth verifying it's still an issue?

paidforby · 2020-01-28T03:56:20Z

It is highly likely that there is an unknown error in the routing message logic that only appears after a long uptime. My guess is that a byte gets shifted somewhere and starts filling the routing table with false routes. This would explain why it doesn't go away after only one node is restarted, because the node that was kept on immediately shares those false routes with the rebooted node. However, when both are rebooted, their routing tables are reset and their little network "forgets" about the false routes.

Note: this is just my theory, I would need to do some actual testing and write some debugging code to demonstrate that this is happening.
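To make the theory concrete, here is a minimal toy simulation in Python. It is only a sketch under stated assumptions, not the firmware's actual logic: the `Node` class, the `exchange` helper, the `"GHOST"` entry, and the merge rule (accept every advertised route, keep the lowest hop count, never expire anything) are all hypothetical stand-ins.

```python
class Node:
    """Toy mesh node that merges every route a neighbour advertises."""

    def __init__(self, addr):
        self.addr = addr
        self.routes = {}  # destination address -> hop count

    def reboot(self):
        # A power cycle wipes the in-memory routing table.
        self.routes = {}

    def advertise(self):
        # Routing message: our own address at 0 hops plus every known route.
        message = {self.addr: 0}
        message.update(self.routes)
        return message

    def receive(self, message):
        # Accept any advertised destination; keep the lowest hop count seen.
        # Assumption: there is no validation and no route expiry.
        for dest, hops in message.items():
            if dest == self.addr:
                continue
            if dest not in self.routes or hops + 1 < self.routes[dest]:
                self.routes[dest] = hops + 1


def exchange(a, b):
    """One round of routing messages in each direction."""
    a.receive(b.advertise())
    b.receive(a.advertise())


def run_scenario():
    a, b = Node("A"), Node("B")
    exchange(a, b)
    clean_len = len(a.advertise())

    a.routes["GHOST"] = 1  # hypothetical stand-in for a corrupted entry
    exchange(a, b)
    polluted_len = len(a.advertise())
    ghost_reached_b = "GHOST" in b.routes

    a.reboot()  # power cycle only one node
    exchange(a, b)
    ghost_after_one_reboot = "GHOST" in a.routes

    a.reboot()  # power cycle both nodes
    b.reboot()
    exchange(a, b)
    ghost_after_both = "GHOST" in a.routes or "GHOST" in b.routes

    return {
        "clean_len": clean_len,
        "polluted_len": polluted_len,
        "ghost_reached_b": ghost_reached_b,
        "ghost_after_one_reboot": ghost_after_one_reboot,
        "ghost_after_both": ghost_after_both,
    }
```

In this toy model, the false entry comes straight back to the rebooted node from its neighbour, and it only disappears once both nodes are reset at the same time. The advertised message also grows by one entry per false route, which would fit with a longer transmission.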

tlrobinson · 2020-01-28T03:59:59Z

I don’t know if this is related, but I was seeing an issue where the routing table exploded (dozens of new entries per minute) if I had 3+ nodes running. I’ll try to reproduce it.

2E0PGS · 2020-02-13T21:11:51Z

Re the hypothesis, this sounds about right to me. I suspect the routing table is filling up, and this has the knock-on effect of a longer TX length because the routing message is longer.
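As a rough illustration of why a longer message means a longer transmission, the sketch below uses the LoRa time-on-air formula from the Semtech SX1276 datasheet. The default spreading factor, bandwidth, coding rate, and preamble length are assumptions chosen for illustration, not the settings the firmware actually uses, and the function name is hypothetical.

```python
import math


def lora_time_on_air(payload_len, sf=9, bw_hz=125_000, cr=1,
                     preamble_len=8, explicit_header=True,
                     crc=True, low_dr_opt=False):
    """Return the LoRa packet airtime in seconds.

    payload_len: payload size in bytes
    sf: spreading factor (7..12)
    cr: coding rate index, 1..4 for 4/5..4/8
    All default radio settings here are assumptions.
    """
    t_sym = (2 ** sf) / bw_hz                      # symbol duration
    t_preamble = (preamble_len + 4.25) * t_sym     # preamble duration

    ih = 0 if explicit_header else 1               # implicit-header flag
    de = 1 if low_dr_opt else 0                    # low data rate optimisation
    crc_bits = 16 if crc else 0

    numerator = 8 * payload_len - 4 * sf + 28 + crc_bits - 20 * ih
    denominator = 4 * (sf - 2 * de)
    n_payload = 8 + max(math.ceil(numerator / denominator) * (cr + 4), 0)

    return t_preamble + n_payload * t_sym


# Compare a short routing message with a much longer one (sizes are made up).
short_airtime = lora_time_on_air(20)
long_airtime = lora_time_on_air(200)
```

With these assumed settings, the 200-byte payload occupies the channel for several times as long as the 20-byte one, which is the kind of lengthening that would be visible as a wider burst on a waterfall display.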

samuk · 2020-05-13T20:09:29Z

Would you be up for trying to replicate your error with the latest routing? Hoping this bug has just gone away: #57 (comment)

paidforby · 2020-05-14T17:54:46Z

Yes, it would be good to test if this bug is resolved on the 1.0.0-rc.2 branch, which uses the latest updates to LoRaLayer2. LoRaLayer2 has switched to a more dynamic source routing (DSR) style and no longer requires the sharing of routing tables via routing table packets.

paidforby · 2020-09-21T21:19:33Z

Closing this issue and merging it with #81 since there is more activity on that thread and these seem closely (if not directly) related issues.

paidforby pushed a commit that referenced this issue Jan 16, 2020

add routing table display to web app, related to #35 and #38

4095501

samuk added the bug label Feb 13, 2020

paidforby changed the title ~~Improvement ideas~~ Routing table error after long uptime or 3+ nodes Apr 9, 2020

samuk mentioned this issue May 13, 2020

Alternative mesh routing protocol options #57

Open

paidforby pushed a commit to sudomesh/LoRaLayer2 that referenced this issue May 14, 2020

correct error in check neighbor function, may have been related to su…

bc9c81b

…domesh/disaster-radio#38

paidforby closed this as completed Sep 21, 2020
