[Bug] ESP32 examples do not connect to zenohd #363

Closed
meganukebmp opened this issue Feb 29, 2024 · 8 comments
Labels
bug Something isn't working

Comments

@meganukebmp

meganukebmp commented Feb 29, 2024

Describe the bug

I am running the ESP32 examples. The ESP32 connects to the WiFi fine and attempts to open a Zenoh session to zenohd, but it fails. I tested both the ESP-IDF and Arduino framework examples, and they behave identically.

The zenohd debug log shows this:

[2024-02-29T19:56:03Z DEBUG zenoh_link_udp::unicast] Accepted UDP connection on 192.168.1.176:7447: 192.168.1.225:58999
[2024-02-29T19:56:13Z DEBUG zenoh_transport::unicast::manager] future has timed out

To verify this isn't a networking issue, I built the zenoh-pico unix examples on a different Linux computer than the one running zenohd. The examples are able to connect to zenohd, although the connection sometimes fails at random:

nexrem@debian:~/zenoh-pico/examples/build$ ./z_info -e udp/192.168.1.176:7447 -m client
Opening session...
Unable to open session!
nexrem@debian:~/zenoh-pico/examples/build$ ./z_info -e udp/192.168.1.176:7447 -m client
Opening session...
Own ID: 6C663D38BA3A72F72F245AE22E320AE4
Routers IDs:
 7AE577320E5283E2E5AEFABF73064CE6
Peers IDs:

It seems that the ESP32 reaches the zenoh router but fails to do anything further.

Things I've tried:

  • Reducing memory footprint by reducing the buffer sizes
  • Disabling unnecessary features
  • Different ESP32 units (different dev boards)

To reproduce

  1. Run zenohd -l udp/router_hostname:7447
  2. Run the ESP32 publisher example in client mode with the CONNECT string set to udp/router_hostname:7447
  3. Monitor zenohd as well as serial log.
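
The endpoint used in steps 1 and 2 follows the pattern protocol/host:port. As a minimal sketch of assembling such a string before passing it on as the CONNECT value, the helper below formats it into a caller-supplied buffer. `demo_build_locator` is a hypothetical name used only for illustration and is not part of zenoh-pico.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a zenoh-pico function): writes a locator such as
 * "udp/router_hostname:7447" into buf.
 * Returns 0 on success, -1 if an argument is invalid or the result
 * does not fit into the buffer. */
int demo_build_locator(char *buf, size_t size, const char *proto,
                       const char *host, unsigned port) {
    /* A missing buffer is a programming error, not a runtime condition. */
    assert(buf != NULL && size > 0);

    if (proto == NULL || host == NULL || strlen(proto) == 0 ||
        strlen(host) == 0 || port == 0 || port > 65535u) {
        return -1;
    }

    /* snprintf returns the length it would have written; a value that is
     * negative or >= size means an error or truncation. */
    int n = snprintf(buf, size, "%s/%s:%u", proto, host, port);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Calling `demo_build_locator(buf, sizeof buf, "udp", "router_hostname", 7447)` fills `buf` with the same string used in the steps above, so the router and the client are guaranteed to be given an identical endpoint.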

System info

zenohd - v0.10.1-rc
zenoh-pico v0.10.1-rc
Debian 12
WT32-ETH0 & LOLIN32 esp-32

meganukebmp added the bug label Feb 29, 2024

@cguimaraes
Member

If possible, enable debug logs, which might provide a bit more information on what is happening.

@meganukebmp
Author

> If possible, enable debug logs, which might provide a bit more information on what is happening.

Hello, here are the logs from the ESP with the debug log level:

Connecting to WiFi...
OK
Opening Zenoh Session...
[1970-01-01T00:00:03Z INFO ::_z_unicast_open_client] Sending Z_INIT(Syn)
[1970-01-01T00:00:03Z DEBUG ::_z_init_encode] Encoding _Z_MID_T_INIT
Unable to open session!

@meganukebmp
Author

Here's the TCP session log as well

Zenoh-pico

Connecting to WiFi...
OK
Opening Zenoh Session...
[1970-01-01T00:00:06Z INFO ::_z_unicast_open_client] Sending Z_INIT(Syn)
[1970-01-01T00:00:06Z DEBUG ::_z_init_encode] Encoding _Z_MID_T_INIT
[1970-01-01T00:00:06Z DEBUG ::_z_init_decode] Decoding _Z_MID_T_INIT
[1970-01-01T00:00:06Z INFO ::_z_unicast_open_client] Received Z_INIT(Ack)
[1970-01-01T00:00:06Z INFO ::_z_unicast_open_client] Sending Z_OPEN(Syn)
[1970-01-01T00:00:06Z DEBUG ::_z_open_encode] Encoding _Z_MID_T_OPEN
[1970-01-01T00:00:06Z DEBUG ::_z_open_decode] Decoding _Z_MID_T_OPEN
[1970-01-01T00:00:06Z INFO ::_z_unicast_open_client] Received Z_OPEN(Ack)
Unable to open session!

Zenohd

[2024-03-04T20:04:12Z DEBUG zenoh_link_tcp::unicast] Accepted TCP connection on 192.168.1.220:7447: 192.168.1.209:62679
[2024-03-04T20:04:12Z DEBUG zenoh_transport::unicast::manager] Will use Universal transport!
[2024-03-04T20:04:12Z DEBUG zenoh::net::routing::router] New Face{17, 82c874915592a685885513c05104fd95}
[2024-03-04T20:04:12Z DEBUG zenoh_transport::unicast::manager] New transport opened between d91b1345aea9fed22245b7807e2e3c94 and 82c874915592a685885513c05104fd95 - whatami: client, sn resolution: U32, initial sn: 54414960, qos: false, multilink: false, lowlatency: false
[2024-03-04T20:04:12Z DEBUG zenoh_transport::unicast::establishment::accept] New transport link accepted from 82c874915592a685885513c05104fd95 to d91b1345aea9fed22245b7807e2e3c94: TransportLinkUnicast { link: Link { src: tcp/192.168.1.220:7447, dst: tcp/192.168.1.209:62679, mtu: 65535, is_reliable: true, is_streamed: true }, config: TransportLinkUnicastConfig { direction: Inbound, batch: BatchConfig { mtu: 65535, is_streamed: true, is_compression: false } } }.
[2024-03-04T20:04:22Z DEBUG zenoh_transport::unicast::universal::link] tcp/192.168.1.220:7447 => tcp/192.168.1.209:62679:BatchConfig { mtu: 65535, is_streamed: true, is_compression: false }: expired after 10000 milliseconds at io/zenoh-transport/src/unicast/universal/link.rs:290.
[2024-03-04T20:04:22Z DEBUG zenoh_transport::unicast::universal::transport] [d91b1345aea9fed22245b7807e2e3c94] Closing transport with peer: 82c874915592a685885513c05104fd95
[2024-03-04T20:04:22Z DEBUG zenoh::net::routing::router] Close Face{17, 82c874915592a685885513c05104fd95}

@fuzzypixelz
Member

@meganukebmp This might be a memory usage issue. Could you please try setting Z_BATCH_UNICAST_SIZE and Z_FRAG_MAX_SIZE to smaller values? See this for an example.

@oteffahi
Contributor

oteffahi commented Mar 5, 2024

> @meganukebmp This might be a memory usage issue. Could you please try setting Z_BATCH_UNICAST_SIZE and Z_FRAG_MAX_SIZE to smaller values? See this for an example.

Using different sizes for Z_BATCH_UNICAST_SIZE and Z_FRAG_MAX_SIZE with the z_pub example works 25% of the time, as opposed to never working with the default parameters.

Further investigation is required.

@meganukebmp
Author

meganukebmp commented Mar 5, 2024

Update on this. I got it to work. This seemed to be a compound issue of multiple things.

First, changing the buffer sizes didn't initially work because I had not configured the zenohd router to also use these values. I'm not sure why this would cause an issue at the connection stage; the session open packet surely isn't big enough to run the ESP32 out of memory. Regardless, changing that value made it work only sometimes, on TCP.

Since TCP and UDP had exhibited the same behavior in all my tests until then, I had assumed they would react the same way. However, I eventually found out that the memory changes on the zenohd router had fixed TCP only, so I still had to investigate why UDP was failing. The answer is the socket timeout, Z_CONFIG_SOCKET_TIMEOUT. The default value of 100ms is simply not enough to establish the session, it seems, at least in all the network conditions I tested. It could be that the ESP32 is simply too slow. Either way, increasing this value to a ridiculous 5000ms made it reliably establish connections.
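
To illustrate why a 100ms value can be so much tighter than 5000ms, here is a minimal sketch of how a millisecond receive timeout is commonly applied to a socket. It assumes a POSIX-style socket layer with `setsockopt` and `SO_RCVTIMEO`; the thread does not show how zenoh-pico applies Z_CONFIG_SOCKET_TIMEOUT internally, so the `demo_` functions are hypothetical and not the library's code.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Convert a millisecond timeout (for example 100 or 5000) into the
 * seconds/microseconds pair that SO_RCVTIMEO expects. */
struct timeval demo_ms_to_timeval(unsigned long ms) {
    struct timeval tv;
    tv.tv_sec = (time_t)(ms / 1000u);
    tv.tv_usec = (suseconds_t)((ms % 1000u) * 1000u);
    return tv;
}

/* Apply a receive timeout to an already-open socket. After this call a
 * blocking recv() gives up once no data has arrived for `ms` milliseconds.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
int demo_set_recv_timeout(int fd, unsigned long ms) {
    assert(fd >= 0);
    struct timeval tv = demo_ms_to_timeval(ms);
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
}
```

With a timeout applied this way, a reply that takes longer than the configured value makes the receive call return an error, which for a UDP handshake looks the same as no reply at all.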

Ultimately this was a user error, and I apologize for the time wasted here. I guess the main takeaway is that this should be written down somewhere very explicitly, or maybe the default values should be adjusted to be more conservative (would this break support with zenohd defaults?)

To recap.
TCP & UDP connections were failing because:

  1. The ESP was running out of memory. This was solved by changing Z_BATCH_UNICAST_SIZE, Z_BATCH_MULTICAST_SIZE & Z_FRAG_MAX_SIZE all to a conservative 1024 bytes in my case.
  2. UDP sessions were failing because, unlike TCP, which would wait quite some time (whatever the ESP32 TCP timeout is), UDP would use the zenoh-provided Z_CONFIG_SOCKET_TIMEOUT value of 100ms, which would time out almost instantly.

For reference my cflags were -DZ_BATCH_UNICAST_SIZE=1024 -DZ_BATCH_MULTICAST_SIZE=1024 -DZ_FRAG_MAX_SIZE=1024 -DZ_CONFIG_SOCKET_TIMEOUT=5000
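
As a sketch of how `-D` flags like these usually take effect, the snippet below guards each macro with `#ifndef` so that a value passed on the compiler command line wins over the fallback. It assumes the library's configuration header uses the same guard pattern, which is not confirmed in this thread. The fallback numbers are the ones from the cflags above, not the library's real defaults, and `demo_build_config` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Fallbacks apply only when no -D flag defines the macro.
 * Values mirror the cflags quoted above (assumption for illustration). */
#ifndef Z_BATCH_UNICAST_SIZE
#define Z_BATCH_UNICAST_SIZE 1024
#endif
#ifndef Z_BATCH_MULTICAST_SIZE
#define Z_BATCH_MULTICAST_SIZE 1024
#endif
#ifndef Z_FRAG_MAX_SIZE
#define Z_FRAG_MAX_SIZE 1024
#endif
#ifndef Z_CONFIG_SOCKET_TIMEOUT
#define Z_CONFIG_SOCKET_TIMEOUT 5000 /* milliseconds */
#endif

/* Catch obviously invalid settings at compile time. */
static_assert(Z_BATCH_UNICAST_SIZE > 0, "unicast batch size must be positive");
static_assert(Z_CONFIG_SOCKET_TIMEOUT > 0, "socket timeout must be positive");

/* Snapshot of the values this translation unit was built with. */
typedef struct {
    size_t batch_unicast_size;
    size_t batch_multicast_size;
    size_t frag_max_size;
    unsigned socket_timeout_ms;
} demo_build_config_t;

/* Hypothetical helper: returns the compile-time settings so they can be
 * logged at startup and compared against the router's configuration. */
demo_build_config_t demo_build_config(void) {
    demo_build_config_t cfg = {
        .batch_unicast_size = Z_BATCH_UNICAST_SIZE,
        .batch_multicast_size = Z_BATCH_MULTICAST_SIZE,
        .frag_max_size = Z_FRAG_MAX_SIZE,
        .socket_timeout_ms = Z_CONFIG_SOCKET_TIMEOUT,
    };
    return cfg;
}
```

Printing the result of `demo_build_config()` over serial at boot is one way to confirm that the flags actually reached the compiler, since a build system that silently drops them would leave the defaults in place.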

Thanks for all the pointers.

I will leave the maintainers to close this issue in case there is more to discuss.

@vortex314
Contributor

Your diagnosis of the config made me think about the serial issue, so I tweaked another option in the zenohd config: transport.unicast.max_links.
I changed this from 1 to 10, and it seems to address the other issue, eclipse-zenoh/zenoh#775.
Thanks.

@jean-roland
Contributor

Thanks a lot to both of you for figuring these out. Even if it's a config-only issue, it's still important for us to know what options are needed for each setup.

@milyin milyin closed this as completed Aug 13, 2024