Unreliable network connection on Kasli v2.0.2 and ARTIQ-7 #2286

Open
kaolpr opened this issue Nov 20, 2023 · 6 comments

@kaolpr
Contributor

kaolpr commented Nov 20, 2023

Bug Report

One-Line Summary

With the ARTIQ-7 (release-7) branch, the Ethernet connection to Kasli v2.0.2 fails most of the time; with ARTIQ-8 (master) it works every time.

Issue Details

Steps to Reproduce

  1. Build the Kasli firmware with ARTIQ-7.8185.cc81464
  2. Flash the Kasli with the firmware and configure it with a storage file that sets a predefined IP (only the IP is written in the storage area file)
  3. Observe the ping output
  4. Power cycle several times

Expected Behavior

Kasli responds to ping at all times.

Actual (undesired) Behavior

  • Kasli only sometimes responds to ping (over a larger number of power cycles, the success rate is < 30%).
  • Serial output:
 __  __ _ ____         ____ 
|  \/  (_) ___|  ___  / ___|
| |\/| | \___ \ / _ \| |    
| |  | | |___) | (_) | |___ 
|_|  |_|_|____/ \___/ \____|

MiSoC Bootloader
Copyright (c) 2017-2022 M-Labs Limited

Bootloader CRC passed
Gateware ident 7.8185.cc81464;test
Initializing SDRAM...
Read leveling scan:
Module 1:
00000000000001111111110000000000
Module 0:
00000000000001111111111000000000
Read leveling: 17+-4 17+-5 done
SDRAM initialized
Memory test passed

Booting from flash...
Starting firmware.
[     0.000016s]  INFO(runtime): ARTIQ runtime starting...
[     0.003935s]  INFO(runtime): software ident 7.8185.cc81464;test
[     0.009864s]  INFO(runtime): gateware ident 7.8185.cc81464;test
[     0.015801s]  INFO(runtime): log level set to INFO by default
[     0.021531s]  INFO(runtime): UART log level set to INFO by default
[     0.179750s]  WARN(runtime::rtio_clocking): rtio_clock setting not recognised. Falling back to default.
[     0.187853s]  INFO(runtime::rtio_clocking): using internal 125MHz RTIO clock
[     0.464337s]  INFO(board_artiq::si5324): waiting for Si5324 lock...
[     8.341259s]  INFO(board_artiq::si5324):   ...locked
[     8.371941s]  INFO(runtime): network addresses: MAC=fc-0f-e7-07-33-ce IPv4=192.168.1.70 IPv6-LL=fe80::fe0f:e7ff:fe07:33ce IPv6=no configured address
[     8.385637s]  INFO(board_artiq::drtio_routing): could not read routing table from configuration, using default
[     8.394351s]  INFO(board_artiq::drtio_routing): routing table: RoutingTable { 0: 0; 1: 1 0; 2: 2 0; 3: 3 0; }
[     8.407962s]  INFO(runtime::mgmt): management interface active
[     8.420022s]  INFO(runtime::session): accepting network sessions
[     8.433112s]  INFO(runtime::session): running startup kernel
[     8.437568s]  INFO(runtime::session): no startup kernel found
[     8.443382s]  INFO(runtime::session): no connection, starting idle kernel
[     8.450215s]  INFO(runtime::session): no idle kernel found
[     8.455639s]  INFO(runtime::rtio_mgt::drtio): [DEST#0] destination is up
  • The link status LED on the connected switch lights up every time, but the status LED on Kasli mostly remains off. Sometimes it lights up, but that does not necessarily mean Kasli will respond to ping (on some occasions it did not respond to ping even with the status LED on).
  • The same hardware setup works perfectly fine with ARTIQ-8.8573+b168f0b.beta: tested with 100 power cycles, it responds to ping every time.
  • By power cycle I mean: power on, wait for the device to boot (15s), ping (ping -c 4 -B -W 30), note the result, power off, wait 30s.
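
The power-cycle procedure described above could be automated roughly as follows. This is only a sketch, not the script used for the report: the `power_on` and `power_off` callables are hypothetical placeholders for whatever controls the supply (a switchable PDU, a relay, or a prompt to the operator), and the function names are invented for illustration. Only the ping flags and the 15 s / 30 s waits come from the report.

```python
import subprocess
import time
from typing import Callable, List


def ping_host(host: str) -> bool:
    """Run the ping command from the report; exit status 0 counts as 'responded'."""
    result = subprocess.run(
        ["ping", "-c", "4", "-B", "-W", "30", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def run_power_cycles(
    cycles: int,
    power_on: Callable[[], None],   # hypothetical: switches the supply on
    power_off: Callable[[], None],  # hypothetical: switches the supply off
    ping: Callable[[], bool],
    boot_wait_s: float = 15.0,      # boot wait from the report
    off_wait_s: float = 30.0,       # off time from the report
    sleep: Callable[[float], None] = time.sleep,
) -> List[bool]:
    """Repeat: power on, wait for boot, ping, note result, power off, wait."""
    results: List[bool] = []
    for _ in range(cycles):
        power_on()
        sleep(boot_wait_s)
        results.append(ping())
        power_off()
        sleep(off_wait_s)
    return results


def success_rate(results: List[bool]) -> float:
    """Fraction of cycles in which the device answered ping."""
    return sum(results) / len(results) if results else 0.0
```

With real hardware, `ping` would be something like `lambda: ping_host("192.168.1.70")` (the address shown in the serial log), and the returned list can be passed to `success_rate` to compare firmware versions.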

Your System (omit irrelevant parts)

  • Operating System: Ubuntu 22.04
  • ARTIQ version: ARTIQ v7.8185.cc81464 / master
  • Hardware involved:
    • Kasli v2.0.2
    • set of peripheral boards (Urukuls, Mirnys, DIO)
@kaolpr
Contributor Author

kaolpr commented Nov 22, 2023

I've changed the base to standalone. Out of 30 power cycles, 4 failed.

@marmeladapk
Contributor

It may or may not be the same error that I faced several times; it seemed like some part of the Ethernet chain required an additional "reset".

In my case I ran a continuous ping on Kasli's address and power cycled the board. Usually (>~90%) Kasli would not respond to pings until I ran artiq_flash start or disconnected and reconnected the Ethernet cable. The most recent case was in a master configuration on release-7, but I also encountered this in standalone configurations.

It also seemed to be device-specific: in my case the same gateware flashed to another Kasli worked fine.
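
As a stopgap, the manual recovery described here (restart with artiq_flash start when the board does not answer) could be scripted. The following is a rough sketch under assumptions, not a tested fix: the function names, the retry count, and the 15 s settle time are made up for illustration, and `artiq_flash start` is invoked exactly as written above, without any target or variant options a particular setup may need.

```python
import subprocess
import time
from typing import Callable


def restart_with_artiq_flash() -> None:
    """Restart the board via `artiq_flash start`, as mentioned in the comment."""
    subprocess.run(["artiq_flash", "start"], check=True)


def recover_if_unreachable(
    is_reachable: Callable[[], bool],  # e.g. a wrapper around ping
    restart: Callable[[], None],
    attempts: int = 3,                 # assumed retry limit
    settle_s: float = 15.0,            # assumed wait after each restart
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Restart the board until it answers or the attempts run out."""
    if is_reachable():
        return True
    for _ in range(attempts):
        restart()
        sleep(settle_s)
        if is_reachable():
            return True
    return False
```

This only works around the symptom after a power cycle; it does not address why the link fails to come up in the first place.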

@thomasfire
Contributor

I'm able to reproduce it as well. Additional details: it happens only on a power reset; a simple restart with artiq_flash start typically reconnects fine. Also, the LED doesn't work when this happens. I'll investigate further.

@thomasfire
Contributor

After a long bisection, the fix seems to be in 0a37a1a (rtio clock changes), but I am not sure such a long commit should be backported to release-7. I made a cherry-pick though: thomasfire@07ca8c7

@sbourdeauducq
Member

I am not sure such long commit should be backported to the release-7

Obviously not 🙄

@Spaqin
Collaborator

Spaqin commented Jan 22, 2025

Interesting that a seemingly unrelated change in the gateware would help with network stability; could that be a Vivado routing/placement issue that would be covered by syncrtio?

And yes, it's a core change for ARTIQ-8 that allowed distributed DMA and subkernels, and it should not be a part of ARTIQ-7. The consequences... well, for the end user there shouldn't be that many, but I'm not sure; have you tested it, including RTIO?

Generally, unless the network issues make working impossible, I would still avoid switching.

The cherry-pick should also include these two fixes:
a533f2a
b896826
