AQC107 fails to resume from suspend #62

Open
johndoe31415 opened this issue Jul 22, 2024 · 1 comment

Comments

@johndoe31415

Hello there!

I'm using an AQC107 NIC:

01:00.0 Ethernet controller [0200]: Aquantia Corp. AQtion AQC107 NBase-T/IEEE 802.3an Ethernet Controller [Atlantic 10G] [1d6a:07b1] (rev 02)

on Linux x86_64, running an untainted stock Ubuntu kernel (24.04 noble):

reliant joe [~]: uname -a
Linux reliant 6.5.0-28-generic #29-Ubuntu SMP PREEMPT_DYNAMIC Thu Mar 28 23:46:48 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

reliant joe [~]: cat /proc/sys/kernel/tainted
0

I'm experiencing sporadic NIC failures when waking from suspend-to-RAM. When it happens (roughly every third suspend, so annoyingly often), the network driver locks up completely and no connectivity is possible. Sometimes I can recover with rmmod followed by modprobe, but in about 90% of cases that does not work either and I have to reboot to get the NIC working again. Note also that the system then starts to shut down but systemd hangs partway through, so I need to issue a hard reset.

When it occurs, I see the following in dmesg:

[33284.397291] kworker/u256:70: page allocation failure: order:5, mode:0x40d00(GFP_NOIO|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
[33284.397303] CPU: 17 PID: 4159607 Comm: kworker/u256:70 Not tainted 6.5.0-28-generic #29-Ubuntu
[33284.397306] Hardware name: LENOVO 30E0003QGE/1046, BIOS S07KT4AA 07/22/2022
[33284.397308] Workqueue: events_unbound async_run_entry_fn
[33284.397313] Call Trace:
[33284.397315]  <TASK>
[33284.397319]  dump_stack_lvl+0x48/0x70
[33284.397323]  dump_stack+0x10/0x20
[33284.397325]  warn_alloc+0x174/0x1f0
[33284.397329]  ? __alloc_pages_direct_compact+0xb7/0x240
[33284.397334]  __alloc_pages_slowpath.constprop.0+0x8f1/0x980
[33284.397339]  __alloc_pages+0x31f/0x350
[33284.397344]  ? aq_ring_alloc+0x27/0x90 [atlantic]
[33284.397359]  __kmalloc_large_node+0x7a/0x150
[33284.397362]  ? iommu_dma_alloc+0x16e/0x1e0
[33284.397366]  __kmalloc+0xdb/0x170
[33284.397370]  aq_ring_alloc+0x27/0x90 [atlantic]
[33284.397383]  aq_ring_rx_alloc+0x97/0xb0 [atlantic]
[33284.397396]  aq_vec_ring_alloc+0xbe/0x290 [atlantic]
[33284.397409]  ? hw_atl_b0_hw_ring_rx_fill+0x5d/0x70 [atlantic]
[33284.397424]  aq_nic_init+0x13d/0x240 [atlantic]
[33284.397439]  atl_resume_common+0x46/0xf0 [atlantic]
[33284.397452]  aq_pm_resume_restore+0xe/0x20 [atlantic]
[33284.397465]  pci_pm_resume+0x75/0x110
[33284.397468]  ? __pfx_pci_pm_resume+0x10/0x10
[33284.397471]  dpm_run_callback+0x54/0x1b0
[33284.397475]  device_resume+0xad/0x220
[33284.397478]  async_resume+0x1f/0x90
[33284.397480]  async_run_entry_fn+0x33/0x130
[33284.397483]  process_one_work+0x223/0x440
[33284.397487]  worker_thread+0x4d/0x3f0
[33284.397490]  ? __pfx_worker_thread+0x10/0x10
[33284.397492]  kthread+0xf2/0x120
[33284.397495]  ? __pfx_kthread+0x10/0x10
[33284.397498]  ret_from_fork+0x47/0x70
[33284.397501]  ? __pfx_kthread+0x10/0x10
[33284.397504]  ret_from_fork_asm+0x1b/0x30
[33284.397510]  </TASK>
[33284.397511] Mem-Info:
[33284.397513] active_anon:1394551 inactive_anon:357689 isolated_anon:0
                active_file:1793872 inactive_file:9717634 isolated_file:0
                unevictable:104 dirty:42 writeback:0
                slab_reclaimable:1842731 slab_unreclaimable:234794
                mapped:327615 shmem:140848 pagetables:23776
                sec_pagetables:0 bounce:0
                kernel_misc_reclaimable:0
                free:292280 free_pcp:0 free_cma:0
[33284.397518] Node 0 active_anon:5578204kB inactive_anon:1430756kB active_file:7175488kB inactive_file:38870536kB unevictable:416kB isolated(anon):0kB isolated(file):0kB mapped:1310460kB dirty:168kB writeback:0kB shmem:563392kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:46704kB pagetables:95104kB sec_pagetables:0kB all_unreclaimable? no
[33284.397523] Node 0 DMA free:11264kB boost:0kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15996kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[33284.397527] lowmem_reserve[]: 0 2858 64098 64098 64098
[33284.397532] Node 0 DMA32 free:262836kB boost:12876kB min:15796kB low:18628kB high:21460kB reserved_highatomic:2048KB active_anon:21740kB inactive_anon:3736kB active_file:308kB inactive_file:1991556kB unevictable:0kB writepending:0kB present:2992764kB managed:2926724kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[33284.397537] lowmem_reserve[]: 0 0 61240 61240 61240
[33284.397541] Node 0 Normal free:895020kB boost:285064kB min:349708kB low:412408kB high:475108kB reserved_highatomic:2048KB active_anon:5556464kB inactive_anon:1427020kB active_file:7175180kB inactive_file:36878980kB unevictable:416kB writepending:168kB present:63949824kB managed:62710260kB mlocked:416kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[33284.397546] lowmem_reserve[]: 0 0 0 0 0
[33284.397551] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 11264kB
[33284.397564] Node 0 DMA32: 4253*4kB (UME) 2490*8kB (UME) 1727*16kB (UMEH) 2056*32kB (UMEH) 1180*64kB (UMEH) 443*128kB (UMEH) 1*256kB (H) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 262836kB
[33284.397581] Node 0 Normal: 130947*4kB (UMEH) 44782*8kB (UMEH) 121*16kB (UMEH) 345*32kB (UME) 4*64kB (ME) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 895276kB
[33284.397596] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[33284.397597] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[33284.397598] 11652354 total pagecache pages
[33284.397599] 0 pages in swap cache
[33284.397600] Free swap  = 0kB
[33284.397601] Total swap = 0kB
[33284.397602] 16739646 pages RAM
[33284.397603] 0 pages HighMem/MovableOnly
[33284.397603] 326560 pages reserved
[33284.397604] 0 pages hwpoisoned
[33284.403493] atlantic 0000:01:00.0: PM: dpm_run_callback(): pci_pm_resume+0x0/0x110 returns -12
[33284.403497] atlantic 0000:01:00.0: PM: failed to resume async: error -12
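
For anyone reading the trace: order:5 means the allocator was asked for 2^5 = 32 physically contiguous pages, which is 128 kB with 4 kB pages, and the -12 returned by pci_pm_resume is ENOMEM. The following is a minimal sketch for checking one of the per-zone free-list lines from the report against that requirement. The helper name blocks_at_least_order is made up for illustration, and 4 kB pages are assumed.

```shell
#!/bin/bash
# Count free blocks that are large enough for an allocation of the given
# order, reading per-zone free-list lines (as printed in the allocation
# failure report) from stdin. Assumes 4 kB pages.
blocks_at_least_order() {
    local order="$1"
    awk -v need_kb="$(( 4 << order ))" '
        {
            total = 0
            for (i = 1; i <= NF; i++) {
                # Free-list fields look like "130947*4kB"
                if ($i ~ /^[0-9]+\*[0-9]+kB$/) {
                    split($i, parts, "*")
                    size_kb = parts[2] + 0      # "4kB" converts to 4
                    if (size_kb >= need_kb) total += parts[1]
                }
            }
            print total
        }'
}

# The "Node 0 Normal" line from the report above.
line='Node 0 Normal: 130947*4kB (UMEH) 44782*8kB (UMEH) 121*16kB (UMEH) 345*32kB (UME) 4*64kB (ME) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 895276kB'
echo "$line" | blocks_at_least_order 5
# prints 0
```

Fed the Node 0 Normal line it prints 0, while the DMA32 line from the same report yields 444. That points at memory fragmentation in the Normal zone rather than a general shortage of free memory, although it does not by itself explain why the DMA32 blocks were not used. On a live system, /proc/buddyinfo exposes comparable per-order counts, but in a different format that this parser does not handle.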

Any advice on how I can support debugging this issue is greatly appreciated. Thanks!

@tarkh

tarkh commented Dec 12, 2024

Same here with Ethernet controller: Aquantia Corp. AQtion AQC107 NBase-T/IEEE 802.3an Ethernet Controller [Atlantic 10G] (rev 02). @johndoe31415 did you find a solution? Thanks!

Update:
As a temporary workaround I've created a systemd sleep/wake hook script that reloads the atlantic kernel module and resets the AQC107 PCI card. Basically, if you unload the kernel module before sleep, the system wakes without issues or a KP; after wake you can modprobe the module again and reset the PCI device. It seems that the atlantic module doesn't reinitialize the NIC properly after sleep...

Workaround:

  1. Create a script file that will run on sleep and wake
    sudo nano /usr/lib/systemd/system-sleep/atlantic-pci-reset

  2. Add the script content, replacing DEVICE with your card's PCI address (check lspci)

#!/bin/bash
# systemd calls this hook with $1 = "pre" (before sleep) or "post" (after wake).
DEVICE="0000:02:00.0" # Replace with your NIC's PCI address (see lspci -D)
DRIVER="atlantic"

case "$1" in
    pre)
        # Before sleep: unload the driver
        modprobe -r "$DRIVER"
        ;;
    post)
        # After wake: reload the driver, then remove and rescan the PCI device to reset it
        modprobe "$DRIVER"
        echo 1 > "/sys/bus/pci/devices/$DEVICE/remove"
        echo 1 > /sys/bus/pci/rescan
        ;;
esac
  3. Make the script executable
    sudo chmod +x /usr/lib/systemd/system-sleep/atlantic-pci-reset
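
The DEVICE value in step 2 has to be the full sysfs-style address (domain:bus:device.function), which plain lspci does not always print. The following is a rough sketch for deriving it from lspci output. The helper name pci_addresses is hypothetical, and the sample line is the one quoted in the first post, so the address it yields belongs to that machine, not necessarily yours.

```shell
#!/bin/bash
# Print the sysfs-style PCI address (domain:bus:device.function) for each
# lspci output line read from stdin. pci_addresses is a hypothetical helper.
pci_addresses() {
    awk '{
        addr = $1
        # Without -D, lspci omits the domain on machines that only have
        # domain 0, so prepend "0000:" when the first field has no domain.
        if (addr ~ /^[0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]\.[0-7]$/) { addr = "0000:" addr }
        print addr
    }'
}

# Sample line taken from the first post in this issue.
echo '01:00.0 Ethernet controller [0200]: Aquantia Corp. AQtion AQC107 NBase-T/IEEE 802.3an Ethernet Controller [Atlantic 10G] [1d6a:07b1] (rev 02)' | pci_addresses
# prints 0000:01:00.0
```

On a real system the input would come from something like `lspci -d 1d6a:07b1 | pci_addresses`, where 1d6a:07b1 is the vendor:device ID from the lspci line in the first post; other cards in the family may report a different ID. Running `lspci -D` prints the domain directly and makes the helper unnecessary.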
