24.7 - Kernel panic after a while running the firewall #230

Open
2 tasks done
wofwofwof opened this issue Dec 4, 2024 · 9 comments
Labels
upstream Third party issue

Comments

@wofwofwof

Important notices

Before you add a new report, we ask you kindly to acknowledge the following:

Describe the bug

After updating OPNsense from 24.7_9 to 24.7.10_1, I get strange problems with our PC Engines APU2 boards. I use two boards in an HA setup. After the upgrade everything seemed to work fine, but after a while I got kernel panics.

To Reproduce

I don't have a method to trigger the kernel panics other than waiting a while.

Expected behavior

No kernel panics.

Describe alternatives you considered

After reverting the kernel back to 24.7 (opnsense-update -kr 24.7) it seems to work again without problems.

Relevant log files
Attached crash files.
crashes.zip

Here is one example:

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x822330000
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff8109fa60
stack pointer           = 0x28:0xfffffe008be22770
frame pointer           = 0x28:0xfffffe008be22770
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 14829 (openvpn)
rdi: 0000000822330000 rsi: fffffe008be22818 rdx: 0000000000000028
rcx: 0000000000003561  r8: 00000000000000c0  r9: 000000000331a8c0
rax: 0000000000000000 rbx: fffff80034e3a000 rbp: fffffe008be22770
r10: 00000000f1b10b25 r11: fffff80034e3a520 r12: fffffe008be22818
r13: 0000000822330000 r14: fffff80021732300 r15: fffffe006aea0000
trap number             = 12
panic: page fault
cpuid = 1
time = 1733247180
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe008be22460
vpanic() at vpanic+0x131/frame 0xfffffe008be22590
panic() at panic+0x43/frame 0xfffffe008be225f0
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe008be22650
trap_pfault() at trap_pfault+0x46/frame 0xfffffe008be226a0
calltrap() at calltrap+0x8/frame 0xfffffe008be226a0
--- trap 0xc, rip = 0xffffffff8109fa60, rsp = 0xfffffe008be22770, rbp = 0xfffffe008be22770 ---
memcmp() at memcmp+0x110/frame 0xfffffe008be22770
pf_find_state() at pf_find_state+0xc0/frame 0xfffffe008be227c0
pf_test_state_tcp() at pf_test_state_tcp+0x1c4/frame 0xfffffe008be22930
pf_test() at pf_test+0x131e/frame 0xfffffe008be22ae0
pf_check_in() at pf_check_in+0x27/frame 0xfffffe008be22b00
pfil_mbuf_in() at pfil_mbuf_in+0x38/frame 0xfffffe008be22b30
ip_tryforward() at ip_tryforward+0x17f/frame 0xfffffe008be22bf0
ip_input() at ip_input+0x56c/frame 0xfffffe008be22c50
netisr_dispatch_src() at netisr_dispatch_src+0x9e/frame 0xfffffe008be22ca0
tunwrite() at tunwrite+0x2e4/frame 0xfffffe008be22d10
devfs_write_f() at devfs_write_f+0xda/frame 0xfffffe008be22d70
dofilewrite() at dofilewrite+0x7f/frame 0xfffffe008be22dc0
sys_writev() at sys_writev+0x64/frame 0xfffffe008be22e00
amd64_syscall() at amd64_syscall+0x100/frame 0xfffffe008be22f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe008be22f30
--- syscall (121, FreeBSD ELF64, writev), rip = 0x8265826fa, rsp = 0x82048c3b8, rbp = 0x82048c3e0 ---
KDB: enter: panic
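
A panic log like the one above is easier to compare against others (for example the ones in crashes.zip) once it is reduced to its timestamp and call chain. The following is a minimal sketch, assuming the log keeps the `time = <epoch>` and `func() at func+0x.../frame 0x...` line shapes seen here; `summarize_panic`, `FRAME_RE`, `TIME_RE` and `SAMPLE` are names made up for this illustration and are not part of any OPNsense or FreeBSD tool.

```python
import re
from datetime import datetime, timezone

# One DDB backtrace line looks like:
#   pf_find_state() at pf_find_state+0xc0/frame 0xfffffe008be227c0
FRAME_RE = re.compile(r"^(\w+)\(\) at \1\+0x[0-9a-f]+/frame 0x[0-9a-f]+$")
# The panic header carries the wall-clock time as Unix epoch seconds.
TIME_RE = re.compile(r"^time = (\d+)$")

# A few lines copied from the log above, used as sample input.
SAMPLE = """\
time = 1733247180
memcmp() at memcmp+0x110/frame 0xfffffe008be22770
pf_find_state() at pf_find_state+0xc0/frame 0xfffffe008be227c0
pf_test_state_tcp() at pf_test_state_tcp+0x1c4/frame 0xfffffe008be22930
"""


def summarize_panic(text):
    """Return (panic time as a UTC datetime or None, list of function names)."""
    when = None
    frames = []
    for line in text.splitlines():
        line = line.strip()
        m = TIME_RE.match(line)
        if m:
            when = datetime.fromtimestamp(int(m.group(1)), tz=timezone.utc)
            continue
        m = FRAME_RE.match(line)
        if m:
            frames.append(m.group(1))
    return when, frames


if __name__ == "__main__":
    when, frames = summarize_panic(SAMPLE)
    print(when.isoformat() if when else "no timestamp found")
    # Innermost frame first, same order as the log.
    print(" <- ".join(frames))
```

Lines that do not match either pattern (register dumps, the `--- trap` and `--- syscall` separators) are skipped on purpose, so the sketch only reports what it can parse with certainty.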

Environment

Software version: OPNsense 24.7.10_1, same with 24.7.10_2 (14.1-RELEASE-p6 FreeBSD 14.1-RELEASE-p6 stable/24.7-n267979-0d692990122 SMP amd64)

Hardware: PC Engines APU2 (https://pcengines.ch/apu2.htm)

@fichtner
Member

fichtner commented Dec 4, 2024

Saw that one on the forum just now; 0d69299 is definitely a bad kernel.

Can you try with the instructions here?

https://forum.opnsense.org/index.php?topic=44413.msg221775#msg221775

Please note we had to hotfix the kernel which will not reinstall automatically if you caught the bad version. If you experience panics on 24.7.10 relating to pf(4) please reinstall from the GUI (which includes an automatic reboot) or run "opnsense-update -fk" from the shell followed by a manual reboot. The correct kernel identifies itself as "stable/24.7-n267981-8375762712f" using "uname -v".
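
The `uname -v` check from the notice above can be scripted when several firewalls need to be verified. This is a minimal sketch, assuming the two build identifiers quoted in this thread are the only ones of interest; `classify_kernel`, `BAD_BUILD` and `FIXED_BUILD` are hypothetical names chosen for the example, and any other kernel is reported as unknown rather than guessed at.

```python
import os

# Build identifiers quoted in this thread, as they appear in `uname -v`.
BAD_BUILD = "stable/24.7-n267979-0d692990122"
FIXED_BUILD = "stable/24.7-n267981-8375762712f"


def classify_kernel(uname_v):
    """Return 'fixed', 'bad' or 'unknown' for a `uname -v` string."""
    if FIXED_BUILD in uname_v:
        return "fixed"
    if BAD_BUILD in uname_v:
        return "bad"
    return "unknown"


if __name__ == "__main__":
    # os.uname().version holds the same string that `uname -v` prints.
    print(classify_kernel(os.uname().version))
```

A result of `bad` means the kernel still needs to be reinstalled and the machine rebooted as described in the notice; the sketch itself changes nothing on the system.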

@fichtner
Member

fichtner commented Dec 4, 2024

@wofwofwof
Author

Thank you very much for your super-fast help. I've upgraded the kernel to "stable/24.7-n267981-8375762712f" and will check whether everything works now. I will close the ticket if no problems occur by Friday.

Thanks for your great work!

@fichtner
Member

fichtner commented Dec 4, 2024

@wofwofwof no problem, appreciate the report

fichtner added the upstream (Third party issue) label on Dec 4, 2024
fichtner self-assigned this on Dec 4, 2024
@fichtner
Member

fichtner commented Dec 4, 2024

Link to upstream review: https://reviews.freebsd.org/D47495

Waiting for a vmcore before filing a report at https://bugs.freebsd.org

@fichtner
Member

fichtner commented Dec 4, 2024

This isn't the memcmp one, but I need to document the other one too:

(kgdb) bt
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
#1  doadump (textdump=textdump@entry=0) at /usr/src/sys/kern/kern_shutdown.c:405
#2  0xffffffff8049c36a in db_dump (dummy=<optimized out>, dummy2=<optimized out>, dummy3=<optimized out>, dummy4=<optimized out>) at /usr/src/sys/ddb/db_command.c:591
#3  0xffffffff8049c16d in db_command (last_cmdp=<optimized out>, cmd_table=<optimized out>, dopager=false) at /usr/src/sys/ddb/db_command.c:504
#4  0xffffffff8049c2b6 in db_command_script (command=command@entry=0xffffffff81bbf6d3 <db_recursion_data+3> "dump") at /usr/src/sys/ddb/db_command.c:569
#5  0xffffffff804a1528 in db_script_exec (scriptname=<optimized out>, warnifnotfound=warnifnotfound@entry=0) at /usr/src/sys/ddb/db_script.c:302
#6  0xffffffff804a1435 in db_script_kdbenter (eventname=<optimized out>) at /usr/src/sys/ddb/db_script.c:325
#7  0xffffffff8049f4f1 in db_trap (type=<optimized out>, code=<optimized out>) at /usr/src/sys/ddb/db_main.c:267
#8  0xffffffff80c09868 in kdb_trap (type=type@entry=3, code=code@entry=0, tf=tf@entry=0xfffffe00e206e2e0) at /usr/src/sys/kern/subr_kdb.c:790
#9  0xffffffff810e0419 in trap (frame=0xfffffe00e206e2e0) at /usr/src/sys/amd64/amd64/trap.c:608
#10 <signal handler called>
#11 kdb_enter (why=<optimized out>, msg=<optimized out>) at /usr/src/sys/kern/subr_kdb.c:556
#12 0xffffffff80bb91d2 in vpanic (fmt=0xffffffff823f5cbd "Bad link elm %p prev->next != elm", ap=ap@entry=0xfffffe00e206e510) at /usr/src/sys/kern/kern_shutdown.c:955
#13 0xffffffff80bb9283 in panic (fmt=0xffffffff81d82c18 <cnputs_mtx+24> "") at /usr/src/sys/kern/kern_shutdown.c:891
#14 0xffffffff823c1dd0 in pf_state_key_detach (s=s@entry=0xfffff803cc297b00, idx=idx@entry=0) at /usr/src/sys/netpfil/pf/pf.c:1456
#15 0xffffffff823ad0ef in pf_detach_state (s=s@entry=0xfffff803cc297b00) at /usr/src/sys/netpfil/pf/pf.c:1442
#16 0xffffffff823ac6d9 in pf_state_key_attach (skw=0xfffff803cc2c4420, sks=0x0, s=0xfffff803cc297b00) at /usr/src/sys/netpfil/pf/pf.c:1355
#17 pf_state_insert (kif=<optimized out>, orig_kif=orig_kif@entry=0xfffff80002150600, skw=0xfffff803cc2c4420, sks=<optimized out>, s=s@entry=0xfffff803cc297b00)
    at /usr/src/sys/netpfil/pf/pf.c:1535
#18 0xffffffff823ba740 in pf_create_state (r=0xfffff80227b7e000, nr=0xfffff80189e7a800, a=<optimized out>, pd=0xfffffe00e206eb00, nsn=0x0, nk=0x12, sk=<optimized out>, 
    m=0xfffff8001dc64800, off=20, sport=4843, dport=59668, rewrite=0xfffffe00e206ea0c, kif=0xfffff80002150600, sm=0xfffffe00e206ec18, tag=-1, bproto_sum=25520, 
    bip_sum=979, hdrlen=8, match_rules=<optimized out>) at /usr/src/sys/netpfil/pf/pf.c:5025
#19 pf_test_rule (rm=rm@entry=0xfffffe00e206ebf0, sm=sm@entry=0xfffffe00e206ec18, kif=kif@entry=0xfffff80002150600, m=0xfffff8001dc64800, off=20, 
    pd=pd@entry=0xfffffe00e206eb00, am=0xfffffe00e206ebd8, rsm=0xfffffe00e206ebc8, inp=0x0) at /usr/src/sys/netpfil/pf/pf.c:4800
#20 0xffffffff823b4471 in pf_test (dir=dir@entry=1, pflags=<optimized out>, ifp=0xfffff80001906000, m0=m0@entry=0xfffffe00e206ed08, inp=<optimized out>, 
    default_actions=default_actions@entry=0x0) at /usr/src/sys/netpfil/pf/pf.c:8269
#21 0xffffffff823dc177 in pf_check_in (m=0xfffffe00e206ed08, ifp=0x12, flags=-502865312, ruleset=<optimized out>, inp=0xffffffff80c10af0 <putchar>)
    at /usr/src/sys/netpfil/pf/pf_ioctl.c:6575
#22 0xffffffff80d19e98 in pfil_mbuf_common (pch=<optimized out>, m=0xfffffe00e206ed08, m@entry=0xfffffe00e206ec48, ifp=0xfffff80001906000, flags=65536, inp=inp@entry=0x0)
    at /usr/src/sys/net/pfil.c:212
#23 pfil_mbuf_in (head=<optimized out>, m=m@entry=0xfffffe00e206ed08, ifp=0xfffff80001906000, inp=inp@entry=0x0) at /usr/src/sys/net/pfil.c:230
#24 0xffffffff80d9c59a in ip_tryforward (m=0xfffff8001dc64800) at /usr/src/sys/netinet/ip_fastfwd.c:312
#25 0xffffffff80d9fa9c in ip_input (m=0xfffff8001dc64800) at /usr/src/sys/netinet/ip_input.c:621
#26 0xffffffff80d1682b in netisr_process_workstream_proto (nwsp=0xfffffe003a5eca40, proto=1) at /usr/src/sys/net/netisr.c:927
#27 swi_net (arg=0xfffffe003a5eca40) at /usr/src/sys/net/netisr.c:974
#28 0xffffffff80b6ffc6 in intr_event_execute_handlers (ie=0xfffff80001a59100, p=<optimized out>) at /usr/src/sys/kern/kern_intr.c:1205
#29 ithread_execute_handlers (ie=0xfffff80001a59100, p=<optimized out>) at /usr/src/sys/kern/kern_intr.c:1218
#30 ithread_loop (arg=arg@entry=0xfffff80001a7a620) at /usr/src/sys/kern/kern_intr.c:1306
#31 0xffffffff80b6c402 in fork_exit (callout=0xffffffff80b6fd70 <ithread_loop>, arg=0xfffff80001a7a620, frame=0xfffffe00e206ef40) at /usr/src/sys/kern/kern_fork.c:1164
#32 <signal handler called>
(kgdb) frame 14
#14 0xffffffff823c1dd0 in pf_state_key_detach (s=s@entry=0xfffff803cc297b00, idx=idx@entry=0) at /usr/src/sys/netpfil/pf/pf.c:1456
warning: Source file is more recent than executable.
1456		TAILQ_REMOVE(&sk->states[idx], s, key_list[idx]);
(kgdb) list
1451	#ifdef INVARIANTS
1452		struct pf_keyhash *kh = &V_pf_keyhash[pf_hashkey(sk)];
1453	
1454		PF_HASHROW_ASSERT(kh);
1455	#endif
1456		TAILQ_REMOVE(&sk->states[idx], s, key_list[idx]);
1457		s->key[idx] = NULL;
1458	
1459		if (TAILQ_EMPTY(&sk->states[0]) && TAILQ_EMPTY(&sk->states[1])) {
1460			LIST_REMOVE(sk, entry);
(kgdb) p *sk
$1 = {addr = {{{v4 = {s_addr = XXX}, v6 = {__u6_addr = {__u6_addr8 = "XXX", <incomplete sequence XXX>, 
            __u6_addr16 = {XXX, XXX, XXX, XXX, XXX, XXX, XXX, XXX}, __u6_addr32 = {XXX, XXX, XXX, XXX}}}, 
        addr8 = "XXX", <incomplete sequence \XXX>, addr16 = {XXX, XXX, XXX, XXX, XXX, XXX, XXX, 
          XXX}, addr32 = {XXX, XXX, XXX, XXX}}}, {{v4 = {s_addr = XXX}, v6 = {__u6_addr = {
            __u6_addr8 = "XXX", <incomplete sequence XXX>, __u6_addr16 = {XXX, XXX, XXX, XXX, XXX, 
              XXX, XXX, XXX}, __u6_addr32 = {XXX, XXX, XXX, XXX}}}, 
        addr8 = "XXX", <incomplete sequence XXX>, addr16 = {XXX, XXX, XXX, XXX, XXX, XXX, XXX, 
          XXX}, addr32 = {XXX, XXX, XXX, XXX}}}}, port = {49374, 57005}, af = 222 '\336', proto = 192 '\300', 
  pad = "\255", <incomplete sequence \336>, entry = {le_next = 0xdeadc0dedeadc0de, le_prev = 0xdeadc0dedeadc0de}, states = {{tqh_first = 0xdeadc0dedeadc0de, 
      tqh_last = 0xdeadc0dedeadc0de}, {tqh_first = 0xdeadc0dedeadc0de, tqh_last = 0xdeadc0dedeadc0de}}}
(kgdb) p *sk->states
$2 = {tqh_first = 0xdeadc0dedeadc0de, tqh_last = 0xdeadc0dedeadc0de}
(kgdb) p *s
$3 = {id = 10415225491559546880, creatorid = 1082503010, direction = 1 '\001', pad = "\000\000", state_flags = 128, timeout = 27 '\033', sync_state = 255 '\377', 
  sync_updates = 0 '\000', refs = 0, lock = 0xfffffe0109794688, sync_list = {tqe_next = 0x0, tqe_prev = 0x0}, key_list = {{tqe_next = 0x0, 
      tqe_prev = 0xfffff803cc2c4458}, {tqe_next = 0x0, tqe_prev = 0x0}}, entry = {le_next = 0x0, le_prev = 0x0}, src = {scrub = 0x0, seqlo = 0, seqhi = 0, seqdiff = 0, 
    max_win = 0, mss = 0, state = 1 '\001', wscale = 0 '\000', tcp_est = 0 '\000', pad = ""}, dst = {scrub = 0x0, seqlo = 0, seqhi = 0, seqdiff = 0, max_win = 0, 
    mss = 0, state = 0 '\000', wscale = 0 '\000', tcp_est = 0 '\000', pad = ""}, match_rules = {slh_first = 0x0}, rule = {ptr = 0xfffff80227b7e000, nr = 666361856}, 
  anchor = {ptr = 0x0, nr = 0}, nat_rule = {ptr = 0xfffff80189e7a800, nr = 2313660416}, rt_addr = {{v4 = {s_addr = 0}, v6 = {__u6_addr = {
          __u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, addr8 = '\000' <repeats 15 times>, addr16 = {0, 
        0, 0, 0, 0, 0, 0, 0}, addr32 = {0, 0, 0, 0}}}, key = {0xfffff803cc2c4420, 0x0}, kif = 0xfffff80002150600, orig_kif = 0xfffff80002150600, rt_kif = 0x0, 
  src_node = 0x0, nat_src_node = 0x0, packets = {0, 0}, bytes = {0, 0}, creation = 127, expire = 127, pfsync_time = 0, act = {rtableid = -1, qid = 0, pqid = 0, 
    max_mss = 0, log = 0 '\000', set_tos = 0 '\000', min_ttl = 0 '\000', dnpipe = 0, dnrpipe = 0, flags = 128, set_prio = "\000"}, tag = 0, rt = 0 '\000'}
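
The odd field values in `p *sk` line up with the 0xdeadc0de pattern that fills every pointer in the structure, and the addresses in `p *s` show why the "Bad link elm" check fired. The sketch below replays both observations with values copied from the kgdb output. It assumes a little-endian amd64 layout in which `port`, `af`, `proto` and `pad` occupy one aligned 8-byte word after the 32 bytes of `addr`, followed by the 16-byte `entry` and then `states[0]`; that layout is inferred from the printed fields, not taken from the pf headers. Reading the pattern as a marker for already-freed memory is an interpretation (debug kernels commonly fill freed memory with it), not something the thread states. All function and constant names are made up for the example.

```python
import struct

POISON = 0xDEADC0DEDEADC0DE  # value seen in every pointer field of *sk

# Values copied from the kgdb output above.
SK_ADDR = 0xFFFFF803CC2C4420     # s->key[0], i.e. sk
S_ADDR = 0xFFFFF803CC297B00      # s
S_TQE_PREV = 0xFFFFF803CC2C4458  # s->key_list[0].tqe_prev
SK_STATES0_TQH_FIRST = POISON    # sk->states[0].tqh_first


def decode_poisoned_fields():
    """Read 8 poison bytes the way port[2], af, proto, pad[2] would see them."""
    raw = struct.pack("<Q", POISON)
    port0, port1, af, proto, pad0, pad1 = struct.unpack("<HHBBBB", raw)
    return {"port": (port0, port1), "af": af, "proto": proto, "pad": (pad0, pad1)}


def prev_link_offset():
    """Byte offset inside *sk that s->key_list[0].tqe_prev points at."""
    return S_TQE_PREV - SK_ADDR


def link_is_intact():
    """The 'prev->next == elm' test: the slot tqe_prev points at must hold s."""
    return SK_STATES0_TQH_FIRST == S_ADDR


if __name__ == "__main__":
    print(decode_poisoned_fields())
    print(hex(prev_link_offset()))
    print(link_is_intact())
```

Under the assumed layout, the offset computed by `prev_link_offset()` is where `states[0].tqh_first` lives (32 + 8 + 16 bytes). The state `s` therefore still points back into `sk`, while `sk` no longer points at `s`, which is the inconsistency the panic message describes.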

@fichtner
Member

fichtner commented Dec 5, 2024

@wofwofwof
Author

Everything works again with the current kernel. No issues since the upgrade. Thanks a lot.

@fichtner
Member

fichtner commented Dec 6, 2024

@wofwofwof nice to hear :)
