pstore doesn't persist panics correctly #362

Open
fionera opened this issue Dec 15, 2024 · 2 comments
Labels
bug (Something isn't working), c/node (Issues related to low-level node services and startup)

Comments

@fionera
Contributor

fionera commented Dec 15, 2024

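While hacking around, I found that our pstore implementation doesn't persist panics correctly. After the panic and reboot below, the pstore log line contains only garbage instead of the panic message.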

Metropolis: this is /dev/ttyS0. Verbose node logs follow.

        panichandler I Panic console: /dev/ttyS0
        panichandler I Panic console: /dev/ttyS1
                init I Starting Metropolis node init
                init I Version: v0.1.0-dev880.gc441f0b4.dirty
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x2548be7]

goroutine 1 [running]:
main.main()
        metropolis/node/core/main.go:280 +0xc67

 Metropolis encountered an uncorrectable error and this node must be restarted.
core exit status: 2
  Disks synced, rebooting...

[    4.116511] reboot: Restarting system
=
BdsDxe: loading Boot0001 "UEFI Misc Device" from PciRoot(0x0)/Pci(0x4,0x0)
BdsDxe: starting Boot0001 "UEFI Misc Device" from PciRoot(0x0)/Pci(0x4,0x0)
[TRACE]: external/rules_rust~~crate~crate_index_efi__uefi-0.24.0/src/fs/file_system/fs.rs@327: Can't open file \EFI\metropolis\loader_state.pb: Error { status: NOT_FOUND, data: () }
Unable to load A/B loader state, using default slot A: while reading state file: IO error
Booting into Slot A

  Metropolis Cluster Operating System
  Copyright 2020-2024 The Monogon Project Authors


Metropolis: this is /dev/ttyS0. Verbose node logs follow.

        panichandler I Panic console: /dev/ttyS0
        panichandler I Panic console: /dev/ttyS1
                init I Starting Metropolis node init
                init I Version: v0.1.0-dev880.gc441f0b4.dirty
              pstore W �

@fionera
Contributor Author

fionera commented Dec 15, 2024

After some more digging: the dump is already corrupted inside the EFI variables, before our code ever parses it.

                init I reading "dump-type7-0-0-1734294273-D" in "cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0"
                init I "\xcc": cc
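
For anyone who wants to reproduce the check above outside of init, here is a minimal Go sketch that reads one variable through efivarfs and hex-dumps its payload. It assumes efivarfs is mounted at the usual `/sys/firmware/efi/efivars`, where each file is named `<name>-<guid>` and starts with a 4-byte little-endian attribute word. The helper names `splitEFIVar` and `dumpEFIVar` are made up for illustration and are not the functions used in Metropolis. The variable name and GUID are the ones from the log line above.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// efivarfsDir is the conventional efivarfs mount point (an assumption here;
// adjust it if the node mounts efivarfs elsewhere).
const efivarfsDir = "/sys/firmware/efi/efivars"

// splitEFIVar splits the raw contents of an efivarfs file into the 4-byte
// little-endian attribute word and the variable payload that follows it.
// Hypothetical helper, for illustration only.
func splitEFIVar(raw []byte) (attrs uint32, payload []byte, err error) {
	if len(raw) < 4 {
		return 0, nil, errors.New("efivarfs file shorter than the attribute header")
	}
	return binary.LittleEndian.Uint32(raw[:4]), raw[4:], nil
}

// dumpEFIVar reads the variable <name>-<guid> from efivarfs and returns a
// hex dump of its payload, without the attribute word.
func dumpEFIVar(name, guid string) (string, error) {
	raw, err := os.ReadFile(filepath.Join(efivarfsDir, name+"-"+guid))
	if err != nil {
		return "", err
	}
	_, payload, err := splitEFIVar(raw)
	if err != nil {
		return "", err
	}
	return hex.Dump(payload), nil
}

func main() {
	// Name and GUID copied from the init log line in this issue.
	out, err := dumpEFIVar("dump-type7-0-0-1734294273-D",
		"cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0")
	if err != nil {
		fmt.Fprintln(os.Stderr, "reading variable:", err)
		return
	}
	fmt.Print(out)
}
```

If the payload printed here is already a single `cc` byte, the corruption happened before userspace read the variable, which matches the observation above.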

fionera added the bug and c/node labels Dec 16, 2024
@lorenz
Contributor

lorenz commented Dec 16, 2024

Weird, any idea why this is broken? Our code looks pretty bulletproof, so maybe it's a kernel bug?

Anyway, we should probably move this into a package and add a test if we want to make sure it keeps working, as it is currently a pain to debug.

2 participants