Gossip doesn't replace PKI-ID of a peer that has renewed an expired certificate #5111

Open
denyeart opened this issue Jan 20, 2025 · 2 comments · May be fixed by #5110

denyeart commented Jan 20, 2025

Description

When a peer renews its enrollment certificate before the old one expires, gossip on the other peers recognizes the change and replaces the in-memory PKI-ID, like this:

INFO [gossip.comm] createConnection -> Peer 127.0.0.1:22510 changed its PKI-ID from 3257ef1cb73517a1c07b06c0c8f06903cb537199ffb11ecd23643b900f3f43d2 to 9f3e40a456e9f1a22c73e1206bd78b3b1b0ed7276ffbb37b9de00d23530f0877
INFO [gossip.discovery] purge -> Purging 3257ef1cb73517a1c07b06c0c8f06903cb537199ffb11ecd23643b900f3f43d2 from membership
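
To illustrate why a renewal shows up as a PKI-ID change at all, here is a minimal sketch. It assumes, as the 64-hex-character values in the logs suggest, that a PKI-ID is a SHA-256 digest derived from the peer's identity. The helper `pkiID`, the input layout (MSP ID followed by certificate bytes), and the sample values are assumptions for illustration, not code taken from the project.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// pkiID is a hypothetical helper: it hashes the MSP ID followed by the
// certificate bytes. The exact input layout is an assumption.
func pkiID(mspID string, cert []byte) string {
	h := sha256.New()
	h.Write([]byte(mspID))
	h.Write(cert)
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	// Placeholder byte strings stand in for the real certificates.
	oldID := pkiID("Org1MSP", []byte("certificate issued before renewal"))
	newID := pkiID("Org1MSP", []byte("certificate issued after renewal"))

	fmt.Println("old PKI-ID:", oldID)
	fmt.Println("new PKI-ID:", newID)
	fmt.Println("changed:", oldID != newID)
}
```

Under this assumption any change to the certificate bytes yields a different digest, so other peers have to map the same endpoint to a new PKI-ID and purge the old one.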

However, if the enrollment certificate expires before gossip on the other peers sees the renewal, gossip is left in a bad state, like this:

WARN [gossip.gossip] func3 -> Unable to determine org of message tag:EMPTY  alive_msg:{membership:{endpoint:"127.0.0.1:22510"  pki_id:"\x9f>@\xa4V\xe9\xf1\xa2,s\xe1 k׋;\x1b\x0e\xd7'o\xfb\xb3{\x9d\xe0\r#S\x0f\x08w"}  timestamp:{inc_num:1737346511720016000  seq_num:29}}
WARN [gossip.gossip] disclosurePolicy -> Cannot determine organization of Endpoint: 127.0.0.1:22510, InternalEndpoint: , PKI-ID: 9f3e40a456e9f1a22c73e1206bd78b3b1b0ed7276ffbb37b9de00d23530f0877, Metadata: 
ERRO [peer.gossip.sa] OrgByPeerIdentity -> Invalid Peer Identity. It must be different from nil.
WARN [gossip.comm] createConnection -> Remote endpoint claims to be a different peer, expected 9f3e40a456e9f1a22c73e1206bd78b3b1b0ed7276ffbb37b9de00d23530f0877 but got b4944bc458640ad309569bc34eb58d66ad6a67e642f2ee89edead2e2fabd7cd8

The result is that gossip membership bounces between established and not established until the other peers are restarted.
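
One possible reading of the two log sequences is sketched below. It is a simplified model, not the project's code: `identityStore`, `checkRemote`, and the placeholder IDs are hypothetical. The assumption is that a PKI-ID change is accepted only when the organization of the old identity can still be looked up and matches the organization of the new one. Once the old certificate has expired and its identity is no longer known, the lookup returns nothing and the remote peer is treated as a different peer.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// identityStore maps a PKI-ID to the organization of the identity behind it.
// Entries for expired certificates are assumed to have been dropped.
type identityStore map[string][]byte

var errDifferentPeer = errors.New("remote endpoint claims to be a different peer")

// checkRemote is a hypothetical handshake check. It reports whether the
// remote peer legitimately changed its PKI-ID, or returns an error when the
// change cannot be tied back to the same organization.
func checkRemote(known identityStore, expected, got string, gotOrg []byte) (bool, error) {
	if expected == "" || expected == got {
		return false, nil // nothing to reconcile
	}
	oldOrg, ok := known[expected]
	if !ok || !bytes.Equal(oldOrg, gotOrg) {
		// Old identity unknown (e.g. expired) or from another organization.
		return false, errDifferentPeer
	}
	return true, nil // same organization: accept the new PKI-ID
}

func main() {
	org1 := []byte("Org1MSP")

	// Renewal seen while the old identity is still known.
	known := identityStore{"old-id": org1}
	changed, err := checkRemote(known, "old-id", "new-id", org1)
	fmt.Println("renewed before expiry:", changed, err)

	// Renewal seen after the old identity has been dropped.
	expired := identityStore{}
	changed, err = checkRemote(expired, "old-id", "new-id", org1)
	fmt.Println("renewed after expiry:", changed, err)
}
```

In this model the first call corresponds to the "changed its PKI-ID" log line and the second to the "claims to be a different peer" warning.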

Steps to reproduce

See the integration test in #5110, which reproduces the problem.

denyeart added the bug label Jan 20, 2025
denyeart linked a pull request Jan 20, 2025 that will close this issue

yacovm commented Jan 22, 2025

> The result is that gossip membership bounces between established and not established until the other peers are restarted.

I think this happens because the second peer in your test connects to the first peer after its certificate renewal and establishes a connection.

However, the first peer still remembers the second peer's old PKI-ID as offline, and it tries to reconnect to it. When it does, it bails on the connection after the handshake, but meanwhile the second peer has replaced its previous connection with the new connection from the first peer, which is now... closed.
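
The sequence described above can be modeled with a small, deterministic sketch. It assumes a connection store that keeps at most one connection per remote endpoint and lets a newer connection replace the older one. The types `conn` and `connStore` and the function `simulate` are hypothetical and only illustrate the ordering of events, not the actual gossip implementation.

```go
package main

import "fmt"

// conn is a stand-in for a gossip connection between two peers.
type conn struct {
	name   string
	closed bool
}

// connStore keeps at most one connection per remote endpoint;
// storing a new connection replaces the previous one.
type connStore map[string]*conn

func (s connStore) put(endpoint string, c *conn) { s[endpoint] = c }

func (s connStore) usable(endpoint string) bool {
	c, ok := s[endpoint]
	return ok && !c.closed
}

// simulate replays the three steps and reports whether the second peer
// holds a usable connection to the first peer before and after them.
func simulate() (before, after bool) {
	peer2 := connStore{}

	// 1. The second peer dials the first peer after its renewal.
	outbound := &conn{name: "peer2->peer1"}
	peer2.put("peer1", outbound)
	before = peer2.usable("peer1")

	// 2. The first peer dials back, looking for the old PKI-ID.
	//    The second peer replaces its stored connection with this one.
	inbound := &conn{name: "peer1->peer2"}
	peer2.put("peer1", inbound)

	// 3. The first peer bails after the handshake and closes the connection.
	inbound.closed = true
	after = peer2.usable("peer1")
	return before, after
}

func main() {
	before, after := simulate()
	fmt.Println("usable before reconnect attempt:", before)
	fmt.Println("usable after reconnect attempt:", after)
}
```

In this model the second peer ends up holding only the closed connection, so it has to reconnect, which would be consistent with membership bouncing between established and not established.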


yacovm commented Jan 22, 2025

I commented on your PR with what the core issue is and how to fix it. Feel free to push a PR and I'll review it.
