sql,pg: catch db backend failure #1079

Merged (1 commit) on Nov 18, 2024


@jchappelow (Member) commented Nov 15, 2024

Certain errors from the database indicate a failure of the DB backend itself, such as running out of memory or disk space, or data corruption.

Unfortunately, these errors were not halting the node. They only resulted in failed blockchain transactions, which leads to non-determinism, since the node would still compute an apphash (likely incorrect) and commit the block:

2024-11-14T15:25:16.331Z	warn	kwild.pg	pg/conn.go:137	ERROR [53200]: out of shared memory
2024-11-14T15:25:16.332Z	warn	kwild.abci	abci/abci.go:371	failed to execute transaction	{"error": "ERROR: out of shared memory (SQLSTATE 53200)\ninternal database error"}
2024-11-14T15:25:16.336Z	warn	kwild.abci	abci/abci.go:371	failed to execute transaction	{"error": "ERROR: out of shared memory (SQLSTATE 53200)\ninternal database error"}
2024-11-14T15:25:16.34Z	warn	kwild.abci	abci/abci.go:371	failed to execute transaction	{"error": "ERROR: out of shared memory (SQLSTATE 53200)\ninternal database error"}
2024-11-14T15:25:16.343Z	warn	kwild.abci	abci/abci.go:371	failed to execute transaction	{"error": "ERROR: out of shared memory (SQLSTATE 53200)\ninternal database error"}
2024-11-14T15:25:16.347Z	warn	kwild.abci	abci/abci.go:371	failed to execute transaction	{"error": "ERROR: out of shared memory (SQLSTATE 53200)\ninternal database error"}
2024-11-14T15:25:16.35Z	warn	kwild.abci	abci/abci.go:371	failed to execute transaction	{"error": "ERROR: out of shared memory (SQLSTATE 53200)\ninternal database error"}
2024-11-14T15:25:17.434Z	info	kwild.cometbft	state/execution.go:230	finalized block	{"module": "state", "height": 1095421, "num_txs_res": 501, "num_val_updates": 0, "block_app_hash": "0A4B186AD57C28DE4EBF694293800C25572267F0258FA694B52F42B7FD55B813"}

This PR fixes that by:

  • catching error codes in class 53, 58, or XX (https://www.postgresql.org/docs/current/errcodes-appendix.html)
  • closing the current db connection when the pg package detects such an error
  • joining the detected error with the common sql.ErrDBFailure error
  • detecting this error in abci's FinalizeBlock and halting the node instead of just labeling the transaction as failed in the block result

@charithabandi (Contributor) left a comment

LGTM

@jchappelow jchappelow merged commit 6804aa5 into kwilteam:main Nov 18, 2024
2 checks passed
@jchappelow jchappelow deleted the fatal-db-main branch November 18, 2024 18:02