feat(katana): exit on block production error #2629

kariy · 2024-11-04T20:13:25Z

resolves #1650

currently, katana silently fails when an error happens during block production. this creates some issues in the context of Slot hosting, as unexpected behaviours can happen when the instance is running out of storage. returning immediately upon failing makes it easier to detect this issue

2024-11-04T20:16:50.274691Z TRACE executor: Transaction resource usage. usage="steps: 3387 | memory holes: 32 | ec_op_builtin: 3 | pedersen_builtin: 16 | range_check_builtin: 65"
2024-11-04T20:16:50.336923Z ERROR node: Mining block. error=failed to write to db table CompiledClasses with key [7, 119, 132, 31, 135, 177, 34, 199, 133, 3, 124, 123, 169, 131, 85, 12, 179, 94, 71, 107, 237, 129, 83, 90, 108, 155, 91, 255, 242, 24, 57, 6]: No space left on device
2024-11-04T20:16:50.337613Z ERROR Stage{id=Sequencing}: pipeline: Block production task finished unexpectedly. reason=Ok(Completed(Err(Provider(Database(Write { error: Other(28), table: "CompiledClasses", key: [7, 119, 132, 31, 135, 177, 34, 199, 133, 3, 124, 123, 169, 131, 85, 12, 179, 94, 71, 107, 237, 129, 83, 90, 108, 155, 91, 255, 242, 24, 57, 6] })))))
2024-11-04T20:16:50.337666Z  INFO pipeline: Pipeline finished.
2024-11-04T20:16:50.337680Z DEBUG tasks: Task with graceful shutdown completed. task="Pipeline"
2024-11-04T20:16:50.337764Z  INFO katana::cli::node: Shutting down.

Summary by CodeRabbit

New Features
- Introduced a new error type, BlockProductionError, improving error handling in block production tasks.
Bug Fixes
- Enhanced error handling in the put method to prevent potential panics and provide better context for database write errors.
Documentation
- Updated method signatures to reflect changes in error handling and return types for improved clarity.

coderabbitai · 2024-11-04T20:19:16Z

Walkthrough

Ohayo, sensei! This pull request introduces several modifications primarily focused on error handling within the block production process. The BlockProductionError enum is updated to remove the BlockMiningTaskCancelled variant, simplifying error management. The BlockProductionTask structure's output type is changed to return results with potential errors. Additionally, the run_block_production method in the Sequencing struct is modified to handle errors more effectively. These changes enhance the robustness of the block production logic while maintaining existing functionality.

Changes

| File Path | Change Summary |
| --- | --- |
| `crates/katana/core/src/service/block_producer.rs` | Removed `BlockMiningTaskCancelled` variant from `BlockProductionError`. Updated error handling in block production logic. |
| `crates/katana/core/src/service/mod.rs` | Changed `BlockProductionTask` output type from `()` to `Result<(), BlockProductionError>`. Updated `poll` method for error propagation. |
| `crates/katana/pipeline/src/stage/sequencing.rs` | Modified `run_block_production` method to return `TaskHandle<Result<(), BlockProductionError>>`. Added import for `BlockProductionError`. |
| `crates/katana/storage/db/src/mdbx/tx.rs` | Enhanced error handling in `put` method of `DbTxMut`. Replaced `unwrap()` with error mapping for better context. |
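
As a rough illustration of the last row (replacing `unwrap()` with error mapping in `put`), here is a minimal sketch. It is not the actual katana code: `RawError`, `raw_put`, and the `disk_full` flag are hypothetical stand-ins, and the shape of `DatabaseError::Write` is only inferred from the log output in the PR description.

```rust
use std::fmt;

/// Hypothetical stand-in for the low-level storage error code
/// (28 is ENOSPC, "No space left on device", on Linux).
#[derive(Debug, PartialEq)]
pub struct RawError(pub i32);

/// Assumed shape, inferred from the `Write { error, table, key }` value in the PR log.
#[derive(Debug, PartialEq)]
pub enum DatabaseError {
    Write { error: RawError, table: &'static str, key: Vec<u8> },
}

impl fmt::Display for DatabaseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DatabaseError::Write { error, table, key } => {
                write!(f, "failed to write to db table {table} with key {key:?}: code {}", error.0)
            }
        }
    }
}

impl std::error::Error for DatabaseError {}

/// Hypothetical low-level write that fails when the device is full.
fn raw_put(disk_full: bool) -> Result<(), RawError> {
    if disk_full { Err(RawError(28)) } else { Ok(()) }
}

/// Instead of calling `.unwrap()` on the low-level result (which would panic),
/// attach the table name and key and hand the error back to the caller.
pub fn put(table: &'static str, key: &[u8], disk_full: bool) -> Result<(), DatabaseError> {
    raw_put(disk_full).map_err(|error| DatabaseError::Write { error, table, key: key.to_vec() })
}
```

With this shape, the caller decides what to do with a failed write, and the error message carries enough context to identify the table and key involved.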

Possibly related PRs

refactor(katana): separate node service task #2413: This PR modifies the BlockProductionTask structure and its methods, which are directly related to the changes made in the main PR regarding the BlockProductionError enum and the error handling in block production.
refactor(katana): stage sync pipeline #2502: This PR includes changes to the BlockProducer struct, which is relevant as the main PR also involves modifications to the BlockProducer and its error handling.
feat(katana): compute block commitments #2609: Although this PR focuses on computing block commitments, it involves changes to the Backend struct and methods that are part of the block production process, linking it to the overall context of block production and error handling.


coderabbitai

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)

crates/katana/pipeline/src/stage/sequencing.rs (2)
Line range hint 55-62: Excellent error handling improvement, sensei!

The change to return Result<(), BlockProductionError> aligns perfectly with the goal of explicit error handling during block production.

Consider adding a debug log before spawning the task to help with debugging:
     fn run_block_production(&self) -> TaskHandle<Result<(), BlockProductionError>> {
         let pool = self.pool.clone();
         let miner = TransactionMiner::new(pool.add_listener());
         let block_producer = self.block_producer.clone();
 
         let service = BlockProductionTask::new(pool, miner, block_producer);
+        tracing::debug!(target: "pipeline", "Spawning block production task");
         self.task_spawner.build_task().name("Block production").spawn(service)
     }
Line range hint 82-89: Enhance error logging for block production failures, sensei!

The current error logging could be more specific for block production errors to better serve the PR's goal.

Consider this improvement:
             res = messaging => {
                 error!(target: "pipeline", reason = ?res, "Messaging task finished unexpectedly.");
             },
             res = block_production => {
-                error!(target: "pipeline", reason = ?res, "Block production task finished unexpectedly.");
+                match res {
+                    Ok(Ok(())) => error!(target: "pipeline", "Block production task finished unexpectedly with success."),
+                    Ok(Err(e)) => error!(target: "pipeline", error = ?e, "Block production failed with error."),
+                    Err(e) => error!(target: "pipeline", error = ?e, "Block production task panicked."),
+                }
             }
This change provides:

- Clear distinction between task failures and block production errors
- Better error context for debugging
- Proper handling of both Result layers (TaskHandle and BlockProductionError)

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 82d3a85 and 57a865c.

📒 Files selected for processing (4)

crates/katana/core/src/service/block_producer.rs (0 hunks)
crates/katana/core/src/service/mod.rs (3 hunks)
crates/katana/pipeline/src/stage/sequencing.rs (2 hunks)
crates/katana/storage/db/src/mdbx/tx.rs (1 hunks)

💤 Files with no reviewable changes (1)

crates/katana/core/src/service/block_producer.rs

🔇 Additional comments (5)

crates/katana/pipeline/src/stage/sequencing.rs (2)

6-6: Ohayo! Clean import addition, sensei!

The BlockProductionError import is properly placed with related block_producer imports.

Line range hint 55-89: Verify storage error handling, sensei!

The PR objectives mention issues with running out of storage. Let's verify that storage-related errors are properly propagated through BlockProductionError.

✅ Verification successful

Storage error handling is properly implemented, sensei!

The BlockProductionError correctly propagates storage-related errors through ProviderError, which handles various storage scenarios including:

- Missing block data (hash/number)
- Parsing errors
- General provider errors

The implementation properly handles these through the error chain:

- Storage layer errors -> ProviderError
- ProviderError -> BlockProductionError (via #[from] derive)
- BlockProductionError -> Stage error handling
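
The error chain described above can be sketched with plain `From` impls, which is roughly what a `#[from]` attribute generates. This is an illustrative sketch only: the enum variants are simplified guesses based on the `Provider(Database(Write { .. }))` nesting visible in the PR log, and `write_class`, `store_block`, and `produce_block` are hypothetical helpers.

```rust
/// Simplified storage-layer error (the real type carries more context).
#[derive(Debug, PartialEq)]
pub enum DatabaseError {
    Write { table: &'static str, code: i32 },
}

#[derive(Debug, PartialEq)]
pub enum ProviderError {
    Database(DatabaseError),
}

#[derive(Debug, PartialEq)]
pub enum BlockProductionError {
    Provider(ProviderError),
}

// Hand-written equivalents of what a `#[from]` derive would provide.
impl From<DatabaseError> for ProviderError {
    fn from(e: DatabaseError) -> Self {
        ProviderError::Database(e)
    }
}

impl From<ProviderError> for BlockProductionError {
    fn from(e: ProviderError) -> Self {
        BlockProductionError::Provider(e)
    }
}

/// Hypothetical storage write that fails with ENOSPC (28) when the disk is full.
fn write_class(disk_full: bool) -> Result<(), DatabaseError> {
    if disk_full {
        Err(DatabaseError::Write { table: "CompiledClasses", code: 28 })
    } else {
        Ok(())
    }
}

/// Hypothetical provider call: `?` converts DatabaseError into ProviderError.
fn store_block(disk_full: bool) -> Result<(), ProviderError> {
    write_class(disk_full)?;
    Ok(())
}

/// Hypothetical block production step: `?` converts ProviderError into BlockProductionError.
pub fn produce_block(disk_full: bool) -> Result<(), BlockProductionError> {
    store_block(disk_full)?;
    Ok(())
}
```

Each `?` applies one `From` conversion, so a storage failure surfaces at the top as `BlockProductionError::Provider(ProviderError::Database(..))` without any layer having to unwrap or log-and-ignore it.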

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for storage-related error definitions and handling
echo "Checking BlockProductionError definition for storage errors..."
rg -A 5 "pub enum BlockProductionError"

echo "Checking storage error handling in block production..."
ast-grep --pattern 'match $_ {
  $$$
  Err($_) => $$$
  $$$
}'

Length of output: 831

Script:

#!/bin/bash
# Let's check for storage-related error handling in the provider
echo "Checking ProviderError definition..."
rg -A 10 "pub enum ProviderError"

# Check block producer implementation for storage handling
echo "Checking block producer implementation..."
ast-grep --pattern 'impl BlockProducer {
  $$$
  fn $_($$$) -> Result<$_, $_> {
    $$$
  }
  $$$
}'

# Check if there are any storage-related modules
echo "Checking for storage modules..."
fd -t f "storage.*\.rs$"

Length of output: 3007

crates/katana/core/src/service/mod.rs (3)

8-8: Importing BlockProductionError for enhanced error handling

Ohayo, sensei! Including use block_producer::BlockProductionError; is a necessary addition. This import ensures that the BlockProductionError type is available for the improved error propagation in the block production task.

51-51: Changing Output type to Result<(), BlockProductionError>

Ohayo, sensei! Updating the Output type in the Future implementation to Result<(), BlockProductionError> is a wise move. This change allows the BlockProductionTask to return errors, enabling better error handling upstream.

72-72: Returning errors immediately upon block production failure

Ohayo, sensei! Modifying the poll method to return Poll::Ready(Err(error)) ensures that the task exits promptly when a block production error occurs. This adjustment prevents silent failures and enhances the robustness of the service.
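
To make the `poll` change concrete, here is a small self-contained sketch of a future whose `Output` is `Result<(), BlockProductionError>` and which resolves with the error as soon as one block outcome fails. It is an assumption-laden illustration, not the real `BlockProductionTask`: the queue of outcomes stands in for the block producer's stream, and `new`, `poll_once`, and `NoopWaker` are hypothetical helpers. Unlike the real long-running task, this sketch completes with `Ok(())` once its queue is drained.

```rust
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Simplified error type for illustration.
#[derive(Debug, PartialEq)]
pub struct BlockProductionError(pub String);

/// Hypothetical task: the queue stands in for the stream of mined-block outcomes.
pub struct BlockProductionTask {
    outcomes: VecDeque<Result<u64, BlockProductionError>>,
}

impl BlockProductionTask {
    pub fn new(outcomes: Vec<Result<u64, BlockProductionError>>) -> Self {
        Self { outcomes: outcomes.into() }
    }
}

impl Future for BlockProductionTask {
    // Previously `()`; returning a Result lets the caller observe failures.
    type Output = Result<(), BlockProductionError>;

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Self::Output> {
        while let Some(outcome) = self.outcomes.pop_front() {
            match outcome {
                // Block mined successfully: keep going.
                Ok(_block_number) => {}
                // First failure ends the task immediately with the error.
                Err(error) => return Poll::Ready(Err(error)),
            }
        }
        // Sketch-only simplification: finish once the queue is empty.
        Poll::Ready(Ok(()))
    }
}

/// Waker that does nothing; enough to drive the future once without a runtime.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls the task a single time and returns the result.
pub fn poll_once(task: &mut BlockProductionTask) -> Poll<Result<(), BlockProductionError>> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    Pin::new(task).poll(&mut cx)
}
```

Because the future resolves on the first error, whatever awaits the task handle sees the failure and can shut down, which matches the "Block production task finished unexpectedly" line in the PR log.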

crates/katana/storage/db/src/mdbx/tx.rs

codecov · 2024-11-04T20:41:02Z

Codecov Report

Attention: Patch coverage is 60.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 56.89%. Comparing base (131cd89) to head (57a865c).
Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| crates/katana/core/src/service/mod.rs | 0.00% | 1 Missing ⚠️ |
| crates/katana/storage/db/src/mdbx/tx.rs | 66.66% | 1 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2629      +/-   ##
==========================================
- Coverage   56.91%   56.89%   -0.03%     
==========================================
  Files         397      397              
  Lines       49466    49467       +1     
==========================================
- Hits        28154    28144      -10     
- Misses      21312    21323      +11

feat: exit on block production error

57a865c

coderabbitai bot reviewed Nov 4, 2024

crates/katana/storage/db/src/mdbx/tx.rs

steebchen approved these changes Nov 4, 2024

kariy merged commit d09cbcf into main Nov 4, 2024
13 of 14 checks passed

kariy deleted the katana/stop-on-mining-error branch November 4, 2024 20:49

coderabbitai bot mentioned this pull request Nov 4, 2024

feat(katana): retain transactions in pool until mined #2630

Merged

coderabbitai bot mentioned this pull request Dec 3, 2024

refactor(katana): fix feeder gateway types #2760

Merged
