feat(katana): exit on block production error #2629

Merged
kariy merged 1 commit into main from katana/stop-on-mining-error
Nov 4, 2024

@kariy (Member) commented Nov 4, 2024

resolves #1650

Currently, Katana silently fails when an error happens during block production. This causes issues in the context of Slot hosting, where unexpected behaviour can occur when the instance runs out of storage. Returning immediately on failure makes this issue easier to detect.

2024-11-04T20:16:50.274691Z TRACE executor: Transaction resource usage. usage="steps: 3387 | memory holes: 32 | ec_op_builtin: 3 | pedersen_builtin: 16 | range_check_builtin: 65"
2024-11-04T20:16:50.336923Z ERROR node: Mining block. error=failed to write to db table CompiledClasses with key [7, 119, 132, 31, 135, 177, 34, 199, 133, 3, 124, 123, 169, 131, 85, 12, 179, 94, 71, 107, 237, 129, 83, 90, 108, 155, 91, 255, 242, 24, 57, 6]: No space left on device
2024-11-04T20:16:50.337613Z ERROR Stage{id=Sequencing}: pipeline: Block production task finished unexpectedly. reason=Ok(Completed(Err(Provider(Database(Write { error: Other(28), table: "CompiledClasses", key: [7, 119, 132, 31, 135, 177, 34, 199, 133, 3, 124, 123, 169, 131, 85, 12, 179, 94, 71, 107, 237, 129, 83, 90, 108, 155, 91, 255, 242, 24, 57, 6] })))))
2024-11-04T20:16:50.337666Z  INFO pipeline: Pipeline finished.
2024-11-04T20:16:50.337680Z DEBUG tasks: Task with graceful shutdown completed. task="Pipeline"
2024-11-04T20:16:50.337764Z  INFO katana::cli::node: Shutting down.
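A minimal sketch of the fail-fast pattern this PR describes, assuming a simplified task in which each poll mines one block through a caller-supplied closure. BlockProductionTask and BlockProductionError here are hypothetical stand-ins that only borrow katana's names, not its real types; block_on is a toy busy-polling executor so the example runs without tokio.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for katana's BlockProductionError.
#[derive(Debug, PartialEq)]
pub enum BlockProductionError {
    Storage(String),
}

/// Hypothetical, simplified task: each poll "mines" one block with a
/// caller-supplied closure (the real task is fed by a transaction pool).
pub struct BlockProductionTask<F> {
    mine: F,
    next_block: u64,
    max_blocks: u64,
}

impl<F> BlockProductionTask<F>
where
    F: FnMut(u64) -> Result<(), BlockProductionError>,
{
    pub fn new(mine: F, max_blocks: u64) -> Self {
        Self { mine, next_block: 0, max_blocks }
    }
}

impl<F> Future for BlockProductionTask<F>
where
    F: FnMut(u64) -> Result<(), BlockProductionError> + Unpin,
{
    // With `type Output = ()`, an error could only be logged and swallowed.
    type Output = Result<(), BlockProductionError>;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let this = self.get_mut();
        if this.next_block >= this.max_blocks {
            return Poll::Ready(Ok(()));
        }
        if let Err(error) = (this.mine)(this.next_block) {
            // Fail fast: hand the error to whoever awaits the task.
            return Poll::Ready(Err(error));
        }
        this.next_block += 1;
        cx.waker().wake_by_ref(); // request another poll for the next block
        Poll::Pending
    }
}

/// Toy executor: a waker that does nothing, plus a busy-polling loop.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

pub fn block_on<T: Future + Unpin>(mut fut: T) -> T::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = Pin::new(&mut fut).poll(&mut cx) {
            return out;
        }
    }
}
```

With this shape, the caller that awaits the task (the Sequencing stage, in this PR) receives the error as the task's output and can stop the pipeline, which matches the shutdown sequence in the log above.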

Summary by CodeRabbit

  • New Features
    • The block production task now returns BlockProductionError instead of failing silently, improving error handling in block production tasks.
  • Bug Fixes
    • Enhanced error handling in the put method to prevent potential panics and provide better context for database write errors.
  • Documentation
    • Updated method signatures to reflect changes in error handling and return types for improved clarity.


coderabbitai bot commented Nov 4, 2024

Walkthrough

Ohayo, sensei! This pull request introduces several modifications primarily focused on error handling within the block production process. The BlockProductionError enum is updated to remove the BlockMiningTaskCancelled variant, simplifying error management. The BlockProductionTask structure's output type is changed to return results with potential errors. Additionally, the run_block_production method in the Sequencing struct is modified to handle errors more effectively. These changes enhance the robustness of the block production logic while maintaining existing functionality.

Changes

  • crates/katana/core/src/service/block_producer.rs: Removed the BlockMiningTaskCancelled variant from BlockProductionError. Updated error handling in the block production logic.
  • crates/katana/core/src/service/mod.rs: Changed the BlockProductionTask output type from () to Result<(), BlockProductionError>. Updated the poll method to propagate errors.
  • crates/katana/pipeline/src/stage/sequencing.rs: Modified the run_block_production method to return TaskHandle<Result<(), BlockProductionError>>. Added an import for BlockProductionError.
  • crates/katana/storage/db/src/mdbx/tx.rs: Enhanced error handling in the put method of DbTxMut. Replaced unwrap() with error mapping for better context.
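The put change (replacing unwrap() with error mapping) can be sketched as follows. DatabaseError::Write mirrors the shape printed in the PR's log (Write { error: Other(28), table: "CompiledClasses", key: [...] }), but the types, raw_put, and the free_bytes parameter are hypothetical illustrations, not the mdbx bindings katana actually uses. Error code 28 is ENOSPC ("No space left on device") on Linux.

```rust
use std::fmt;

/// Hypothetical error mirroring the logged `Write { error, table, key }` shape.
#[derive(Debug, PartialEq)]
pub enum DatabaseError {
    Write { error: i32, table: &'static str, key: Vec<u8> },
}

impl fmt::Display for DatabaseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DatabaseError::Write { error, table, key } => {
                write!(f, "failed to write to db table {table} with key {key:?}: os error {error}")
            }
        }
    }
}

/// Hypothetical low-level write that fails with an errno-like code.
fn raw_put(free_bytes: usize, value: &[u8]) -> Result<(), i32> {
    const ENOSPC: i32 = 28; // "No space left on device" on Linux
    if value.len() > free_bytes { Err(ENOSPC) } else { Ok(()) }
}

/// Instead of `raw_put(..).unwrap()` (which panics), map the failure into an
/// error that records which table and key were being written.
pub fn put(
    table: &'static str,
    key: &[u8],
    value: &[u8],
    free_bytes: usize,
) -> Result<(), DatabaseError> {
    raw_put(free_bytes, value)
        .map_err(|error| DatabaseError::Write { error, table, key: key.to_vec() })
}
```

Carrying the table and key in the error is what lets a log line name CompiledClasses and the exact key, rather than just reporting a panic location.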

Possibly related PRs

  • refactor(katana): separate node service task #2413: This PR modifies the BlockProductionTask structure and its methods, which are directly related to the changes made in the main PR regarding the BlockProductionError enum and the error handling in block production.
  • refactor(katana): stage sync pipeline #2502: This PR includes changes to the BlockProducer struct, which is relevant as the main PR also involves modifications to the BlockProducer and its error handling.
  • feat(katana): compute block commitments #2609: Although this PR focuses on computing block commitments, it involves changes to the Backend struct and methods that are part of the block production process, linking it to the overall context of block production and error handling.

@coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
crates/katana/pipeline/src/stage/sequencing.rs (2)

Line range hint 55-62: Excellent error handling improvement, sensei!

The change to return Result<(), BlockProductionError> aligns perfectly with the goal of explicit error handling during block production.

Consider adding a debug log before spawning the task to help with debugging:

     fn run_block_production(&self) -> TaskHandle<Result<(), BlockProductionError>> {
         let pool = self.pool.clone();
         let miner = TransactionMiner::new(pool.add_listener());
         let block_producer = self.block_producer.clone();
 
         let service = BlockProductionTask::new(pool, miner, block_producer);
+        tracing::debug!(target: "pipeline", "Spawning block production task");
         self.task_spawner.build_task().name("Block production").spawn(service)
     }

Line range hint 82-89: Enhance error logging for block production failures, sensei!

The current error logging could be more specific for block production errors to better serve the PR's goal.

Consider this improvement:

             res = messaging => {
                 error!(target: "pipeline", reason = ?res, "Messaging task finished unexpectedly.");
             },
             res = block_production => {
-                error!(target: "pipeline", reason = ?res, "Block production task finished unexpectedly.");
+                match res {
+                    Ok(Ok(())) => error!(target: "pipeline", "Block production task finished unexpectedly with success."),
+                    Ok(Err(e)) => error!(target: "pipeline", error = ?e, "Block production failed with error."),
+                    Err(e) => error!(target: "pipeline", error = ?e, "Block production task panicked."),
+                }
             }

This change provides:

  1. Clear distinction between task failures and block production errors
  2. Better error context for debugging
  3. Proper handling of both Result layers (TaskHandle and BlockProductionError)
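The log in the PR description prints reason=Ok(Completed(Err(Provider(...)))), which suggests the awaited value has an extra wrapper between the outer Result and the BlockProductionError result, so a pattern written as Ok(Ok(())) may not line up with the real type. A minimal sketch of matching every layer, using hypothetical stand-ins (TaskOutcome with Completed and Cancelled variants, and JoinError) that only mirror the logged shape; they are not katana's actual types.

```rust
/// Hypothetical stand-ins mirroring the logged shape `Ok(Completed(Err(..)))`.
#[derive(Debug)]
pub enum BlockProductionError {
    Provider(String),
}

#[derive(Debug)]
pub enum TaskOutcome<T> {
    Completed(T),
    Cancelled,
}

/// Hypothetical outer error, e.g. the task panicked.
#[derive(Debug)]
pub struct JoinError(pub String);

/// Turn each layer of the nested result into a distinct log message.
pub fn describe(res: Result<TaskOutcome<Result<(), BlockProductionError>>, JoinError>) -> String {
    match res {
        Ok(TaskOutcome::Completed(Ok(()))) => {
            "block production finished unexpectedly with success".to_string()
        }
        Ok(TaskOutcome::Completed(Err(e))) => format!("block production failed: {e:?}"),
        Ok(TaskOutcome::Cancelled) => "block production task was cancelled".to_string(),
        Err(e) => format!("block production task panicked: {}", e.0),
    }
}
```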
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 82d3a85 and 57a865c.

📒 Files selected for processing (4)
  • crates/katana/core/src/service/block_producer.rs (0 hunks)
  • crates/katana/core/src/service/mod.rs (3 hunks)
  • crates/katana/pipeline/src/stage/sequencing.rs (2 hunks)
  • crates/katana/storage/db/src/mdbx/tx.rs (1 hunks)
💤 Files with no reviewable changes (1)
  • crates/katana/core/src/service/block_producer.rs
🔇 Additional comments (5)
crates/katana/pipeline/src/stage/sequencing.rs (2)

6-6: Ohayo! Clean import addition, sensei!

The BlockProductionError import is properly placed with related block_producer imports.


Line range hint 55-89: Verify storage error handling, sensei!

The PR objectives mention issues with running out of storage. Let's verify that storage-related errors are properly propagated through BlockProductionError.

✅ Verification successful

Storage error handling is properly implemented, sensei!

The BlockProductionError correctly propagates storage-related errors through ProviderError, which handles various storage scenarios including:

  • Missing block data (hash/number)
  • Parsing errors
  • General provider errors

The implementation properly handles these through the error chain:

  1. Storage layer errors -> ProviderError
  2. ProviderError -> BlockProductionError (via #[from] derive)
  3. BlockProductionError -> Stage error handling
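The error chain above can be sketched with plain From impls, assuming the #[from] derive is thiserror's, which generates equivalent From conversions so that the ? operator converts each layer's error into the next. StorageError, write_compiled_class, store_block, and mine_block are hypothetical names for illustration only.

```rust
/// Hypothetical storage-layer error (stand-in for the mdbx write failure).
#[derive(Debug, PartialEq)]
pub struct StorageError {
    pub table: &'static str,
    pub code: i32,
}

#[derive(Debug, PartialEq)]
pub enum ProviderError {
    Database(StorageError),
}

#[derive(Debug, PartialEq)]
pub enum BlockProductionError {
    Provider(ProviderError),
}

// Hand-written equivalents of what a `#[from]` attribute would generate.
impl From<StorageError> for ProviderError {
    fn from(e: StorageError) -> Self {
        ProviderError::Database(e)
    }
}

impl From<ProviderError> for BlockProductionError {
    fn from(e: ProviderError) -> Self {
        BlockProductionError::Provider(e)
    }
}

fn write_compiled_class(has_space: bool) -> Result<(), StorageError> {
    if has_space { Ok(()) } else { Err(StorageError { table: "CompiledClasses", code: 28 }) }
}

fn store_block(has_space: bool) -> Result<(), ProviderError> {
    write_compiled_class(has_space)?; // StorageError -> ProviderError
    Ok(())
}

/// Returns a (dummy) block number on success.
pub fn mine_block(has_space: bool) -> Result<u64, BlockProductionError> {
    store_block(has_space)?; // ProviderError -> BlockProductionError
    Ok(1)
}
```

The nested value produced on failure, Provider(Database(..)), has the same layering as the Provider(Database(Write { .. })) seen in the PR's log.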
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for storage-related error definitions and handling
echo "Checking BlockProductionError definition for storage errors..."
rg -A 5 "pub enum BlockProductionError"

echo "Checking storage error handling in block production..."
ast-grep --pattern 'match $_ {
  $$$
  Err($_) => $$$
  $$$
}'

Length of output: 831


Script:

#!/bin/bash
# Let's check for storage-related error handling in the provider
echo "Checking ProviderError definition..."
rg -A 10 "pub enum ProviderError"

# Check block producer implementation for storage handling
echo "Checking block producer implementation..."
ast-grep --pattern 'impl BlockProducer {
  $$$
  fn $_($$$) -> Result<$_, $_> {
    $$$
  }
  $$$
}'

# Check if there are any storage-related modules
echo "Checking for storage modules..."
fd -t f "storage.*\.rs$"

Length of output: 3007

crates/katana/core/src/service/mod.rs (3)

8-8: Importing BlockProductionError for enhanced error handling

Ohayo, sensei! Including use block_producer::BlockProductionError; is a necessary addition. This import ensures that the BlockProductionError type is available for the improved error propagation in the block production task.


51-51: Changing Output type to Result<(), BlockProductionError>

Ohayo, sensei! Updating the Output type in the Future implementation to Result<(), BlockProductionError> is a wise move. This change allows the BlockProductionTask to return errors, enabling better error handling upstream.


72-72: Returning errors immediately upon block production failure

Ohayo, sensei! Modifying the poll method to return Poll::Ready(Err(error)) ensures that the task exits promptly when a block production error occurs. This adjustment prevents silent failures and enhances the robustness of the service.

crates/katana/storage/db/src/mdbx/tx.rs (conversation resolved)

codecov bot commented Nov 4, 2024

Codecov Report

Attention: Patch coverage is 60.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 56.89%. Comparing base (131cd89) to head (57a865c).
Report is 2 commits behind head on main.

Files with missing lines                   Patch %   Lines
crates/katana/core/src/service/mod.rs       0.00%    1 Missing ⚠️
crates/katana/storage/db/src/mdbx/tx.rs    66.66%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2629      +/-   ##
==========================================
- Coverage   56.91%   56.89%   -0.03%     
==========================================
  Files         397      397              
  Lines       49466    49467       +1     
==========================================
- Hits        28154    28144      -10     
- Misses      21312    21323      +11     
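The percentages in the coverage diff follow from coverage = hits / lines. The sketch below assumes Codecov's default rounding (two decimals, rounded down), which would explain why 28154 / 49466 = 56.9158...% appears as 56.91 rather than 56.92, and why a change of about -0.021 points appears as -0.03. Values are returned in hundredths of a percent to avoid float formatting.

```rust
/// Project coverage in hundredths of a percent, rounded down.
/// Integer division already floors for non-negative values.
pub fn coverage_hundredths(hits: u64, lines: u64) -> u64 {
    hits * 10_000 / lines
}

/// Change in coverage (head minus base) in hundredths of a percent,
/// computed from the unrounded ratios and then rounded down (toward -inf).
pub fn delta_hundredths(base: (u64, u64), head: (u64, u64)) -> i64 {
    let base_ratio = base.0 as f64 / base.1 as f64;
    let head_ratio = head.0 as f64 / head.1 as f64;
    ((head_ratio - base_ratio) * 10_000.0).floor() as i64
}
```

For example, coverage_hundredths(28154, 49466) gives 5691 (56.91%) and coverage_hundredths(28144, 49467) gives 5689 (56.89%).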


@kariy kariy merged commit d09cbcf into main Nov 4, 2024
13 of 14 checks passed
@kariy kariy deleted the katana/stop-on-mining-error branch November 4, 2024 20:49
Development

Successfully merging this pull request may close these issues.

Tx failing silently when failing to commit db tx
2 participants