Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(coordinator): assign static prover first and avoid reassigning failed task to same prover #1584

Merged
merged 33 commits into from
Jan 15, 2025

Conversation

yiweichi
Copy link
Member

@yiweichi yiweichi commented Dec 30, 2024

…ailing task to same prover

Purpose or design rationale of this PR

Describe your change. Make sure to answer these three questions: What does this PR do? Why does it do it? How does it do it?

  1. Tasks assignment static prover first and then dynamic prover (this is to save cost)
  2. Coordinator doesn’t dispatch the same failing job to the same prover

docs

PR title

Your PR title must follow conventional commits (as we are doing squash merge for each PR), so it must start with one of the following types:

  • build: Changes that affect the build system or external dependencies (example scopes: yarn, eslint, typescript)
  • ci: Changes to our CI configuration files and scripts (example scopes: vercel, github, cypress)
  • docs: Documentation-only changes
  • feat: A new feature
  • fix: A bug fix
  • perf: A code change that improves performance
  • refactor: A code change that doesn't fix a bug, or add a feature, or improves performance
  • style: Changes that do not affect the meaning of the code (white-space, formatting, missing semi-colons, etc)
  • test: Adding missing tests or correcting existing tests

Deployment tag versioning

Has tag in common/version.go been updated or have you added bump-version label to this PR?

  • No, this PR doesn't involve a new deployment, git tag, docker image tag
  • Yes

Breaking change label

Does this PR have the breaking-change label?

  • No, this PR is not a breaking change
  • Yes

Summary by CodeRabbit

  • Configuration

    • Added new configuration parameter external_prover_threshold to control external prover task assignment.
  • Task Management

    • Enhanced task assignment logic for external provers.
    • Implemented batch retrieval of assigned and unassigned tasks for batches, bundles, and chunks.
    • Added count retrieval for unassigned tasks to optimize prover task distribution.
  • Improvements

    • Improved error handling and logging for task assignment processes.
    • Added flexibility in task retrieval and assignment mechanisms.
    • Introduced new methods to retrieve counts of unassigned tasks and specific prover tasks based on parameters.
  • Version Update

    • Updated version tag to v4.4.86.
  • Authentication

    • Expanded authentication message structure to include ProverProviderType.
    • Added validation for ProverProviderType during login checks.
    • Enhanced mock prover login method to include ProverProviderType.

Copy link

coderabbitai bot commented Dec 30, 2024

Walkthrough

The pull request introduces a new configuration parameter external_prover_threshold to manage task assignment for external provers. This change modifies the configuration, ORM methods, and task assignment logic across multiple files in the coordinator package. The modifications enhance the flexibility of task retrieval and assignment, particularly for external provers, by allowing batch retrieval of tasks and implementing a threshold-based assignment mechanism.

Changes

File Change Summary
coordinator/conf/config.json Added external_prover_threshold configuration parameter set to 32
coordinator/internal/config/config.go Added ExternalProverThreshold field to ProverManager struct
coordinator/internal/logic/provertask/batch_prover_task.go Updated Assign method to enhance task assignment logic for external provers
coordinator/internal/logic/provertask/bundle_prover_task.go Updated Assign method and added new methods for improved task assignment logic
coordinator/internal/logic/provertask/chunk_prover_task.go Updated Assign method to include checks for external provers
coordinator/internal/logic/provertask/prover_task.go Added ProverProviderType field to proverTaskContext struct
coordinator/internal/orm/batch.go Added GetUnassignedBatchCount method for batch retrieval
coordinator/internal/orm/bundle.go Added GetUnassignedBundleCount method for bundle retrieval
coordinator/internal/orm/chunk.go Added GetUnassignedChunkCount method and updated retrieval methods
coordinator/internal/orm/prover_task.go Added GetFailedProverTasksByHash method
common/version/version.go Updated version tag from v4.4.85 to v4.4.86
coordinator/internal/controller/api/auth.go Enhanced JWT claims in PayloadFunc to include ProverProviderType
coordinator/internal/logic/auth/login.go Added validation for ProverProviderType in login logic
coordinator/internal/types/auth.go Introduced ProverProviderTypeKey constant and ProverProviderType field in Message struct
coordinator/internal/types/prover.go Added ProverProviderType type and constants for internal and external providers
coordinator/internal/utils/prover_name.go Introduced IsExternalProverNameMatch function for name validation
coordinator/internal/types/auth_test.go Added ProverProviderType field in Message struct for tests
coordinator/test/mock_prover.go Added ProverProviderType field in Message struct for mock login

Sequence Diagram

sequenceDiagram
    participant Prover
    participant Coordinator
    participant Database

    Prover->>Coordinator: Request Task
    Coordinator->>Database: Check Unassigned Tasks Count
    Database-->>Coordinator: Return Task Count
    alt Count Below Threshold
        Coordinator-->>Prover: No Task Assigned
    else Count Above Threshold
        Coordinator->>Database: Retrieve Assigned/Unassigned Tasks
        Database-->>Coordinator: Return Tasks
        Coordinator->>Prover: Assign Suitable Task
    end
Loading

Poem

🐰 Hopping through code with glee,
External provers, now set free!
Threshold magic, tasks galore,
Batch retrieval opens a new door
Coordination's rabbit dance! 🚀


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1f05c6e and 1e63677.

📒 Files selected for processing (1)
  • coordinator/internal/utils/prover_name.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • coordinator/internal/utils/prover_name.go

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (13)
coordinator/internal/logic/provertask/chunk_prover_task.go (2)

65-75: External prover threshold check
This conditional correctly enforces a minimum number of unassigned tasks before assigning them to external provers, helping to optimize resource usage. Consider adding a debug log indicating when no tasks are assigned.

 if unassignedChunkCount < cp.cfg.ProverManager.ExternalProverThreshold {
+    log.Debug("Skipping external prover assignment",
+              "count", unassignedChunkCount,
+              "threshold", cp.cfg.ProverManager.ExternalProverThreshold)
     return nil, nil
 }

94-117: Avoid re-dispatching failing tasks & concurrency concern
The loop prevents reassigning the same failing task to the same prover. While this logic is correct, confirm that concurrent assignments are sufficiently protected. For high concurrency, consider row-level locks or other synchronization in the DB to avoid competing updates.

coordinator/internal/logic/provertask/bundle_prover_task.go (2)

67-77: External prover threshold check
Similar to the chunk logic, this section cleanly limits external prover usage by verifying if the unassigned bundle count is above the threshold. Consider logging debug info when skipping assignment.

 if unassignedBundleCount < bp.cfg.ProverManager.ExternalProverThreshold {
+    log.Debug("Skipping external prover assignment for bundles",
+              "count", unassignedBundleCount,
+              "threshold", bp.cfg.ProverManager.ExternalProverThreshold)
     return nil, nil
 }

96-119: Loop for assigned/unassigned tasks & fail-check
This loop ensures the same failing task won’t be re-dispatched to the same prover. Validate concurrent updates to avoid race conditions if multiple coordinator instances operate simultaneously.

coordinator/internal/logic/provertask/batch_prover_task.go (1)

96-119: Retry loop and preventing re-dispatch
The loop logic checks assigned tasks first, then unassigned tasks, while ensuring failed tasks are not re-dispatched to the same prover. Consider verifying concurrency safety.

coordinator/internal/orm/chunk.go (3)

76-86: GetUnassignedChunk implementation
Using raw SQL with fmt.Sprintf is generally safe here since the parameters are numeric, but parameter binding is often safer against injection. Consider GORM chain calls to reduce potential risk and improve maintainability.

- sql := fmt.Sprintf("SELECT * FROM chunk WHERE proving_status = %d AND total_attempts < %d ... LIMIT %d;", ...)
- err := db.Raw(sql).Scan(&chunks).Error
+ err := db.Model(&Chunk{}).
+    Where("proving_status = ?", int(types.ProvingTaskUnassigned)).
+    Where("total_attempts < ?", maxTotalAttempts).
+    Where("active_attempts < ?", maxActiveAttempts).
+    Where("end_block_number <= ?", height).
+    Where("deleted_at IS NULL").
+    Order("index").
+    Limit(int(limit)).
+    Find(&chunks).Error

88-101: GetUnassignedChunkCount mismatch in doc
The docstring references “based on the specified limit,” yet no limit parameter is used. Consider updating the docstring for accuracy.


104-115: GetAssignedChunks
Similar to GetUnassignedChunk, this also uses raw SQL. Consider switching to GORM chain calls for improved safety and maintainability.

coordinator/internal/orm/prover_task.go (1)

151-166: Clarify docstring vs. function name.

The doc comment says:

// GetTaskOfOtherProvers get the chunk/batch task of prover

But the function is named GetTaskOfProver. This mismatch may cause confusion. Please update the doc comment or rename the function to keep them in sync.

coordinator/internal/orm/bundle.go (1)

57-69: Fix verbiage in error message and doc comment.

  1. The error message references “Batch.GetUnassignedBundles” instead of “Bundle.GetUnassignedBundles”.
  2. The doc comment mentions “The returned batch sorts…” but the function works with bundles.

These references could create confusion. Rename them to be consistent with the “Bundle” function context.

coordinator/internal/orm/batch.go (3)

86-87: Consider using GORM query builder instead of raw SQL

While the current implementation using fmt.Sprintf is safe since all variables are numeric, using GORM's query builder would be more maintainable and safer by default. This also makes it easier to modify the query structure in the future.

-	sql := fmt.Sprintf("SELECT * FROM batch WHERE proving_status = %d AND total_attempts < %d AND active_attempts < %d AND chunk_proofs_status = %d AND batch.deleted_at IS NULL ORDER BY batch.index LIMIT %d;",
-		int(types.ProvingTaskUnassigned), maxTotalAttempts, maxActiveAttempts, int(types.ChunkProofsStatusReady), limit)
-	err := db.Raw(sql).Scan(&batch).Error
+	err := db.Where("proving_status = ?", types.ProvingTaskUnassigned).
+		Where("total_attempts < ?", maxTotalAttempts).
+		Where("active_attempts < ?", maxActiveAttempts).
+		Where("chunk_proofs_status = ?", types.ChunkProofsStatusReady).
+		Order("index ASC").
+		Limit(int(limit)).
+		Find(&batch).Error

81-93: Consider adding an index on the batch.index column

The query uses ORDER BY batch.index, which could be slow for large datasets without an index. Consider adding an index if one doesn't exist.

CREATE INDEX idx_batch_index ON batch(index);

110-121: Reduce code duplication with GetUnassignedBatches

The GetAssignedBatches method is nearly identical to GetUnassignedBatches except for the proving_status condition. Consider extracting the common logic into a shared method.

+func (o *Batch) getBatchesByStatus(ctx context.Context, status types.ProvingStatus, maxActiveAttempts, maxTotalAttempts uint8, limit uint64) ([]*Batch, error) {
+	var batch []*Batch
+	return db.Where("proving_status = ?", status).
+		Where("total_attempts < ?", maxTotalAttempts).
+		Where("active_attempts < ?", maxActiveAttempts).
+		Where("chunk_proofs_status = ?", types.ChunkProofsStatusReady).
+		Order("index ASC").
+		Limit(int(limit)).
+		Find(&batch).Error
+}
+
 func (o *Batch) GetUnassignedBatches(ctx context.Context, maxActiveAttempts, maxTotalAttempts uint8, limit uint64) ([]*Batch, error) {
-	var batch []*Batch
-	db := o.db.WithContext(ctx)
-	sql := fmt.Sprintf("SELECT * FROM batch WHERE proving_status = %d AND total_attempts < %d AND active_attempts < %d AND chunk_proofs_status = %d AND batch.deleted_at IS NULL ORDER BY batch.index LIMIT %d;",
-		int(types.ProvingTaskUnassigned), maxTotalAttempts, maxActiveAttempts, int(types.ChunkProofsStatusReady), limit)
-	err := db.Raw(sql).Scan(&batch).Error
-	if err != nil {
-		return nil, fmt.Errorf("Batch.GetUnassignedBatches error: %w", err)
-	}
-	return batch, nil
+	return o.getBatchesByStatus(ctx, types.ProvingTaskUnassigned, maxActiveAttempts, maxTotalAttempts, limit)
 }
 
 func (o *Batch) GetAssignedBatches(ctx context.Context, maxActiveAttempts, maxTotalAttempts uint8, limit uint64) ([]*Batch, error) {
-	var batch []*Batch
-	db := o.db.WithContext(ctx)
-	sql := fmt.Sprintf("SELECT * FROM batch WHERE proving_status = %d AND total_attempts < %d AND active_attempts < %d AND chunk_proofs_status = %d AND batch.deleted_at IS NULL ORDER BY batch.index LIMIT %d;",
-		int(types.ProvingTaskAssigned), maxTotalAttempts, maxActiveAttempts, int(types.ChunkProofsStatusReady), limit)
-	err := db.Raw(sql).Scan(&batch).Error
-	if err != nil {
-		return nil, fmt.Errorf("Batch.GetAssignedBatches error: %w", err)
-	}
-	return batch, nil
+	return o.getBatchesByStatus(ctx, types.ProvingTaskAssigned, maxActiveAttempts, maxTotalAttempts, limit)
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f92029a and 791fcaf.

📒 Files selected for processing (11)
  • coordinator/conf/config.json (1 hunks)
  • coordinator/internal/config/config.go (1 hunks)
  • coordinator/internal/logic/provertask/batch_prover_task.go (2 hunks)
  • coordinator/internal/logic/provertask/bundle_prover_task.go (2 hunks)
  • coordinator/internal/logic/provertask/chunk_prover_task.go (2 hunks)
  • coordinator/internal/logic/provertask/prover_task.go (1 hunks)
  • coordinator/internal/orm/batch.go (1 hunks)
  • coordinator/internal/orm/bundle.go (1 hunks)
  • coordinator/internal/orm/chunk.go (1 hunks)
  • coordinator/internal/orm/prover_task.go (1 hunks)
  • rollup/config.json (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • coordinator/internal/logic/provertask/prover_task.go
🔇 Additional comments (15)
coordinator/internal/logic/provertask/chunk_prover_task.go (3)

7-7: Use of strings package
Importing the strings package is necessary for prefix checks in subsequent lines. This is valid and consistent with the new external prover logic.


81-82: Offset-based iteration
Introducing separate offsets for assigned and unassigned tasks enhances clarity and supports batch assignment logic. This is a fine approach.


89-90: Retrieving unassigned chunks
Fetching up to 50 unassigned tasks is aligned with the new batch approach. The error logging is helpful for troubleshooting.

coordinator/internal/logic/provertask/bundle_prover_task.go (3)

7-7: Import strings
The strings import is added for prefix checks. This is appropriate given the external prover feature.


83-84: Assigned/unassigned offset
Using separate offsets for assigned and unassigned bundles promotes clarity in batch retrieval and assignment.


90-91: Fetching unassigned bundles
Fetching multiple unassigned tasks at once scales better than a single-task approach. Good error handling is present.

coordinator/internal/logic/provertask/batch_prover_task.go (4)

7-7: Import strings
Necessary for prefix checks related to external prover logic.


67-77: External prover threshold
Checking unassignedBatchCount ensures external provers only get assigned when tasks exceed a threshold. This aligns with the PR goal of reducing cost.


83-84: Offset definitions
Setting up assignedOffset and unassignedOffset makes it simpler to process tasks in batches.


90-91: Retrieving unassigned batches
Properly retrieves batches in bulk, improving assignment throughput. The error logging is appropriately handled.

coordinator/internal/config/config.go (1)

19-20: New ExternalProverThreshold field
Adding this threshold field is consistent with the logic implemented in the prover tasks. This helps reduce cost by controlling when external provers are utilized.

coordinator/internal/orm/bundle.go (2)

71-83: Validate boundary conditions for counting unassigned bundles.

This function appears correct. However, validate that upstream logic handles the scenario when maxActiveAttempts and maxTotalAttempts are lower than the usage patterns, to avoid retrieving trivial results or none at all.


86-97: Ensure consistent naming in error handling.

Similar to the previous function, the error message references “Bundle.GetAssignedBundles,” which is correct. Just be sure the code references remain consistent across the repository for easier debugging.

coordinator/conf/config.json (1)

5-5: Confirm correctness of threshold.

Adding external_prover_threshold = 32 is good if that’s the intended cutoff. Double-check that calling logic interprets this value as intended.

rollup/config.json (1)

1-40: Watch out for potential exposure of environment-specific values.

The L1 endpoint and relayer configuration appear to contain environment-specific details. Confirm these settings are correct for the environment (test, staging, production).

rollup/config.json Outdated Show resolved Hide resolved
coordinator/internal/orm/batch.go Outdated Show resolved Hide resolved
@codecov-commenter
Copy link

codecov-commenter commented Dec 30, 2024

Codecov Report

Attention: Patch coverage is 16.30435% with 77 lines in your changes missing coverage. Please review.

Project coverage is 51.72%. Comparing base (fa0927c) to head (1e63677).

Files with missing lines Patch % Lines
coordinator/internal/orm/prover_task.go 0.00% 16 Missing ⚠️
coordinator/internal/types/auth.go 39.13% 11 Missing and 3 partials ⚠️
coordinator/internal/orm/batch.go 0.00% 13 Missing ⚠️
coordinator/internal/orm/bundle.go 0.00% 13 Missing ⚠️
coordinator/internal/orm/chunk.go 0.00% 13 Missing ⚠️
coordinator/internal/types/prover.go 0.00% 8 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1584      +/-   ##
===========================================
- Coverage    52.01%   51.72%   -0.30%     
===========================================
  Files          157      157              
  Lines        12476    12555      +79     
===========================================
+ Hits          6490     6494       +4     
- Misses        5424     5497      +73     
- Partials       562      564       +2     
Flag Coverage Δ
coordinator 16.55% <16.30%> (-0.59%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
coordinator/internal/orm/batch.go (1)

81-93: Consider using GORM query builder instead of raw SQL.

The raw SQL query could be replaced with GORM's query builder for better maintainability and type safety. This would also help prevent potential SQL injection vulnerabilities.

-	sql := fmt.Sprintf("SELECT * FROM batch WHERE proving_status = %d AND total_attempts < %d AND active_attempts < %d AND chunk_proofs_status = %d AND batch.deleted_at IS NULL ORDER BY batch.index LIMIT %d;",
-		int(types.ProvingTaskUnassigned), maxTotalAttempts, maxActiveAttempts, int(types.ChunkProofsStatusReady), limit)
-	err := db.Raw(sql).Scan(&batch).Error
+	err := db.Model(&Batch{}).
+		Where("proving_status = ?", int(types.ProvingTaskUnassigned)).
+		Where("total_attempts < ?", maxTotalAttempts).
+		Where("active_attempts < ?", maxActiveAttempts).
+		Where("chunk_proofs_status = ?", int(types.ChunkProofsStatusReady)).
+		Order("index ASC").
+		Limit(int(limit)).
+		Find(&batch).Error
coordinator/internal/logic/provertask/bundle_prover_task.go (2)

70-70: Fix incorrect error message text

The error message references "chunk" instead of "bundle" in the log statement.

-			log.Error("failed to get unassigned chunk proving tasks count", "height", getTaskParameter.ProverHeight, "err", err)
+			log.Error("failed to get unassigned bundle proving tasks count", "height", getTaskParameter.ProverHeight, "err", err)

84-84: Consider making the batch size configurable

The magic number '50' used for batch retrieval size could be moved to configuration for better maintainability and tuning.

Consider adding a configuration parameter for the batch size:

 type Config struct {
     ProverManager struct {
         ProversPerSession int
         SessionAttempts int
         ExternalProverThreshold int
+        TaskBatchSize int `json:"task_batch_size" default:"50"`
     }
 }

Then use it in the code:

-		tmpAssignedBundleTasks, getTaskError := bp.bundleOrm.GetAssignedBundles(ctx.Copy(), maxActiveAttempts, maxTotalAttempts, 50)
+		tmpAssignedBundleTasks, getTaskError := bp.bundleOrm.GetAssignedBundles(ctx.Copy(), maxActiveAttempts, maxTotalAttempts, bp.cfg.ProverManager.TaskBatchSize)

Also applies to: 91-91

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 791fcaf and a75075d.

📒 Files selected for processing (4)
  • coordinator/internal/logic/provertask/batch_prover_task.go (2 hunks)
  • coordinator/internal/logic/provertask/bundle_prover_task.go (2 hunks)
  • coordinator/internal/logic/provertask/chunk_prover_task.go (2 hunks)
  • coordinator/internal/orm/batch.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • coordinator/internal/logic/provertask/chunk_prover_task.go
  • coordinator/internal/logic/provertask/batch_prover_task.go
🔇 Additional comments (4)
coordinator/internal/orm/batch.go (1)

95-108: LGTM! Well-structured implementation.

The method follows best practices by using GORM's query builder and maintains consistency with GetUnassignedBatches conditions.

coordinator/internal/logic/provertask/bundle_prover_task.go (3)

7-7: LGTM: Import addition is appropriate

The addition of the strings package import is necessary for the new external prover name prefix checking functionality.


67-77: LGTM: External prover threshold check implementation

The implementation correctly enforces the threshold check for external provers, aligning with the PR's objective to prioritize static provers.


83-119: LGTM: Improved task assignment logic with failure prevention

The implementation efficiently handles task assignment using batch retrieval and properly prevents reassigning failed tasks to the same prover. The offset-based iteration is a clean approach to process both assigned and unassigned tasks.

Let's verify the prover task status check implementation:

✅ Verification successful

Task reassignment prevention logic is consistently implemented

The verification confirms that the task reassignment prevention logic is correctly implemented across the codebase:

  • The same prevention check types.ProverProveStatus(proverTask.ProvingStatus) != types.ProverProofInvalid is consistently used in all prover task handlers (chunk, batch, and bundle)
  • The status updates to ProverProofInvalid are properly handled in the proof receiver and task collection components
  • The implementation includes proper error handling and logging for invalid proof submissions

Let's verify the task status transitions to ensure the complete flow is correct:


Task status management is well-defined and properly enforced

The verification confirms that the task assignment logic is robust and correctly implemented:

  1. The status transitions are well-defined with clear enum values:

    • ProverProveStatusUndefined (0)
    • ProverAssigned (1)
    • ProverProofValid (2)
    • ProverProofInvalid (3)
  2. The prevention of reassigning failed tasks to the same prover is consistently implemented across all task types (chunk, batch, bundle) using the same status check.

  3. The status flow ensures that:

    • Tasks start in an unassigned state
    • Get marked as assigned when dispatched
    • Can transition to either valid or invalid based on proof verification
    • Invalid proofs from a prover prevent that same prover from getting reassigned the task
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that failed tasks are not being reassigned to the same prover

# Search for all prover task status checks in the codebase
echo "Searching for prover task status checks..."
rg -A 3 "types\.ProverProofInvalid" 

# Search for potential places where tasks might be reassigned
echo "Searching for task assignment patterns..."
ast-grep --pattern 'if $_ != types.ProverProofInvalid { $$$$ }'

Length of output: 4257


Script:

#!/bin/bash
# Check the task status enum definition and transitions
echo "Checking task status definitions..."
rg -A 10 "type ProverProveStatus"

# Check status update implementations
echo "Checking status update implementations..."
ast-grep --pattern 'UpdateProverTaskProvingStatus($$$)'

Length of output: 900

coordinator/internal/orm/batch.go Outdated Show resolved Hide resolved
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (4)
coordinator/internal/logic/provertask/bundle_prover_task.go (1)

96-119: Consider adding early return optimization

The implementation is solid but could be optimized by adding an early return when both offsets exceed their respective list lengths.

 for {
+    if assignedOffset >= len(tmpAssignedBundleTasks) && unassignedOffset >= len(tmpUnassignedBundleTask) {
+        log.Debug("no more tasks available", "height", getTaskParameter.ProverHeight)
+        return nil, nil
+    }
     tmpBundleTask = nil
     if assignedOffset < len(tmpAssignedBundleTasks) {
coordinator/internal/logic/provertask/batch_prover_task.go (3)

74-76: Add debug logging for external prover threshold check

When skipping task assignment due to threshold condition, adding debug logging would help with monitoring and debugging.

 		if unassignedBatchCount < bp.cfg.ProverManager.ExternalProverThreshold {
+			log.Debug("skipping external prover assignment: below threshold", 
+				"unassigned_count", unassignedBatchCount,
+				"threshold", bp.cfg.ProverManager.ExternalProverThreshold,
+				"prover", taskCtx.ProverName)
 			return nil, nil
 		}

83-84: Consider making pagination and retry parameters configurable

The code uses hardcoded values for pagination (50) and retry attempts (5). These should be configurable parameters.

Also, the comment explaining the query split could be more descriptive.

  1. Add configuration parameters:
type Config struct {
    ProverManager struct {
        // ... existing fields ...
+       BatchTaskPaginationSize int `json:"batch_task_pagination_size"`
+       BatchTaskMaxRetries     int `json:"batch_task_max_retries"`
    }
}
  1. Improve the comment:
-       // Why here need get again? In order to support a task can assign to multiple prover, need also assign `ProvingTaskAssigned`
-       // chunk to prover. But use `proving_status in (1, 2)` will not use the postgres index. So need split the sql.
+       // Split the queries for better performance:
+       // 1. We need to support assigning a task to multiple provers, including already assigned tasks
+       // 2. Using a single query with `proving_status in (1, 2)` would not utilize the PostgreSQL index
+       // 3. Therefore, we split into separate queries for assigned and unassigned tasks to leverage indexes

Also applies to: 90-91


111-119: Consider adding metrics for failed task reassignment prevention

The code prevents reassigning failed tasks to the same prover, but there's no monitoring for how often this occurs. Adding metrics would help track the effectiveness of this prevention mechanism.

 type BatchProverTask struct {
     // ... existing fields ...
+    batchTaskReassignmentPrevented *prometheus.CounterVec
 }

 func NewBatchProverTask(cfg *config.Config, chainCfg *params.ChainConfig, db *gorm.DB, reg prometheus.Registerer) *BatchProverTask {
     bp := &BatchProverTask{
         // ... existing initialization ...
+        batchTaskReassignmentPrevented: promauto.With(reg).NewCounterVec(prometheus.CounterOpts{
+            Name: "coordinator_batch_task_reassignment_prevented_total",
+            Help: "Number of times a failed task was prevented from being reassigned to the same prover.",
+        }, []string{"prover_name"}),
     }
     return bp
 }

 // In the Assign method:
 if proverTask != nil && types.ProverProveStatus(proverTask.ProvingStatus) == types.ProverProofInvalid {
+    bp.batchTaskReassignmentPrevented.WithLabelValues(taskCtx.ProverName).Inc()
     continue
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1c5d88d and e4c0779.

📒 Files selected for processing (2)
  • coordinator/internal/logic/provertask/batch_prover_task.go (2 hunks)
  • coordinator/internal/logic/provertask/bundle_prover_task.go (2 hunks)
🔇 Additional comments (4)
coordinator/internal/logic/provertask/bundle_prover_task.go (4)

7-7: LGTM: Import addition is appropriate

The strings package import is required for the new prefix checking functionality.


67-77: LGTM: External prover threshold check is well implemented

The implementation correctly:

  1. Identifies external provers using the prefix check
  2. Verifies unassigned bundle count against threshold
  3. Prevents unnecessary external prover assignments when load is low

83-95: LGTM: Efficient batch retrieval implementation

The implementation:

  1. Uses pagination with offset tracking
  2. Properly handles database errors
  3. Includes clear documentation explaining the index usage optimization

Line range hint 151-160: Clarify status of commented hard fork compatibility check

There's a substantial block of commented code related to hard fork compatibility checking. Please either:

  1. Remove if no longer needed
  2. Uncomment if the check is still required
  3. Add a comment explaining why this is temporarily disabled

This will help maintain code clarity and prevent technical debt.

@yiweichi yiweichi added the bump-version Bump the version tag for deployment label Jan 6, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
coordinator/internal/logic/provertask/bundle_prover_task.go (1)

70-70: ⚠️ Potential issue

Fix error variable in logging statement

The error logging statement uses err instead of getCountError.

-log.Error("failed to get unassigned batch proving tasks count", "height", getTaskParameter.ProverHeight, "err", err)
+log.Error("failed to get unassigned batch proving tasks count", "height", getTaskParameter.ProverHeight, "err", getCountError)
🧹 Nitpick comments (4)
coordinator/internal/orm/chunk.go (1)

91-105: Consider adding composite index for performance optimization.

To optimize the frequent queries on chunk status and attempts, consider adding a composite index on (proving_status, total_attempts, active_attempts, end_block_number).

coordinator/internal/logic/provertask/bundle_prover_task.go (3)

73-76: Add logging for external prover threshold check

Consider adding debug logging when skipping external provers due to threshold condition. This will help with monitoring and debugging the assignment process.

 if unassignedBundleCount < bp.cfg.ProverManager.ExternalProverThreshold {
+    log.Debug("skipping external prover assignment: below threshold",
+        "count", unassignedBundleCount,
+        "threshold", bp.cfg.ProverManager.ExternalProverThreshold,
+        "prover", taskCtx.ProverName)
     return nil, nil
 }

104-113: Enhance error and debug logging

The error and debug messages could be more descriptive to aid in troubleshooting.

-      log.Error("failed to get prover task of prover", "proof_type", message.ProofTypeBatch.String(), "taskID", tmpBundleTask.Hash, "key", taskCtx.PublicKey, "Prover_version", taskCtx.ProverVersion, "error", getTaskError)
+      log.Error("failed to check prover's previous attempts", "proof_type", message.ProofTypeBatch.String(), "taskID", tmpBundleTask.Hash, "key", taskCtx.PublicKey, "prover_version", taskCtx.ProverVersion, "error", getTaskError)

-      log.Debug("get empty bundle, the prover already failed this task", "height", getTaskParameter.ProverHeight)
+      log.Debug("skipping task assignment: prover previously failed this task",
+          "height", getTaskParameter.ProverHeight,
+          "task_id", tmpBundleTask.Hash,
+          "prover", taskCtx.ProverName)

Line range hint 81-82: Consider extracting retry configuration

The retry mechanism uses magic numbers (5 retries, 100ms sleep). Consider extracting these values to configuration parameters for better maintainability.

+const (
+    maxRetryAttempts = 5
+    retryDelay       = 100 * time.Millisecond
+)

-for i := 0; i < 5; i++ {
+for i := 0; i < maxRetryAttempts; i++ {
     // ... existing code ...
-    time.Sleep(100 * time.Millisecond)
+    time.Sleep(retryDelay)

Also applies to: 123-124

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0d1c303 and c2075c8.

📒 Files selected for processing (7)
  • common/version/version.go (1 hunks)
  • coordinator/internal/logic/provertask/batch_prover_task.go (3 hunks)
  • coordinator/internal/logic/provertask/bundle_prover_task.go (3 hunks)
  • coordinator/internal/logic/provertask/chunk_prover_task.go (3 hunks)
  • coordinator/internal/orm/batch.go (1 hunks)
  • coordinator/internal/orm/bundle.go (1 hunks)
  • coordinator/internal/orm/chunk.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • common/version/version.go
  • coordinator/internal/orm/bundle.go
  • coordinator/internal/orm/batch.go
  • coordinator/internal/logic/provertask/batch_prover_task.go
🔇 Additional comments (6)
coordinator/internal/logic/provertask/chunk_prover_task.go (3)

65-75: LGTM! External prover threshold check is well implemented.

The implementation correctly prioritizes static provers by checking the unassigned task count against a threshold before assigning tasks to external provers. Error handling and logging are properly implemented.


102-111: LGTM! Task reassignment prevention is well implemented.

The implementation successfully prevents reassigning failed tasks to the same prover by checking the prover's previous task status. Error handling and logging are properly implemented.


Line range hint 143-151: Verify if version compatibility check should be removed.

The commented out code block appears to implement important version compatibility checks. Please verify if this check should be removed, as it could lead to version incompatibility issues between provers and tasks.

coordinator/internal/orm/chunk.go (1)

91-105: LGTM! GetUnassignedChunkCount implementation is clean and efficient.

The method follows GORM best practices with proper error handling and maintains consistency with existing query conditions.

coordinator/internal/logic/provertask/bundle_prover_task.go (2)

7-7: LGTM!

The addition of the strings package import is appropriate for the new prefix checking functionality.


Line range hint 146-154: Verify if hard fork compatibility check should be uncommented

There's a commented-out block that checks hard fork compatibility. Please verify if this check should be enabled or removed.

Run this script to check if hard fork compatibility checks are implemented elsewhere:

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (6)
coordinator/internal/orm/batch.go (1)

98-112: Implementation looks good!

The method correctly implements all necessary conditions for unassigned batch counting, including the chunk_proofs_status check.

Consider extracting the common query conditions into a helper method to reduce code duplication with GetUnassignedBatch. Example:

+func (o *Batch) applyUnassignedBatchConditions(db *gorm.DB, maxActiveAttempts, maxTotalAttempts uint8) *gorm.DB {
+	return db.Where("proving_status = ?", int(types.ProvingTaskUnassigned)).
+		Where("total_attempts < ?", maxTotalAttempts).
+		Where("active_attempts < ?", maxActiveAttempts).
+		Where("chunk_proofs_status = ?", int(types.ChunkProofsStatusReady)).
+		Where("batch.deleted_at IS NULL")
+}

 func (o *Batch) GetUnassignedBatchCount(ctx context.Context, maxActiveAttempts, maxTotalAttempts uint8) (int64, error) {
 	var count int64
 	db := o.db.WithContext(ctx)
 	db = db.Model(&Batch{})
-	db = db.Where("proving_status = ?", int(types.ProvingTaskUnassigned))
-	db = db.Where("total_attempts < ?", maxTotalAttempts)
-	db = db.Where("active_attempts < ?", maxActiveAttempts)
-	db = db.Where("chunk_proofs_status = ?", int(types.ChunkProofsStatusReady))
-	db = db.Where("batch.deleted_at IS NULL")
+	db = o.applyUnassignedBatchConditions(db, maxActiveAttempts, maxTotalAttempts)
 	if err := db.Count(&count).Error; err != nil {
 		return 0, fmt.Errorf("Batch.GetUnassignedBatchCount error: %w", err)
 	}
 	return count, nil
 }
coordinator/internal/orm/chunk.go (3)

91-92: Add return value documentation.

Consider adding documentation to explain what the count represents and how it's used in the context of task assignment.

 // GetUnassignedChunkCount retrieves unassigned chunk count.
+// Returns the number of chunks that are available for assignment to provers,
+// considering the maximum attempt limits and block height constraints.

96-100: Consider extracting common query conditions.

The same conditions are used in both GetUnassignedChunk and GetUnassignedChunkCount. Consider extracting these into a method to maintain DRY principles and ensure consistency.

+// applyUnassignedChunkConditions applies common conditions for unassigned chunk queries
+func (o *Chunk) applyUnassignedChunkConditions(db *gorm.DB, maxActiveAttempts, maxTotalAttempts uint8, height uint64) *gorm.DB {
+    return db.Where("proving_status = ?", int(types.ProvingTaskUnassigned)).
+        Where("total_attempts < ?", maxTotalAttempts).
+        Where("active_attempts < ?", maxActiveAttempts).
+        Where("end_block_number <= ?", height).
+        Where("chunk.deleted_at IS NULL")
+}

Then use it in both methods:

 func (o *Chunk) GetUnassignedChunkCount(ctx context.Context, maxActiveAttempts, maxTotalAttempts uint8, height uint64) (int64, error) {
     var count int64
     db := o.db.WithContext(ctx)
-    db = db.Model(&Chunk{})
-    db = db.Where("proving_status = ?", int(types.ProvingTaskUnassigned))
-    db = db.Where("total_attempts < ?", maxTotalAttempts)
-    db = db.Where("active_attempts < ?", maxActiveAttempts)
-    db = db.Where("end_block_number <= ?", height)
-    db = db.Where("chunk.deleted_at IS NULL")
+    db = o.applyUnassignedChunkConditions(db.Model(&Chunk{}), maxActiveAttempts, maxTotalAttempts, height)
     if err := db.Count(&count).Error; err != nil {
         return 0, fmt.Errorf("Chunk.GetUnassignedChunkCount error: %w", err)
     }
     return count, nil
 }

91-105: Consider adding database index for performance.

Since this count query will likely be frequently called during task assignment decisions, consider adding a composite index on the queried columns to improve performance.

CREATE INDEX idx_chunk_assignment ON chunk (proving_status, total_attempts, active_attempts, end_block_number)
WHERE deleted_at IS NULL;
coordinator/internal/logic/provertask/bundle_prover_task.go (2)

67-77: Extract magic number into a constant

The retry count of 5 appears to be a magic number. Consider extracting it into a named constant for better maintainability.

Apply this diff to improve the code:

+const maxAssignmentRetries = 5
+
 func (bp *BundleProverTask) Assign(ctx *gin.Context, getTaskParameter *coordinatorType.GetTaskParameter) (*coordinatorType.GetTaskSchema, error) {
     // ...
     if taskCtx.ProverProviderType == uint8(coordinatorType.ProverProviderTypeExternal) {
         unassignedBundleCount, getCountError := bp.bundleOrm.GetUnassignedBundleCount(ctx.Copy(), maxActiveAttempts, maxTotalAttempts)
         if getCountError != nil {
             log.Error("failed to get unassigned bundle proving tasks count", "height", getTaskParameter.ProverHeight, "err", getCountError)
             return nil, ErrCoordinatorInternalFailure
         }
         if unassignedBundleCount < bp.cfg.ProverManager.ExternalProverThreshold {
             return nil, nil
         }
     }

104-116: Extract magic number into a constant

The value 2 in GetFailedProverTasksByHash appears to be a magic number. Consider extracting it into a named constant for better maintainability and documentation.

Apply this diff to improve the code:

+// maxFailedTasksToCheck is the number of recent failed tasks to check
+const maxFailedTasksToCheck = 2
+
-        proverTasks, getTaskError := bp.proverTaskOrm.GetFailedProverTasksByHash(ctx.Copy(), message.ProofTypeBundle, tmpBundleTask.Hash, 2)
+        proverTasks, getTaskError := bp.proverTaskOrm.GetFailedProverTasksByHash(ctx.Copy(), message.ProofTypeBundle, tmpBundleTask.Hash, maxFailedTasksToCheck)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 27ae174 and cc57700.

📒 Files selected for processing (6)
  • coordinator/internal/logic/provertask/bundle_prover_task.go (3 hunks)
  • coordinator/internal/logic/provertask/prover_task.go (2 hunks)
  • coordinator/internal/orm/batch.go (1 hunks)
  • coordinator/internal/orm/bundle.go (1 hunks)
  • coordinator/internal/orm/chunk.go (1 hunks)
  • coordinator/internal/orm/prover_task.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • coordinator/internal/logic/provertask/prover_task.go
  • coordinator/internal/orm/prover_task.go
🔇 Additional comments (2)
coordinator/internal/orm/chunk.go (1)

91-105: Implementation looks good!

The method correctly implements the counting logic with proper error handling and consistent query conditions.

coordinator/internal/logic/provertask/bundle_prover_task.go (1)

Line range hint 150-158: Verify if hard fork version check should be enabled

The commented-out code appears to be a critical version compatibility check. This could lead to potential issues if incompatible prover versions are allowed to process tasks.

Please verify if this check should be re-enabled. If it's intentionally disabled, consider:

  1. Adding a comment explaining why it's disabled
  2. Creating a tracking issue for re-enabling it
  3. Adding monitoring/alerts for version mismatches

Run this script to check for version-related issues in the codebase:

coordinator/internal/orm/bundle.go Outdated Show resolved Hide resolved
@yiweichi yiweichi requested a review from georgehao January 13, 2025 09:19
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (8)
coordinator/internal/logic/provertask/bundle_prover_task.go (3)

67-77: Enhance error logging for threshold check

While the implementation correctly handles the external prover threshold check, consider adding more context to the logs when skipping external provers due to threshold conditions.

Add threshold information to help with debugging:

 if unassignedBundleCount < bp.cfg.ProverManager.ExternalProverThreshold {
+    log.Debug("skipping external prover assignment", 
+        "unassigned_count", unassignedBundleCount,
+        "threshold", bp.cfg.ProverManager.ExternalProverThreshold)
     return nil, nil
 }

105-105: Replace magic number with a constant

The magic number 2 in GetFailedProverTasksByHash should be replaced with a named constant for better maintainability and clarity.

Consider defining and using a constant:

+const maxFailedTasksToCheck = 2
-proverTasks, getTaskError := bp.proverTaskOrm.GetFailedProverTasksByHash(ctx.Copy(), message.ProofTypeBundle, tmpBundleTask.Hash, 2)
+proverTasks, getTaskError := bp.proverTaskOrm.GetFailedProverTasksByHash(ctx.Copy(), message.ProofTypeBundle, tmpBundleTask.Hash, maxFailedTasksToCheck)

113-114: Enhance debug logging

The debug log could provide more context about the failed task.

Add more details to help with debugging:

-log.Debug("get empty bundle, the prover already failed this task", "height", getTaskParameter.ProverHeight)
+log.Debug("skipping task assignment - prover previously failed this task",
+    "height", getTaskParameter.ProverHeight,
+    "task_hash", tmpBundleTask.Hash,
+    "prover_key", taskCtx.PublicKey,
+    "prover_name", taskCtx.ProverName)
coordinator/internal/logic/provertask/chunk_prover_task.go (4)

65-75: LGTM: Well-structured external prover threshold check

The implementation correctly handles the external prover assignment with proper error handling and early returns.

Consider enhancing the error message to include the threshold value for better debugging:

-			log.Error("failed to get unassigned chunk proving tasks count", "height", getTaskParameter.ProverHeight, "err", getCountError)
+			log.Error("failed to get unassigned chunk proving tasks count", "height", getTaskParameter.ProverHeight, "threshold", cp.cfg.ProverManager.ExternalProverThreshold, "err", getCountError)

Line range hint 77-81: Extract magic number into a constant

The retry count of 5 should be defined as a named constant at the package level for better maintainability.

 package provertask

+const maxAssignmentRetries = 5
+
 import (
     // ... existing imports ...
 )

 // In the Assign method:
-for i := 0; i < 5; i++ {
+for i := 0; i < maxAssignmentRetries; i++ {

102-107: Extract magic number into a constant

The hard-coded value '2' in GetFailedProverTasksByHash should be defined as a named constant with a clear purpose.

 package provertask

+// maxFailedTasksToCheck defines the number of recent failed tasks to check
+// when determining if a prover has previously failed a task
+const maxFailedTasksToCheck = 2
+
 // ... rest of the code ...

-		proverTasks, getTaskError := cp.proverTaskOrm.GetFailedProverTasksByHash(ctx.Copy(), message.ProofTypeChunk, tmpChunkTask.Hash, 2)
+		proverTasks, getTaskError := cp.proverTaskOrm.GetFailedProverTasksByHash(ctx.Copy(), message.ProofTypeChunk, tmpChunkTask.Hash, maxFailedTasksToCheck)

108-114: Improve readability of prover matching logic

The complex condition checking for prover matches could be extracted into a helper method for better readability and maintainability.

+func (cp *ChunkProverTask) isProverMatch(failedTask *orm.ProverTask, taskCtx *coordinatorType.TaskContext) bool {
+	if failedTask.ProverPublicKey == taskCtx.PublicKey {
+		return true
+	}
+	
+	return taskCtx.ProverProviderType == uint8(coordinatorType.ProverProviderTypeExternal) && 
+		cutils.IsExternalProverNameMatch(failedTask.ProverName, taskCtx.ProverName)
+}
+
 // In the Assign method:
-			if proverTasks[i].ProverPublicKey == taskCtx.PublicKey ||
-				taskCtx.ProverProviderType == uint8(coordinatorType.ProverProviderTypeExternal) && cutils.IsExternalProverNameMatch(proverTasks[i].ProverName, taskCtx.ProverName) {
+			if cp.isProverMatch(&proverTasks[i], taskCtx) {
coordinator/internal/types/auth.go (1)

35-40: Consider versioning the authentication protocol.

Given the breaking changes in the message structure and authentication flow, consider:

  1. Implementing versioned message types (e.g., MessageV1, MessageV2)
  2. Adding a protocol version field to support backward compatibility
  3. Creating a migration guide for clients
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cc57700 and f725c2e.

📒 Files selected for processing (7)
  • coordinator/internal/logic/auth/login.go (1 hunks)
  • coordinator/internal/logic/provertask/batch_prover_task.go (3 hunks)
  • coordinator/internal/logic/provertask/bundle_prover_task.go (3 hunks)
  • coordinator/internal/logic/provertask/chunk_prover_task.go (3 hunks)
  • coordinator/internal/orm/bundle.go (1 hunks)
  • coordinator/internal/orm/prover_task.go (1 hunks)
  • coordinator/internal/types/auth.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • coordinator/internal/logic/auth/login.go
  • coordinator/internal/orm/bundle.go
  • coordinator/internal/orm/prover_task.go
  • coordinator/internal/logic/provertask/batch_prover_task.go
🔇 Additional comments (5)
coordinator/internal/logic/provertask/bundle_prover_task.go (1)

24-24: LGTM!

The new import for coordinator utils is properly added and follows Go import conventions.

coordinator/internal/logic/provertask/chunk_prover_task.go (1)

24-24: LGTM: Clean import addition

The new import with alias 'cutils' is appropriately added for the external prover name matching functionality.

coordinator/internal/types/auth.go (3)

21-22: LGTM! Constant follows established patterns.

The new constant ProverProviderTypeKey follows the existing naming conventions and is well-documented.


35-40: Verify hash calculation compatibility.

The changes to the Message struct fields will affect the RLP encoding in the Hash() method, potentially invalidating existing signatures. Please ensure:

  1. Existing clients are notified of this breaking change
  2. Version compatibility is maintained if needed
  3. Test cases cover both old and new message formats

Let's verify the hash calculation changes:


35-40: ⚠️ Potential issue

Review validation and JSON tag consistency.

Several concerns with the struct field modifications:

  1. Making ProverTypes optional appears to be a breaking change that could affect existing clients.
  2. ProverProviderType lacks the binding:"required" tag despite being a key field for authentication.
  3. Inconsistent use of omitempty - only applied to ProverProviderType but not to other optional fields like VKs.

Consider applying these changes:

 type Message struct {
   Challenge          string             `form:"challenge" json:"challenge" binding:"required"`
   ProverVersion      string             `form:"prover_version" json:"prover_version" binding:"required"`
   ProverName         string             `form:"prover_name" json:"prover_name" binding:"required"`
-  ProverProviderType ProverProviderType `form:"prover_provider_type" json:"prover_provider_type,omitempty"`
-  ProverTypes        []ProverType       `form:"prover_types" json:"prover_types"`
-  VKs                []string           `form:"vks" json:"vks"`
+  ProverProviderType ProverProviderType `form:"prover_provider_type" json:"prover_provider_type" binding:"required"`
+  ProverTypes        []ProverType       `form:"prover_types" json:"prover_types,omitempty" binding:"dive"`
+  VKs                []string           `form:"vks" json:"vks,omitempty"`
 }

Let's verify the impact of these changes on existing clients:

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
coordinator/internal/types/auth.go (2)

33-39: Add documentation for the new struct.

Please add a comment explaining that this struct is used for backward compatibility when calculating message hashes for older clients that don't support the provider type field.

+// MessageWithoutProverProviderType represents the message structure used for backward compatibility
+// when verifying signatures from older clients that don't include the ProverProviderType field.
 type MessageWithoutProverProviderType struct {

132-141: Improve type safety of the Hash function.

The function accepts interface{} which could lead to runtime errors if an incompatible type is passed. Consider:

  1. Adding type assertions or validation
  2. Using a more specific interface that defines required methods
  3. Adding error documentation for unsupported types
-// Hash returns the hash of the auth message, which should be the message used
-// to construct the Signature.
+// Hash returns the hash of the auth message using RLP encoding and Keccak256.
+// The input must be either Message or MessageWithoutProverProviderType.
+// Returns an error if the input type is not supported or if RLP encoding fails.
 func Hash(i interface{}) ([]byte, error) {
+    switch i.(type) {
+    case Message, MessageWithoutProverProviderType:
+        // proceed with hashing
+    default:
+        return nil, fmt.Errorf("unsupported type for hashing: %T", i)
+    }
     byt, err := rlp.EncodeToBytes(i)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aae19ab and 35f22b4.

⛔ Files ignored due to path filters (1)
  • go.work.sum is excluded by !**/*.sum
📒 Files selected for processing (1)
  • coordinator/internal/types/auth.go (6 hunks)
🔇 Additional comments (4)
coordinator/internal/types/auth.go (4)

21-22: LGTM!

The constant definition follows the existing pattern and is well-documented.


106-113: LGTM!

The conversion method is well-implemented and correctly maps all required fields for backward compatibility.


84-91: Review security implications of undefined provider types.

While the backward compatibility handling is implemented correctly, allowing undefined provider types could potentially be exploited. Consider:

  1. Adding logging when falling back to backward compatibility mode
  2. Setting a deprecation timeline for the old signature verification
  3. Documenting the security implications in the code comments

Let me check if there's logging for security events:

✅ Verification successful

Security measures for undefined provider types are already in place

The codebase already implements proper security measures:

  • Invalid provider types are logged with detailed context in coordinator/internal/logic/auth/login.go
  • Undefined types are automatically converted to internal type during login
  • The hash calculation difference is strictly for backward compatibility
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for security logging patterns
rg -A 5 "log\.(Warn|Error).*security|authentication|verification"

Length of output: 6595


Script:

#!/bin/bash
# Search for logging and ProverProviderType usage in auth-related files
echo "=== Searching auth package for logging ==="
fd "auth\.go$" | xargs rg "log\.|logger"

echo -e "\n=== Searching for ProverProviderType usage ==="
rg -A 3 "ProverProviderType.*Undefined"

echo -e "\n=== Searching for authentication usage patterns ==="
ast-grep --pattern 'func $_($_: $_) {
  $$$
  ProverProviderType
  $$$
}'

Length of output: 1815


43-48: Consider adding validation for ProverProviderType.

While the field is properly tagged and optional, consider adding validation to ensure that when provided, it contains a valid value from the ProverProviderType enum.

Let me check if there's validation elsewhere in the codebase:

colinlyguo
colinlyguo previously approved these changes Jan 14, 2025
coordinator/internal/types/prover.go Outdated Show resolved Hide resolved
coordinator/internal/utils/prover_name.go Outdated Show resolved Hide resolved
coordinator/internal/logic/auth/login.go Outdated Show resolved Hide resolved
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
coordinator/internal/types/prover.go (1)

47-56: Consider optimizing the String method.

While the implementation is correct, consider these improvements:

  1. Add documentation for the String method
  2. Consider using a predefined map for string values to improve performance
  3. Make the default case message consistent with ProverType.String()
+// String returns a string representation of the ProverProviderType
 func (r ProverProviderType) String() string {
+	var providerTypeStrings = map[ProverProviderType]string{
+		ProverProviderTypeInternal: "prover provider type internal",
+		ProverProviderTypeExternal: "prover provider type external",
+	}
+	if s, ok := providerTypeStrings[r]; ok {
+		return s
+	}
-	switch r {
-	case ProverProviderTypeInternal:
-		return "prover provider type internal"
-	case ProverProviderTypeExternal:
-		return "prover provider type external"
-	default:
-		return fmt.Sprintf("prover provider type: %d", r)
-	}
+	return fmt.Sprintf("illegal prover provider type: %d", r)
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 35f22b4 and 1f05c6e.

📒 Files selected for processing (3)
  • coordinator/internal/logic/auth/login.go (1 hunks)
  • coordinator/internal/types/prover.go (1 hunks)
  • coordinator/internal/utils/prover_name.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • coordinator/internal/utils/prover_name.go
  • coordinator/internal/logic/auth/login.go
🔇 Additional comments (3)
coordinator/internal/types/prover.go (3)

44-45: LGTM! Type definition and documentation.

The type definition follows Go conventions and includes appropriate documentation.


58-65: LGTM! Well-structured constants block.

The constants are well-defined, properly documented, and follow a logical progression using iota.


43-65: Verify the usage of ProverProviderType across the codebase.

Let's ensure the new type is properly integrated with other components.

✅ Verification successful

ProverProviderType implementation is properly integrated

The type is correctly used across authentication, task management, and prover logic components with proper backward compatibility handling.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check usage of ProverProviderType across the codebase

# Check for struct field usage
echo "Checking struct field usage:"
rg -A 2 "ProverProviderType\s+[^(]" --type go

# Check for function parameter usage
echo -e "\nChecking function parameter usage:"
rg -A 2 "func.*ProverProviderType" --type go

# Check for type assertions and conversions
echo -e "\nChecking type assertions and conversions:"
rg -A 2 "\(\w+\s*\.\s*ProverProviderType\)" --type go

Length of output: 6697

colinlyguo
colinlyguo previously approved these changes Jan 14, 2025
@yiweichi yiweichi merged commit a6f2457 into develop Jan 15, 2025
6 checks passed
@yiweichi yiweichi deleted the feat-coordinator-assign-logic branch January 15, 2025 11:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bump-version Bump the version tag for deployment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants