
feat: add update metadata step for rollbacking downgraded region #2812

Merged


WenyXu
Member

@WenyXu WenyXu commented Nov 24, 2023

I hereby agree to the terms of the GreptimeDB CLA

What's changed and what's your intention?

Adds an update-metadata step for rolling back a downgraded region.
If the candidate region is unreachable, this step rolls back the downgraded leader region.

Behaviors:
Abort (non-retryable):

  • The TableRoute is not found.

Retry:

  • Updating the TableRouteValue fails.
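The two behaviors above can be sketched as follows. This is a minimal, synchronous illustration, not the code merged in this PR: RegionRoute, LeaderStatus, StepError, MetadataStore, and run_step are hypothetical simplifications, and the real TableRouteValue update goes through GreptimeDB's metadata store and procedure framework.

```rust
use std::collections::HashMap;

// Hypothetical, simplified types: not GreptimeDB's real RegionRoute/TableRouteValue.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LeaderStatus {
    Downgraded,
}

#[derive(Debug, Clone, PartialEq)]
struct RegionRoute {
    region_id: u64,
    leader_status: Option<LeaderStatus>,
}

/// Hypothetical error type mirroring the two behaviors listed above.
#[derive(Debug, PartialEq)]
enum StepError {
    /// Abort (non-retryable): no table route exists for the table.
    TableRouteNotFound(u32),
    /// Retry: writing the updated table route failed.
    UpdateTableRoute(String),
}

impl StepError {
    fn is_retryable(&self) -> bool {
        matches!(self, StepError::UpdateTableRoute(_))
    }
}

/// Hypothetical in-memory stand-in for table route metadata, keyed by table id.
/// `fail_next_update` simulates one transient write failure.
struct MetadataStore {
    routes: HashMap<u32, Vec<RegionRoute>>,
    fail_next_update: bool,
}

impl MetadataStore {
    /// Rolls back a downgraded leader region by clearing its Downgraded status.
    fn rollback_downgraded_region(&mut self, table_id: u32, region_id: u64) -> Result<(), StepError> {
        let current = self
            .routes
            .get(&table_id)
            .ok_or(StepError::TableRouteNotFound(table_id))?;
        // Build the new value first, then write it in a single step.
        let mut updated = current.clone();
        for route in updated.iter_mut().filter(|r| r.region_id == region_id) {
            route.leader_status = None;
        }
        if self.fail_next_update {
            self.fail_next_update = false;
            return Err(StepError::UpdateTableRoute("simulated write failure".to_string()));
        }
        self.routes.insert(table_id, updated);
        Ok(())
    }
}

/// Runs the step, retrying retryable errors up to `max_attempts` times and
/// aborting immediately on non-retryable ones.
fn run_step(store: &mut MetadataStore, table_id: u32, region_id: u64, max_attempts: usize) -> Result<(), StepError> {
    let mut attempt = 0;
    loop {
        attempt += 1;
        match store.rollback_downgraded_region(table_id, region_id) {
            Ok(()) => return Ok(()),
            Err(e) if e.is_retryable() && attempt < max_attempts => continue,
            Err(e) => return Err(e),
        }
    }
}
```

Keeping the retry/abort decision in one is_retryable() check makes it easy to see that only the missing table route short-circuits the step.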

Checklist

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.

Refer to a related PR or issue link (optional)

#2700

@WenyXu WenyXu self-assigned this Nov 24, 2023
@WenyXu WenyXu mentioned this pull request Nov 28, 2023
@WenyXu WenyXu force-pushed the feat/update-metadata-rollbacking branch from 76d83a4 to 71d44ce November 28, 2023 07:52
@WenyXu WenyXu force-pushed the feat/update-metadata-rollbacking branch from 71d44ce to 6896f2d November 29, 2023 11:45
@WenyXu WenyXu marked this pull request as ready for review November 29, 2023 11:45
@WenyXu WenyXu force-pushed the feat/update-metadata-rollbacking branch from 6896f2d to e716d67 November 29, 2023 11:49
@MichaelScofield
Collaborator

Can we add a "rollback" method to the Procedure trait? If a procedure runs into a non-retryable error, the procedure framework would then execute the rollback path. @evenyag
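A rough sketch of what the proposed hook could look like. The Procedure trait here is a hypothetical, synchronous simplification (GreptimeDB's real trait is async and has a different signature), and ProcError, Outcome, and run are illustrative names only.

```rust
/// Hypothetical error classification; GreptimeDB's real procedure errors differ.
#[derive(Debug, PartialEq)]
enum ProcError {
    Retryable(String),
    NonRetryable(String),
}

/// Final outcome reported by the sketch runner.
#[derive(Debug, PartialEq)]
enum Outcome {
    Done,
    RolledBack,
    Failed(String),
}

/// Hypothetical synchronous trait with the proposed `rollback` hook.
trait Procedure {
    fn execute(&mut self) -> Result<(), ProcError>;

    /// Default: nothing to undo. Procedures with side effects override this.
    fn rollback(&mut self) -> Result<(), ProcError> {
        Ok(())
    }
}

/// Retries retryable errors; on a non-retryable error, calls `rollback()` once.
fn run(procedure: &mut dyn Procedure, max_retries: usize) -> Outcome {
    let mut retries = 0;
    loop {
        match procedure.execute() {
            Ok(()) => return Outcome::Done,
            Err(ProcError::Retryable(_)) if retries < max_retries => retries += 1,
            Err(ProcError::Retryable(msg)) => return Outcome::Failed(msg),
            Err(ProcError::NonRetryable(_)) => {
                return match procedure.rollback() {
                    Ok(()) => Outcome::RolledBack,
                    Err(e) => Outcome::Failed(format!("rollback failed: {e:?}")),
                };
            }
        }
    }
}
```

Calling rollback() only once is the naive version; a production runner would also need to retry rollback() and survive restarts in the middle of it.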

@WenyXu
Member Author

WenyXu commented Nov 30, 2023


BTW, we currently avoid this problem by carefully designing the procedure steps, which ensures that a non-retryable error ultimately has no effect on the system.

@evenyag
Contributor

evenyag commented Nov 30, 2023

Can we add a "rollback" method to the Procedure trait?

Yes, the runner already has a rollback_procedure() method; you can call the procedure's rollback() there.

async fn rollback_procedure(&mut self) -> Result<()> {
    self.store
        .rollback_procedure(self.meta.id, self.step)
        .await
        .map_err(|e| {
            logging::error!(
                e; "Failed to write rollback key for procedure {}-{}",
                self.procedure.type_name(),
                self.meta.id
            );
            e
        })?;
    self.step += 1;
    Ok(())
}

However, fully supporting rollback is not easy. rollback() should be idempotent, and the runner should retry rollback() until it succeeds. The procedure implementation itself should correctly undo its own side effects, and the procedure framework should keep holding the procedure's locks while it rolls back.
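The requirements above (idempotent rollback, retried until success, locks held throughout) can be illustrated with a small sketch. A std::sync::Mutex stands in for the procedure framework's lock, and RegionMeta, rollback_until_success, and restore are hypothetical names, not part of GreptimeDB.

```rust
use std::sync::Mutex;
use std::thread;
use std::time::Duration;

/// Hypothetical region metadata guarded by the procedure's lock.
#[derive(Debug, Clone, PartialEq)]
struct RegionMeta {
    leader_writable: bool,
}

/// Keeps the lock for the entire rollback and retries `rollback` until it
/// returns Ok, sleeping `backoff` between attempts. Returns the attempt count.
/// The Mutex is a stand-in for the procedure framework's lock.
fn rollback_until_success<T, F>(lock: &Mutex<T>, mut rollback: F, backoff: Duration) -> u32
where
    F: FnMut(&mut T) -> Result<(), String>,
{
    // Acquired once and released only when this function returns, so no
    // other procedure can observe a half-rolled-back state.
    let mut guard = lock.lock().unwrap();
    let mut attempts = 0;
    loop {
        attempts += 1;
        match rollback(&mut *guard) {
            Ok(()) => return attempts,
            Err(_) => thread::sleep(backoff),
        }
    }
}

/// An idempotent rollback: it writes the absolute pre-procedure state instead
/// of toggling, so applying it once or several times has the same effect.
fn restore(meta: &mut RegionMeta, before: &RegionMeta) -> Result<(), String> {
    *meta = before.clone();
    Ok(())
}
```

Writing an absolute state rather than inverting the last change is what makes re-running rollback() safe, both on transient failures and after a restart.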

We also need to decide when and how to mark a procedure as rolled back. If, while recovering a procedure, we find that it was in the middle of rolling back, we should run rollback() again.

  • We might need to persist a file (.rolling or something else) before invoking rollback().
  • We need another file, a .rollback file, to mark that the procedure has finished rolling back.
  • Alternatively, we could store the rollback status in the .rollback file itself.
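The two-marker option from the list above could drive recovery roughly like this. The sketch assumes marker names of the form {procedure_id}.rolling and {procedure_id}.rollback and uses a HashSet<String> as a stand-in for the procedure store's file listing; RecoveryAction, recovery_action, and rollback_with_markers are hypothetical. The third option (a status stored inside .rollback) is not shown.

```rust
use std::collections::HashSet;

/// What the hypothetical recovery logic decides to do for a procedure.
#[derive(Debug, PartialEq)]
enum RecoveryAction {
    /// No rollback markers: resume normal execution from the last step.
    Resume,
    /// `.rolling` exists but `.rollback` does not: the rollback was
    /// interrupted, so run the (idempotent) rollback() again.
    RerunRollback,
    /// `.rollback` exists: the rollback finished; nothing more to do.
    RolledBack,
}

/// Decides the recovery action from which marker files exist.
/// `files` stands in for a listing of the procedure store.
fn recovery_action(files: &HashSet<String>, procedure_id: &str) -> RecoveryAction {
    let rolling = files.contains(&format!("{procedure_id}.rolling"));
    let rolled_back = files.contains(&format!("{procedure_id}.rollback"));
    match (rolling, rolled_back) {
        (_, true) => RecoveryAction::RolledBack,
        (true, false) => RecoveryAction::RerunRollback,
        (false, false) => RecoveryAction::Resume,
    }
}

/// Write order: `.rolling` before invoking rollback(), `.rollback` only after
/// rollback() succeeds, so a crash in between is detectable on recovery.
fn rollback_with_markers(
    files: &mut HashSet<String>,
    procedure_id: &str,
    rollback: impl FnOnce() -> Result<(), String>,
) -> Result<(), String> {
    files.insert(format!("{procedure_id}.rolling"));
    rollback()?;
    files.insert(format!("{procedure_id}.rollback"));
    Ok(())
}
```

The ordering of the two writes is the important part: the finished marker is persisted strictly after rollback() returns, so its presence always means the side effects were undone.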

Currently, we avoid this problem by carefully designing the procedure steps, which ensures that a non-retryable error ultimately has no effect on the system.

For now, this is probably the simplest approach. We can support rollback in the future, though it will still take a lot of effort.

@WenyXu
Member Author

WenyXu commented Nov 30, 2023

@MichaelScofield @fengjiachun PTAL

Collaborator

@fengjiachun fengjiachun left a comment


lgtm


codecov bot commented Dec 1, 2023

Codecov Report

Merging #2812 (0dd8df7) into develop (ae81535) will decrease coverage by 0.38%.
Report is 6 commits behind head on develop.
The diff coverage is 90.00%.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #2812      +/-   ##
===========================================
- Coverage    84.79%   84.42%   -0.38%     
===========================================
  Files          737      740       +3     
  Lines       115840   115980     +140     
===========================================
- Hits         98224    97911     -313     
- Misses       17616    18069     +453     

@MichaelScofield MichaelScofield added this pull request to the merge queue Dec 1, 2023
Merged via the queue into GreptimeTeam:develop with commit 781f242 Dec 1, 2023
13 checks passed
Labels
None yet
Projects
Status: ✅ Done

4 participants