-
-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disable JetStream on disk errors #6292
base: main
Are you sure you want to change the base?
Conversation
835c8bd
to
a309fcf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably have better detection of a wider array of filesystem errors like permissions etc, agreed, but I'm not entirely sure why this would be a configurable option and I don't particularly like it either.
The disk being pulled out from underneath JetStream is effectively a catastrophic operational issue and for JS to continue to try to operate in those circumstances feels problematic, i.e. with the potential for data loss that no one may even notice. Even more so in a clustered scenario where the metalayer and Raft logs are being written out to disk.
Thank you for taking the time to review this. |
Signed-off-by: Sourabh Agrawal <[email protected]>
Signed-off-by: Sourabh Agrawal <[email protected]>
Reopening PR, as it was closed mistakenly |
Thank you for the suggestion @neilalexander. I have implemented the recommended change and removed the flag. |
Hello @neilalexander, Happy New Year. |
Currently, there are scenarios where NATS JetStream may encounter permission errors when file system goes into read only mode, which can lead to an inconsistent state. In such cases, the system continues to allow publishing messages by resetting stream state, leading to a misaligned consumer stream sequence.
This PR introduces changes to gracefully handle these permission errors and prevent NATS from continuing in an inconsistent state when:
After this PR, If NATS is running in non-clustered mode, the user will be unable to issue write requests until the issue is resolved. In clustered mode, only the affected node will stop accepting requests, while the system will continue to function as long as the required quorum remains healthy.
PR potentially fixes : #6211 which leads to consumer sequence reaching higher than stream sequence.
Signed-off-by: Sourabh Agrawal [email protected]