SOLR-17306: fix replication problem on follower restart #2873
I am hoping someone with more confidence in this area weighs in, but if you don't get a review, please do poke me!!!
Could #1875 be augmented to test your scenario?
@epugh First, we tried using BATS for testing (as you already mentioned in the JIRA ticket), but it was easier to create the unit test attached to this pull request. We built this because our setup is Leader → Replication Leader → Followers. We have been running this patch in our production environment on Solr 9.6 for two months without any issues. Should we also submit a pull request against the main branch?
BTW, it would be interesting if you could write a short story, just a few paragraphs, about how you manage the lifecycle of your leader/follower/replicas, your ASG, etc. In https://github.com/apache/solr/pull/2783/files#diff-b58818a370dac65f7abb0064599f8813a56841b4a40f960ea2b81e398b820f43 we are discussing the architecture you describe. It would be great if you could review the pros and cons and weigh in. Let me know if you are interested and we can discuss it further on that PR...
This is the pull request against the main branch.
Our setup is already described in that pull request: leader -> repeater -> follower. Only the followers are autoscaled. We do not use any SolrCloud functionality, just a simple leader/follower setup. The repeater acts as the leader that is always up for the autoscaled followers, while the actual leader is locked for replication when we update the catalog. At some point we may change our update strategy so that we no longer need to lock the leader. The problem that led us to create this PR: the leader was locked, the repeater received a system patch and was restarted automatically, and then its data was removed.
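To make the locking step more concrete: Solr's ReplicationHandler accepts `disablereplication` and `enablereplication` commands on the leader. The following is a minimal Java sketch, not the setup used by the PR author, that only builds the two command URLs around a catalog update. The host `leader:8983` and the core name `catalog` are hypothetical placeholders, and actually sending the requests (for example with `java.net.http.HttpClient`) is left out so the sketch runs without a server.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class ReplicationLock {
    // Builds a ReplicationHandler command URL of the form
    // <baseUrl>/<core>/replication?command=<command>
    static URI replicationCommand(String baseUrl, String core, String command) {
        String encodedCore = URLEncoder.encode(core, StandardCharsets.UTF_8);
        String encodedCommand = URLEncoder.encode(command, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/" + encodedCore + "/replication?command=" + encodedCommand);
    }

    public static void main(String[] args) {
        String leader = "http://leader:8983/solr"; // hypothetical leader base URL
        String core = "catalog";                   // hypothetical core name

        // 1. Lock the leader so the repeater stops pulling new index versions.
        System.out.println(replicationCommand(leader, core, "disablereplication"));

        // 2. ... run the catalog update against the leader (not shown) ...

        // 3. Unlock the leader so the repeater, and through it the followers, replicate again.
        System.out.println(replicationCommand(leader, core, "enablereplication"));
    }
}
```

While the leader is in the state between steps 1 and 3, a restarting repeater is exactly the situation this PR is about.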
Closing in favour of the PR against the main branch. Will dig into it, probably early next week, unless someone beats me to it!
https://issues.apache.org/jira/browse/SOLR-17306
Description
If the leader has replication disabled, do not delete the follower's data on restart.
Solution
Check whether replication is enabled on the leader.
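The check can be illustrated with a small Java sketch. It is not the code from this PR: `LeaderState`, `shouldClearLocalIndexOld` and `shouldClearLocalIndexFixed` are hypothetical names. It also assumes, based on our reading of Solr's ReplicationHandler (verify against the source), that a leader with replication disabled reports index version 0. Without the extra flag, a follower cannot tell that answer apart from a genuinely empty leader index, so it clears its own data.

```java
public class FollowerCleanupCheck {
    // Hypothetical summary of what a follower learns when it polls its leader.
    record LeaderState(long indexVersion, long generation, boolean replicationEnabled) {}

    // Old behaviour (sketch): index version 0 is taken to mean "the leader's index is empty",
    // so a follower holding a non-empty index clears it.
    static boolean shouldClearLocalIndexOld(LeaderState leader, long localGeneration) {
        return leader.indexVersion() == 0L && localGeneration != 0L;
    }

    // Fixed behaviour (sketch): only trust "version 0" when the leader actually
    // has replication enabled; a locked leader never causes the local index to be cleared.
    static boolean shouldClearLocalIndexFixed(LeaderState leader, long localGeneration) {
        return leader.replicationEnabled()
            && leader.indexVersion() == 0L
            && localGeneration != 0L;
    }

    public static void main(String[] args) {
        // Scenario from the conversation: leader locked, repeater restarts with a non-empty index.
        LeaderState lockedLeader = new LeaderState(0L, 0L, false);
        long repeaterGeneration = 42L; // hypothetical local index generation
        System.out.println("old:   " + shouldClearLocalIndexOld(lockedLeader, repeaterGeneration));
        System.out.println("fixed: " + shouldClearLocalIndexFixed(lockedLeader, repeaterGeneration));
    }
}
```

Running `main` prints `old:   true` and `fixed: false`: only the old rule would wipe the repeater's index while the leader is locked.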
Tests
Implemented unit tests that check different restart scenarios. The tests enable directory (on-disk) storage for the replica; otherwise they would not work, because in-memory data is cleared on restart.
Checklist
Please review the following and check all that apply:
- I have developed this patch against the main branch.
- I have run ./gradlew check.