-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Locking mechanism for index sets during datanode remote reindex migration #20222
Conversation
graylog2-server/src/test/java/org/graylog2/periodical/TestableRotationStrategy.java
Outdated
Show resolved
Hide resolved
graylog2-server/src/main/java/org/graylog2/periodical/IndexRotationThread.java
Outdated
Show resolved
Hide resolved
ecccadc
to
db02256
Compare
...og2-server/src/main/java/org/graylog2/indexer/datanode/DatanodeMigrationLockServiceImpl.java
Outdated
Show resolved
Hide resolved
graylog2-server/src/test/java/org/graylog2/indexer/datanode/MigrationLockServiceTest.java
Outdated
Show resolved
Hide resolved
...og2-server/src/main/java/org/graylog2/indexer/datanode/DatanodeMigrationLockServiceImpl.java
Outdated
Show resolved
Hide resolved
...og2-server/src/main/java/org/graylog2/indexer/datanode/DatanodeMigrationLockServiceImpl.java
Outdated
Show resolved
Hide resolved
...og2-server/src/main/java/org/graylog2/indexer/datanode/DatanodeMigrationLockServiceImpl.java
Outdated
Show resolved
Hide resolved
import java.util.concurrent.TimeUnit; | ||
import java.util.stream.Collectors; | ||
|
||
public class DatanodeMigrationLockServiceImpl implements DatanodeMigrationLockService { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not fully understand the code, so please forgive me this silly question, but how do we handle multi-node setup?
Can each GL node have its own migration lock service?
If yes, is it important that activeLocks
collection contains all the locks?
If yes, how do we get locks created by another GL nodes appear in that collection?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that activeLocks
collection is used just for extending the locks?
So maybe me previous questions are not important at all...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each GL node will have its own migration lock service. The locks are persisted in the mongodb, so they are distributed and block parallel usage of the index set across the whole cluster. The activeLocks collection is there only to keep the locally requested locks from expiring. Otherwise they'll automatically disappear from the mongo collection, as they have defined a TTL. The local activeLocks
collection doesn't need to be distributed, each node is refreshing its own locks. In the moment the node disappears, these lock disappear as well, which is a good thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check if you can use the RefreshingLockService
to refresh the locks instead of implementing your own refresh system. 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with RefreshingLockService is that it can hold one lock only. I'd need one instance of the service for each lock. Instead of implementing own refresh system I'd need to implement a system for managing service instances, which seems even worse :-)
...arch2/src/main/java/org/graylog/storage/opensearch2/RemoteReindexingMigrationAdapterOS2.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
Looks complex as well. I have no idea how to simplify it, however.
I'd only suggest considering a few javadoc comments here and there, taken from the PR description and our discussions, so that no one has to ask himself the same questions we have been asking.
Thank you for a lot of hard work you have put into this one!
This PR adds proper locking mechanism for locking of index sets during remote reindex migration. These locks skip index rotation and retention if the migration is running. In the same time they force the migration to wait for finish of any currently running index set maintenance tasks before the actual data migration can start.
Description
The newly introduced DatanodeMigrationLockService, building on top of the existing distributed LockService is creating exclusive locks for index sets. The service allows two types of work with the lock - wait for a lock and block the thread or try to run a code block if the lock is free to take. The blocking mechanism is used for the migration itself. It will wait till the background maintenance is done. The nonblocking try approach is applied in the maintenance tasks. Their execution will be skipped for an index set if this is occupied by the migration.
Motivation and Context
Fixes https://github.com/Graylog2/graylog-plugin-enterprise/issues/8281
How Has This Been Tested?
Added unit tests, existing integration tests
Screenshots
Types of changes
Checklist: