RFC : Repository Registration for Remote Backed Storage #8623
Comments
Can there be a race condition where a system index creation happens before the What would be the behavior if this happens? Will the index creation fail since "cluster.remote_store.repository" is already set?
Thanks @psychbot for the proposal! Please replace "master" with "cluster manager" in the diagrams.
On Solution 2:
Can you explain exactly at what point the repository will be registered in terms of operations on the leader: before the first election, during the election, or post election? Which
Can you add more details on how this node trimming would happen? Are you going to make the trimming logic pluggable? This also means that if the cluster manager node gets updated to the latest version before the data nodes and becomes leader, then old data nodes can't join the cluster. This could potentially mean that shards will not be migrated to new nodes. So, this will be a breaking change.
At exactly what point during the upgrade will the repo be registered?
Can you explain this Pro further? The active cluster manager needs to have the repository; otherwise, how will it validate the joins and register the repo?
A repository isn't tied to a node; it can be registered to the cluster (I don't prefer the dynamic nature of repositories). However, a remote-backed node joining the cluster should ensure that the repository (backing store) it refers to is already validated and registered as an explicit hard dependency, or registers it during the process by the first
The leader would need to ensure the remote backed nodes have homogeneous and validated repository configurations across those nodes. The node join validation will check whether a repo is already registered and whether it matches that of the joining node; if not, the joining node as a part of a new
Since repo registration itself requires a cluster state update, care needs to be taken when registering the repository. We need to evaluate whether repo registration and leader-elected state publication can be bundled together. The caveat is that a failure to register the repo could result in leader failure, but bundling also ensures no other request, such as index creation, can supersede a repo registration task.
There would be no trimming logic as such. @psychbot please correct my understanding: join validation will reject a node whose configuration it considers unsuited to the cluster.
No, as per the preferred approach above we should not see a race condition, because repository registration will happen once the cluster manager is elected and the first node with the required repository information in its node attributes sends a node join request to the cluster manager.
Thanks @shwetathareja
Index creation is not the event we want to hook repository registration to, since we need to support migration to remote backed indices, which should start as long as we have remote backed nodes and none of the allocation constraints have been breached. To differentiate between mixed clusters and migration cases, the plan is to have a cluster-level setting to auto-migrate indices to remote backed nodes and vice versa (the direction of upgrade/downgrade). By default, auto-upgrade to remote backed nodes would be enabled unless there is an explicit index-level setting to keep the replication mode as docrep. Repository registration is not the sole purpose of the join validator; it should also restrict a non-remote backed node from joining once we have upgraded all indices and don't plan to downgrade or run in mixed mode (heterogeneous setup).
Thanks for the details @Bukhtawar. Also, once the repository is registered, if a node joins with a different repo (for the sake of discussion), what problem can it cause? Basically, the node is going to use the repo registered in cluster state, and the repo in the node attribute is going to be ignored anyway. By the way, in which case would a different repo be set in node attributes, besides a configuration error? Repository registration can be triggered by the first index creation or when the first index is migrated to a remote backed index. Essentially, the first time a remote backed index is encountered, it always ensures the repo is registered before proceeding further.
@psychbot thanks for the detailed proposal. I have a couple of basic comments/doubts:
Adding to what @Bukhtawar said, repository information being a node-level attribute ensures that a node joining the cluster has the same repository (referred to in the cluster-level settings/cluster state), and it's a hard dependency for the repository to be present. Also,
The repository registration will happen only after the election, when the first node join request lands on the cluster manager node.
If the cluster manager changes during an upgrade, all nodes will send join requests to the new cluster manager, so the node join validator will get executed and in that scenario we won't need this trimming logic. I thought earlier that no join request would be sent to the new cluster manager and hence added the trimming logic; I will remove it.
This is applicable when we enable remote store on a 2.10 cluster. At that point we will need to create a new set of data nodes which will join the cluster, and the cluster manager will register the repository when the first remote store node from the new set of data nodes sends a join request with all the required attributes.
In the second approach, while enabling remote store in a 2.10 cluster, the repository will get registered when a remote-store node from the new set of nodes sends a join request with all the attributes to the current active cluster manager.
@harishbhakuni21 We should not allow, and are not allowing, updates to repository information in any case, as that can lead to catastrophic outcomes. Once the repository is registered and passed on to the cluster-level settings, it should remain the same throughout the cluster lifecycle.
@sachinpkale @psychbot |
Problem Statement
OpenSearch with remote backed storage enables storing indexed data to remote data store which guarantees data durability. As of today the user has to register the repository manually by calling PUT /_snapshot/remote-repository and update either the cluster level remote repository settings or index level remote repository settings or both in order to use the remote backed storage feature.
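To make the manual flow concrete, here is a minimal sketch that only builds the two requests a user would issue by hand; it does not send anything. The repository type `fs`, the `location` path, and the helper name `manual_registration_requests` are assumptions for illustration, and the setting key `cluster.remote_store.repository` is the one quoted in the comments on this issue.

```python
import json


def manual_registration_requests(repo_name, repo_type, repo_settings):
    """Build (method, path, body) tuples for the manual steps; nothing is sent."""
    register_repo = (
        "PUT",
        f"/_snapshot/{repo_name}",
        {"type": repo_type, "settings": repo_settings},
    )
    # Setting key as quoted in the discussion thread; treat it as an assumption.
    point_cluster_at_repo = (
        "PUT",
        "/_cluster/settings",
        {"persistent": {"cluster.remote_store.repository": repo_name}},
    )
    return [register_repo, point_cluster_at_repo]


# Hypothetical repository type and location, for illustration only.
reqs = manual_registration_requests(
    "remote-repository", "fs", {"location": "/mnt/remote-store"}
)
for method, path, body in reqs:
    print(method, path, json.dumps(body))
```

Any index created before both requests complete falls outside this flow, which is the gap the RFC sets out to close.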
Cluster Settings for Remote Repository -
IndexSettings for Remote Repository -
Indexed data is backed up to the remote store only after the user updates these settings. This means any index created before this step will not be backed up to the remote store until #7986, which allows migrating older indices to the remote store, is built into OpenSearch.
Because of this manual step, we miss backing up system indices to the remote store, as all system indices get created during cluster bootstrap.
Requirements
Functional
Non-Functional
Assumptions
Background
OpenSearch has a plugin-based architecture which allows developers to build plugins using the interfaces provided by the core and run them as part of the OpenSearch engine. Some plugins create system indices and store information necessary for their functioning during cluster bootstrap.
Remote backed storage in its current state can't back up these system indices, which are created during cluster bootstrap. Hence we want to support supplying repositories via yml and registering them at the very start of cluster bootstrap.
[Solution 1] Cluster Settings based approach
In this solution we pass the repository information in the OpenSearch yml, and during cluster bootstrap the active cluster manager registers the repository.
Algorithm
The solution will have the following steps
Below is the format how repository information and cluster settings will be supplied via yml
There are two ways to achieve this -
a. [Preferred] Cluster State Change Event - Listen to the cluster state change event; when the cluster manager is elected, the task for registering the repository is submitted. The ClusterStateListener implementation is removed once the repository is registered.
b. Background Thread - A background thread keeps polling the local cluster state periodically, and once the cluster manager is elected the executor stops.
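Option (a) can be sketched roughly as below. This is a toy Python model, not OpenSearch code: `ClusterService`, `RepositoryRegistrationListener`, and the dictionary-shaped cluster state are hypothetical stand-ins that only show the "register once on election, then remove the listener" behaviour.

```python
class ClusterService:
    """Toy stand-in for a service that fans out cluster state changes."""

    def __init__(self):
        self.listeners = []
        self.registered_repos = {}

    def add_listener(self, listener):
        self.listeners.append(listener)

    def remove_listener(self, listener):
        self.listeners.remove(listener)

    def publish(self, state):
        # Iterate over a copy so a listener can remove itself safely.
        for listener in list(self.listeners):
            listener.cluster_changed(state, self)


class RepositoryRegistrationListener:
    """Registers the repository once the local node is the elected cluster manager."""

    def __init__(self, local_node, repo_name, repo_config):
        self.local_node = local_node
        self.repo_name = repo_name
        self.repo_config = repo_config

    def cluster_changed(self, state, service):
        if state.get("cluster_manager") != self.local_node:
            return  # no cluster manager yet, or it is another node
        service.registered_repos.setdefault(self.repo_name, self.repo_config)
        # The listener has done its job, so it unregisters itself.
        service.remove_listener(self)


service = ClusterService()
service.add_listener(
    RepositoryRegistrationListener("node-1", "remote-repository", {"type": "fs"})
)
service.publish({"cluster_manager": None})      # election still pending
service.publish({"cluster_manager": "node-1"})  # local node elected
print(service.registered_repos, len(service.listeners))
```

Compared with the polling thread in option (b), the listener does no work between state changes and needs no separate executor to shut down.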
Failure Scenarios
Migration/Upgrade Scenarios
All nodes which support remote backed storage will have a node attribute, let's say remote_backed_storage. Below are some of the scenarios -
Pros
Cons
[Preferred][Solution 2] Node Attribute based approach
In this solution we pass the information via the OpenSearch yml. During node bootstrap the repository information is added to the node attributes, and during node join the node attributes are passed to the active cluster manager, which uses them to register the repository and to perform validation.
Algorithm
Registering the repository - We want the repository registration to happen as soon as the cluster is formed/forming. When a node tries to join the cluster it sends the repository information to the active cluster manager. The cluster manager validates that information against the repository information in its own node attributes and registers the repository if they match; otherwise it rejects the node join request.
Registration task should be submitted by one node - To achieve this, the repository registration logic is functional only on the active cluster manager. The joining node sends the repository information in its node attributes to the active cluster manager, which validates the information and registers the repository if it is not already registered. For all subsequent node join requests, if the repository is already registered, the registration logic is a no-op.
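The two steps above can be sketched as a toy model. Everything named here (`ClusterManager`, `JoinRejected`, and the attribute prefix `remote_store.repository.`) is hypothetical and chosen for illustration; the sketch only shows validation against the cluster manager's own attributes and a registration that runs once.

```python
REPO_ATTR_PREFIX = "remote_store.repository."  # hypothetical attribute prefix


class JoinRejected(Exception):
    """Raised when a joining node's repository attributes do not match."""


def repo_attrs(node_attrs):
    # Keep only the attributes that describe the repository.
    return {k: v for k, v in node_attrs.items() if k.startswith(REPO_ATTR_PREFIX)}


class ClusterManager:
    def __init__(self, node_attrs):
        self.expected = repo_attrs(node_attrs)
        self.registered_repo = None
        self.registrations = 0

    def handle_join(self, node_name, node_attrs):
        joining = repo_attrs(node_attrs)
        if joining != self.expected:
            raise JoinRejected(f"{node_name}: repository attributes do not match")
        if self.registered_repo is None:
            # The first valid join triggers registration; later joins are no-ops.
            self.registered_repo = joining
            self.registrations += 1


attrs = {
    REPO_ATTR_PREFIX + "name": "remote-repository",
    REPO_ATTR_PREFIX + "type": "fs",
    "zone": "a",
}
manager = ClusterManager(attrs)
manager.handle_join("data-1", attrs)
manager.handle_join("data-2", dict(attrs, zone="b"))  # non-repo attributes may differ
try:
    manager.handle_join("data-3", {REPO_ATTR_PREFIX + "name": "other-repo"})
    rejected = False
except JoinRejected:
    rejected = True
print(manager.registrations, rejected)
```

Keeping the registration check inside the join handler means only the active cluster manager ever submits the task, which is the property the second step asks for.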
Failure Scenarios
Migration/Upgrade Scenarios
Below are some of the scenarios -
Pros
[Solution 3] Extended Node Attribute based approach
This approach is similar to the second approach, except that instead of storing node attributes as string-to-string key-value pairs, each attribute maps a string to a JSON-serialized object. Any other node reading the node attribute has to deserialize the object to get the information stored against that attribute.
Below is the high level idea of how the information will be stored -
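As a rough illustration of the string-to-JSON idea only: the attribute key `remote_store_repository` and the fields inside the JSON object (`name`, `type`, `settings`, `bucket`) are invented for this sketch and are not the RFC's actual format.

```python
import json


def encode_repo_attribute(name, repo_type, settings):
    """Serialize repository info into a single string attribute value."""
    return json.dumps(
        {"name": name, "type": repo_type, "settings": settings}, sort_keys=True
    )


def decode_repo_attribute(value):
    """Deserialize the attribute value on the node that reads it."""
    return json.loads(value)


# Node attributes stay a string-to-string map; the value is a JSON document.
node_attributes = {
    "remote_store_repository": encode_repo_attribute(
        "remote-repository", "s3", {"bucket": "my-bucket"}
    )
}
decoded = decode_repo_attribute(node_attributes["remote_store_repository"])
print(decoded["name"], decoded["settings"]["bucket"])
```

Sorting keys during serialization keeps the encoded string stable, so two nodes with the same configuration produce identical attribute values and a plain string comparison still works for validation.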
Pros
Cons
FAQ
We will add retries for repositories that could not be registered successfully the first time. If the failure is consistent, we will let the cluster changed event kick in and handle the flow again.
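A minimal sketch of such a retry loop, assuming exponential backoff and a bounded number of attempts; neither detail is specified in the RFC, and `register_with_retries` and the `RuntimeError` failure type are hypothetical. When the attempts are exhausted the function returns `False`, which corresponds to leaving the next cluster changed event to drive the flow again.

```python
import time


def register_with_retries(register, max_attempts=3, base_delay=0.0, sleep=time.sleep):
    """Call register() until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            register()
            return True
        except RuntimeError:
            if attempt < max_attempts:
                # Exponential backoff between attempts (an assumption of this sketch).
                sleep(base_delay * 2 ** (attempt - 1))
    return False


attempts = {"n": 0}


def flaky_register():
    # Fails twice, then succeeds, to exercise the retry path.
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("repository not reachable yet")


def always_failing_register():
    raise RuntimeError("repository misconfigured")


ok = register_with_retries(flaky_register)
gave_up = not register_with_retries(always_failing_register)
print(ok, attempts["n"], gave_up)
```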
Appendix
Migration/Upgrade Scenario