Merge pull request #1898 from marko-bekhta/post/search-quarkus-io-rol…

…lover Post: quarkus.search.io series - rolling over
quarkusio · Apr 25, 2024 · cc0c6a1 · cc0c6a1
2 parents 0b9d4f8 + a7f909c
commit cc0c6a1
Show file tree

Hide file tree

Showing 5 changed files with 298 additions and 0 deletions.
diff --git a/_data/authors.yaml b/_data/authors.yaml
@@ -541,3 +541,10 @@ iyanev:
   emailhash: "05c3ef31aeba2d12e2eb26282752e654"
   job_title: "Senior Software Engineer"
   twitter: "iqnev"
+markobekhta:
+  name: "Marko Bekhta"
+  email: "[email protected]"
+  emailhash: "2934f00ba9190bc06cf03fde5b50c61d"
+  job_title: "Engineer (Software)"
+  twitter: "that_java_guy"
+  bio: "Software Engineer at Red Hat and Hibernate team member."
diff --git a/_posts/2024-04-25-search-indexing-rollover.adoc b/_posts/2024-04-25-search-indexing-rollover.adoc
@@ -0,0 +1,291 @@
+---
+layout: post
+title: 'Indexing rollover with Quarkus and Hibernate Search'
+date: 2024-04-25
+tags: hibernate search howto
+synopsis: 'This is the first post in the series that dives into the implementation details of the search.quarkus.io application. Are you interested in near zero-downtime reindexing? Then this one is for you!'
+author: markobekhta
+---
+:hibernate-search-version: 7.0
+:imagesdir: /assets/images/posts/search-indexing-rollover
+:hibernate-search-docs-url: https://docs.jboss.org/hibernate/search/{hibernate-search-version}/reference/en-US/html_single/
+:quarkus-hibernate-search-docs-url: https://quarkus.io/guides/hibernate-search-orm-elasticsearch
+
+This is the first post in the series diving into the implementation details of the
+link:https://github.com/quarkusio/search.quarkus.io[application] backing the guide search of
+link:https://quarkus.io/guides/[quarkus.io].
+
+Does your application need full-text search capabilities? Do you need to keep your application running
+and producing search results without any downtime, even when reindexing all your data?
+Look no further. In this post, we'll cover how you can approach this problem
+and solve it in practice with a few low-level APIs, provided you use Hibernate Search,
+be it link:{quarkus-hibernate-search-docs-url}[on top of Hibernate ORM]
+or https://quarkus.io/version/main/guides/hibernate-search-standalone-elasticsearch[in standalone mode].
+
+The approach suggested in this post is based on the fact that Hibernate Search uses
+link:{hibernate-search-docs-url}#backend-elasticsearch-indexlayout[aliased indexes],
+and communicates with the actual index through a read/write alias, depending on the operation it needs to perform.
+For example, a search operation will be routed to a read index alias,
+while an indexing operation will be sent to a write index alias.
+
+image::initial-app.png[]
+
+NOTE: This approach is implemented and successfully used in our Quarkus application that backs the guides'
+search of https://quarkus.io/guides/[quarkus.io/guides/].
+You can see the complete implementation here:
+link:https://github.com/quarkusio/search.quarkus.io/blob/d956b6a1341d8693fa1d6b7881f3840f48bdaacd/src/main/java/io/quarkus/search/app/indexing/Rollover.java#L44-L331[rollover implementation]
+and link:https://github.com/quarkusio/search.quarkus.io/blob/d956b6a1341d8693fa1d6b7881f3840f48bdaacd/src/main/java/io/quarkus/search/app/indexing/IndexingService.java#L226-L244[rollover usage].
+
+Applications using Hibernate Search can keep their search indexes up-to-date by updating the index gradually,
+as the data on which the index documents are based is modified, providing a near real-time index synchronisation.
+
+On the other hand, if the search requirements allow for a delay in synchronisation
+or the data is updated only at certain times of day, the option of mass indexing can effectively keep the indexes up-to-date.
+The link:{hibernate-search-docs-url}[Hibernate Search documentation] provides more information about these approaches
+and other Hibernate Search capabilities.
+
+The application discussed in this post is using the mass indexing approach.
+This means that at certain events, e.g., when a new version of the application is deployed or a scheduled time is reached,
+the application has to process the documentation guides and create search index documents from them.
+
+Now, since we want our application to keep  providing results to any search requests while we add/update documents to the indexes,
+we cannot perform a simple reindexing operation
+using a link:{hibernate-search-docs-url}#search-batchindex-massindexer[mass indexer],
+or the recently added link:{quarkus-hibernate-search-docs-url}#management[management endpoint in Quarkus],
+as these would drop all existing documents from the index before indexing them:
+search operations would not be able to match them anymore until reindexing finishes.
+
+Instead, we can create a new index with the same schema and route any write operations to it.
+
+image::write-app.png[]
+
+Since Hibernate Search does not provide the rollover feature out of the box (https://hibernate.atlassian.net/browse/HSEARCH-3499[yet])
+we will need to resort to using the lower-level APIs to access the Elasticsearch client and perform the required operations ourselves.
+To do so, we need to follow a few simple steps:
+
+1. Get the mapping information for the index we want to reindex using the schema manager.
++
+[source, java]
+====
+----
+@Inject
+SearchMapping searchMapping; // <1>
+// ...
+
+searchMapping.scope(MyIndexedEntity.class).schemaManager() // <2>
+    .exportExpectedSchema((backendName, indexName, export) -> { // <3>
+        var createIndexRequestBody = export.extension(ElasticsearchExtension.get())
+                .bodyParts().get(0); // <4>
+        var mappings = createIndexRequestBody.getAsJsonObject("mappings"); // <5>
+        var settings =createIndexRequestBody.getAsJsonObject("settings"); // <6>
+    });
+----
+1. Inject `SearchMapping` somewhere in your app so that we can use it to access a schema manager.
+2. Get a schema manager for the indexed entity we are interested in (`MyIndexedEntity`).
+If all entities should be targeted, then `Object.class` can be used to create the scope.
+3. Use the export schema API to access the mapping information.
+4. Use the extension to get access to the Elasticsearch-specific `.bodyParts()` method that returns
+a JSON representing the JSON HTTP body needed to create the indexes.
+5. Get the mapping information for the particular index.
+6. Get the settings for the particular index.
+====
++
+2. Get the reference to the Elasticsearch client, so we can perform API calls to the search backend cluster:
++
+[source, java]
+====
+----
+@Inject
+SearchMapping searchMapping; // <1>
+// ...
+RestClient client = searchMapping.backend() // <2>
+    .unwrap(ElasticsearchBackend.class) // <3>
+    .client(RestClient.class); // <4>
+----
+1. Inject `SearchMapping` somewhere in your app so that we can use it to access a schema manager.
+2. Access the backend from a search mapping instance.
+3. Unwrap the backend to the `ElasticsearchBackend`, so that we can access backend-specific APIs.
+4. Get a reference to the Elasticsearch's rest client.
+====
++
+3. Create a new index using the OpenSearch/Elasticsearch rollover API
+that would allow us to keep using the existing index for read operations,
+while write operations will be sent to the new index:
++
+[source, java]
+====
+----
+@Inject
+SearchMapping searchMapping; // <1>
+// ...
+
+SearchIndexedEntity<?> entity = searchMapping.indexedEntity(MyIndexedEntity.class);
+var index = entity.indexManager().unwrap(ElasticsearchIndexManager.class).descriptor(); // <2>
+
+var request = new Request("POST", "/" + index.writeName() + "/_rollover"); // <3>
+var body = new JsonObject();
+body.add("mappings", mappings);
+body.add("settings", settings);
+body.add("aliases", new JsonObject()); // <4>
+request.setEntity(new StringEntity(gson.toJson(body), ContentType.APPLICATION_JSON));
+
+var response = client.performRequest(request); // <5>
+//...
+----
+1. Inject `SearchMapping` somewhere in your app so that we can use it to access a schema manager.
+2. Get the index descriptor to get the aliases from it.
+3. Start building the rollover request body using the write index alias from the index descriptor.
+4. Note that we are including an empty "aliases" so that the aliases are not copied over to the new index,
+except for the write alias (which is implicitly updated since the rollover request is targeting it directly).
+We don't want the read alias to start pointing to the new index immediately.
+5. Perform the rollover API request using the Elasticsearch REST client obtained in the previous step.
+====
+
+With this successfully completed, indexes are in the state we wanted:
+
+image::write-app.png[]
+
+We can start populating our write index without affecting search requests.
+
+Once we are done with indexing, we can either commit or rollback depending on the results:
+
+image::after-indexing.png[]
+
+Committing the index rollover means that we are happy with the results and ready to switch to the new index
+for both reading and writing operations while removing the old one. To do that, we need to send a request to the cluster:
+
+[source, java]
+====
+----
+var client = ... <1>
+
+var request = new Request("POST", "_aliases"); // <2>
+request.setEntity(new StringEntity("""
+        {
+            "actions": [
+                {
+                    "add": {  // <3>
+                        "index": "%s",
+                        "alias": "%s",
+                        "is_write_index": false
+                    },
+                    "remove_index": {  // <4>
+                        "index": "%s"
+                    }
+                }
+            ]
+        }
+        """.formatted( newIndexName, readAliasName, oldIndexName ) // <5>
+    , ContentType.APPLICATION_JSON));
+
+var response = client.performRequest(request); // <5>
+//...
+----
+1. Get access to the Elasticsearch REST client as described above.
+2. Start creating an `_aliases` API request.
+3. Add an action to update the index aliases to use the new index for both read and write operations.
+Here, we must make the read alias point to the new index.
+4. Add an action to remove the old index.
+5. The names of the new/old index can be retrieved from the response of the initial `_rollover` API request,
+while the aliases can be retrieved from the index descriptor.
+====
+
+Otherwise, if we have encountered an error or decided for any other reason to stop the rollover, we can roll back to using
+the initial index:
+
+[source, java]
+====
+----
+var client = ... <1>
+
+var request = new Request("POST", "_aliases"); // <2>
+request.setEntity(new StringEntity("""
+        {
+            "actions": [
+                {
+                    "add": {  // <3>
+                        "index": "%s",
+                        "alias": "%s",
+                        "is_write_index": true
+                    },
+                    "remove_index": {  // <4>
+                        "index": "%s"
+                    }
+                }
+            ]
+        }
+        """.formatted( oldIndexName, writeAliasName, newIndexName ) // <5>
+    , ContentType.APPLICATION_JSON));
+
+var response = client.performRequest(request); // <5>
+//...
+----
+1. Get access to the Elasticsearch REST client as described above.
+2. Start creating an `_aliases` API request.
+3. Add an action to update the index aliases to use the old index for both read and write operations.
+Here, we must make the write alias point back to the old index.
+4. Add an action to remove the new index.
+5. The names of the new/old index can be retrieved from the response of the initial `_rollover` API request,
+while the aliases can be retrieved from the index descriptor.
+====
+
+NOTE: Keep in mind that in case of a rollback, your initial index may be out of sync if any write operations were performed
+while the write alias was pointing to the new index.
+
+With this knowledge, we can organize the rollover process as follows:
+[source, java]
+====
+----
+try (Rollover rollover = Rollover.start(searchMapping)) {
+    // Perform the indexing operations ...
+    rollover.commit();
+}
+----
+====
+
+Where the `Rollover` class will look as follows:
+
+[source, java]
+====
+----
+class Rollover implements Closeable {
+    public static Rollover start(SearchMapping searchMapping) {
+        // initiate the rollover process by sending the _rollover request ...
+        // ...
+        return new Rollover( client, rolloverResponse );  // <1>
+    }
+
+    @Override
+    public void close() {
+        if ( !done ) { // <2>
+            rollback();
+        }
+    }
+
+    public void commit() {
+        // send the `_aliases` request to switch to the *new* index
+        // ...
+        done = true;
+    }
+
+    public void rollback() {
+        // send the `_aliases` request to switch to the *old* index
+        // ...
+        done = true;
+    }
+}
+----
+1. Keep the reference to the Elasticsearch REST client to perform API calls.
+2. If we haven't successfully committed the rollover, it'll be rolled back on close.
+====
+
+Once again, for a complete working example of this rollover implementation, check out the
+link:https://github.com/quarkusio/search.quarkus.io[search.quarkus.io on GitHub].
+
+If you find this feature useful and would like to have it built-in into your Hibernate Search and Quarkus apps
+feel free to reach out to us on the https://hibernate.atlassian.net/browse/HSEARCH-3499[pending feature requests]
+to discuss your ideas and suggestions.
+
+Stay tuned for more details in the coming weeks as we publish more blog posts
+diving into other interesting implementation aspects of this application.
+Happy searching and rolling over!
diff --git a/assets/images/posts/search-indexing-rollover/after-indexing.png b/assets/images/posts/search-indexing-rollover/after-indexing.png
diff --git a/assets/images/posts/search-indexing-rollover/initial-app.png b/assets/images/posts/search-indexing-rollover/initial-app.png
diff --git a/assets/images/posts/search-indexing-rollover/write-app.png b/assets/images/posts/search-indexing-rollover/write-app.png