diff --git a/_data/authors.yaml b/_data/authors.yaml index 8ce043f6dd8..c38ca8eaec4 100644 --- a/_data/authors.yaml +++ b/_data/authors.yaml @@ -513,4 +513,11 @@ yrodiere: emailhash: "2a8bdd4ffd282b7185c74b52ab452617" job_title: "Principal Software Engineer" twitter: "yoannrodiere" - bio: "Lead developer on Hibernate Search (http://hibernate.org/search/), and one of the main contributors to the Hibernate extensions (ORM, Search, Validator) of Quarkus." \ No newline at end of file + bio: "Lead developer on Hibernate Search (http://hibernate.org/search/), and one of the main contributors to the Hibernate extensions (ORM, Search, Validator) of Quarkus." +markobekhta: + name: "Marko Bekhta" + email: "marko@hibernate.org" + emailhash: "2934f00ba9190bc06cf03fde5b50c61d" + job_title: "Engineer (Software)" + twitter: "that_java_guy" + bio: "Software Engineer at Red Hat and Hibernate team member." \ No newline at end of file diff --git a/_posts/2024-02-25-search-indexing-rollover.adoc b/_posts/2024-02-25-search-indexing-rollover.adoc new file mode 100644 index 00000000000..b7930f8a50c --- /dev/null +++ b/_posts/2024-02-25-search-indexing-rollover.adoc @@ -0,0 +1,269 @@ +--- +layout: post +title: 'Indexing rollover with Quarkus and Hibernate Search' +date: 2024-02-25 +tags: hibernate search howto +synopsis: 'This is the first post in the series that dives into the implementation details of a search.quarkus.io application. Are you interested in near zero-downtime reindexing? Then this one is for you!' 
+author: markobekhta +--- + +:imagesdir: /assets/images/posts/search-indexing-rollover +:hibernate-search-docs-url: https://docs.jboss.org/hibernate/search/{hibernate-search-version}/reference/en-US/html_single/ +:quarkus-hibernate-search-docs-url: https://quarkus.io/guides/hibernate-search-orm-elasticsearch + +This is the first post in a series diving into the implementation details of the +link:https://github.com/quarkusio/search.quarkus.io[application] backing the guide search of +link:https://quarkus.io/guides/[quarkus.io]. + +Does your application have full-text search capabilities and use Hibernate Search? +Do you need to reindex your data while keeping your application running and producing search results? +Look no further. In this post, we'll cover how you can approach this problem +and solve it in practice with a few low-level APIs provided by Hibernate Search. + +The approach suggested in this post is based on the fact that Hibernate Search uses +link:{hibernate-search-docs-url}#backend-elasticsearch-indexlayout[aliased indexes] +and communicates with the actual index through a read or a write alias, depending on the operation it needs to perform. +For example, a search operation is routed through the read index alias, +while an indexing operation is sent through the write index alias. + +image::initial-app.png[] + +NOTE: This approach is implemented and successfully used in our Quarkus application that backs the guides' search of quarkus.io/guides. +You can see the complete implementation here: +link:https://github.com/quarkusio/search.quarkus.io/blob/d956b6a1341d8693fa1d6b7881f3840f48bdaacd/src/main/java/io/quarkus/search/app/indexing/Rollover.java#L44-L331[rollover implementation] +and link:https://github.com/quarkusio/search.quarkus.io/blob/d956b6a1341d8693fa1d6b7881f3840f48bdaacd/src/main/java/io/quarkus/search/app/indexing/IndexingService.java#L226-L244[rollover usage]. 
+ +Now, since we want our application to keep serving search results and adding/updating documents in the indexes, +we cannot perform a simple reindex operation (purge all documents within an index and mass-index them back) +using a link:{hibernate-search-docs-url}#search-batchindex-massindexer[mass indexer], +or a recently added link:{quarkus-hibernate-search-docs-url}#management[management Quarkus endpoint], +as this would drop all existing documents from the index, and search operations would not match them while reindexing is in progress. + +Instead, we can create a new index with the same schema and route any write operations to it. + +image::write-app.png[] + +Since, at the moment, Hibernate Search does not provide the rollover feature out of the box, +we need to use the lower-level APIs to access the Elasticsearch client and perform the required operations ourselves. +To do so, we need to follow a few simple steps: + +1. Get the mapping information for the index we want to reindex using the schema manager. ++ +[source, java] +==== +---- +@Inject +SearchMapping searchMapping; // <1> +// ... + +searchMapping.scope(MyIndexedEntity.class).schemaManager() // <2> + .exportExpectedSchema((backendName, indexName, export) -> { // <3> + var createIndexRequestBody = export.extension(ElasticsearchExtension.get()).bodyParts().get(0); // <4> + var mappings = createIndexRequestBody.getAsJsonObject("mappings"); // <5> + var settings = createIndexRequestBody.getAsJsonObject("settings"); // <6> +}); +---- +1. Inject `SearchMapping` somewhere in your app so that we can use it to access a schema manager. +2. Get a schema manager for the indexed entity we are interested in (`MyIndexedEntity`). +If all entities should be targeted, then `Object.class` can be used to create the scope. +3. Use the export schema API to access the mapping information. +4. 
Use the extension to get access to the Elasticsearch-specific `.bodyParts()` method, whose first element +is the JSON HTTP body needed to create the index. +5. Get the mapping information for the particular index. +6. Get the settings for the particular index. +==== ++ +2. Get the reference to the Elasticsearch client, so we can perform API calls to the search backend cluster: ++ +[source, java] +==== +---- +@Inject +SearchMapping searchMapping; // <1> +// ... +RestClient client = searchMapping.backend() // <2> + .unwrap(ElasticsearchBackend.class) // <3> + .client(RestClient.class); // <4> +---- +1. Inject `SearchMapping` somewhere in your app so that we can use it to access the backend. +2. Access the backend from a search mapping instance. +3. Unwrap the backend to the `ElasticsearchBackend`, so that we can access backend-specific APIs. +4. Get a reference to the Elasticsearch REST client. +==== ++ +3. Create a new index using the OpenSearch/Elasticsearch rollover API, +which allows us to keep using the existing index for read operations, +while write operations are sent to the new index: ++ +[source, java] +==== +---- +@Inject +SearchMapping searchMapping; // <1> +// ... + +SearchIndexedEntity entity = searchMapping.indexedEntity(MyIndexedEntity.class); +var index = entity.indexManager().unwrap(ElasticsearchIndexManager.class).descriptor(); // <2> + +var request = new Request("POST", "/" + index.writeName() + "/_rollover"); // <3> +var body = new JsonObject(); +body.add("mappings", mappings); +body.add("settings", settings); +body.add("aliases", new JsonObject()); // <4> +var gson = new Gson(); // assumption: a plain Gson instance is enough to serialize the body +request.setEntity(new StringEntity(gson.toJson(body), ContentType.APPLICATION_JSON)); + +var response = client.performRequest(request); // <5> +//... +---- +1. Inject `SearchMapping` somewhere in your app so that we can use it to access the indexed entity. +2. Get the index descriptor to get the aliases from it. +3. 
Start building the rollover request, targeting the write index alias from the index descriptor. +4. Note that we are including an empty "aliases" so that the aliases are not copied over to the new index, +except for the write alias. +We don't want the read alias to start pointing to the new index immediately. +5. Perform the rollover API request using the Elasticsearch REST client obtained in the previous step. +==== + +With this successfully completed, we can start populating our write index. Once we are done with indexing, +we can either commit or roll back depending on the results: + +image::after-indexing.png[] + +Committing the index rollover means that we are happy with the results and ready to switch to the new index +for both reading and writing operations while removing the old one. To do that, we need to send a request to the cluster: + +[source, java] +==== +---- +var client = ... // <1> + +var request = new Request("POST", "_aliases"); // <2> +request.setEntity(new StringEntity(""" + { + "actions": [ + { + "add": { // <3> + "index": "%s", + "alias": "%s", + "is_write_index": false + } + }, + { + "remove_index": { "index": "%s" } // <4> + } + ] + } + """.formatted( newIndexName, readAliasName, oldIndexName ) // <5> + , ContentType.APPLICATION_JSON)); + +var response = client.performRequest(request); +//... +---- +1. Get access to the Elasticsearch REST client as described above. +2. Start creating an `_aliases` API request. +3. Add an action to update the index aliases to use the new index for both read and write operations. +Here, we must make the read alias point to the new index. +4. Add a separate action to remove the old index; each entry in `actions` holds exactly one operation. +5. The names of the new/old index can be retrieved from the response of the initial `_rollover` API request, +while the aliases can be retrieved from the index descriptor. 
+==== + +Otherwise, if we have encountered an error or decided for any other reason to stop the rollover, we can roll back to the initial index: + +[source, java] +==== +---- +var client = ... // <1> + +var request = new Request("POST", "_aliases"); // <2> +request.setEntity(new StringEntity(""" + { + "actions": [ + { + "add": { // <3> + "index": "%s", + "alias": "%s", + "is_write_index": true + } + }, + { + "remove_index": { "index": "%s" } // <4> + } + ] + } + """.formatted( oldIndexName, writeAliasName, newIndexName ) // <5> + , ContentType.APPLICATION_JSON)); + +var response = client.performRequest(request); +//... +---- +1. Get access to the Elasticsearch REST client as described above. +2. Start creating an `_aliases` API request. +3. Add an action to update the index aliases to use the old index for both read and write operations. +Here, we must make the write alias point back to the old index. +4. Add a separate action to remove the new index; each entry in `actions` holds exactly one operation. +5. The names of the new/old index can be retrieved from the response of the initial `_rollover` API request, +while the aliases can be retrieved from the index descriptor. +==== + +NOTE: Keep in mind that in case of a rollback, your initial index may be out of sync if any write operations were performed +while the write alias was pointing to the new index. + +With this knowledge, we can organize the rollover process as follows: + +[source, java] +==== +---- +try (Rollover rollover = Rollover.start(searchMapping)) { + // Perform the indexing operations ... + rollover.commit(); +} +---- +==== + +Where the `Rollover` class looks as follows: + +[source, java] +==== +---- +class Rollover implements Closeable { + public static Rollover start(SearchMapping searchMapping) { + // initiate the rollover process by sending the _rollover request ... + // ... 
+ return new Rollover( client, rolloverResponse ); // <1> + } + + @Override + public void close() { + if ( !done ) { // <2> + rollback(); + } + } + + public void commit() { + // send the `_aliases` request to switch to the *new* index + // ... + done = true; + } + + public void rollback() { + // send the `_aliases` request to switch to the *old* index + // ... + done = true; + } +} +---- +1. Keep the reference to the Elasticsearch REST client to perform API calls. +2. If we haven't successfully committed the rollover, it'll be rolled back on close. +==== + +Once again, for a complete working example of this rollover implementation, check out +link:https://github.com/quarkusio/search.quarkus.io[search.quarkus.io on GitHub]. + +If you find this feature useful and would like to have it built into your Hibernate Search and Quarkus apps, +feel free to reach out to us, submit feature requests and discuss your ideas and suggestions. + +Stay tuned for more details in the coming weeks as we publish more blog posts +diving into other interesting implementation aspects of this application. +Happy searching and rolling over! diff --git a/assets/images/posts/search-indexing-rollover/after-indexing.png b/assets/images/posts/search-indexing-rollover/after-indexing.png new file mode 100644 index 00000000000..3448e09b478 Binary files /dev/null and b/assets/images/posts/search-indexing-rollover/after-indexing.png differ diff --git a/assets/images/posts/search-indexing-rollover/initial-app.png b/assets/images/posts/search-indexing-rollover/initial-app.png new file mode 100644 index 00000000000..24fcc628651 Binary files /dev/null and b/assets/images/posts/search-indexing-rollover/initial-app.png differ diff --git a/assets/images/posts/search-indexing-rollover/write-app.png b/assets/images/posts/search-indexing-rollover/write-app.png new file mode 100644 index 00000000000..5fa8582fd6a Binary files /dev/null and b/assets/images/posts/search-indexing-rollover/write-app.png differ