Post: quarkus.search.io series - rolling over
marko-bekhta committed Feb 26, 2024
1 parent 538bdd8 commit 1a30f0c
Showing 5 changed files with 277 additions and 1 deletion.
9 changes: 8 additions & 1 deletion _data/authors.yaml
@@ -513,4 +513,11 @@ yrodiere:
emailhash: "2a8bdd4ffd282b7185c74b52ab452617"
job_title: "Principal Software Engineer"
twitter: "yoannrodiere"
bio: "Lead developer on Hibernate Search (http://hibernate.org/search/), and one of the main contributors to the Hibernate extensions (ORM, Search, Validator) of Quarkus."
markobekhta:
name: "Marko Bekhta"
email: "[email protected]"
emailhash: "2934f00ba9190bc06cf03fde5b50c61d"
job_title: "Engineer (Software)"
twitter: "that_java_guy"
bio: "Software Engineer at Red Hat and Hibernate team member."
269 changes: 269 additions & 0 deletions _posts/2024-02-25-search-indexing-rollover.adoc
@@ -0,0 +1,269 @@
---
layout: post
title: 'Indexing rollover with Quarkus and Hibernate Search'
date: 2024-02-25
tags: hibernate search howto
synopsis: 'This is the first post in the series that dives into the implementation details of the search.quarkus.io application. Are you interested in near zero-downtime reindexing? Then this one is for you!'
author: markobekhta
---

:imagesdir: /assets/images/posts/search-indexing-rollover
:hibernate-search-docs-url: https://docs.jboss.org/hibernate/search/{hibernate-search-version}/reference/en-US/html_single/
:quarkus-hibernate-search-docs-url: https://quarkus.io/guides/hibernate-search-orm-elasticsearch

This is the first post in the series diving into the implementation details of an
link:https://github.com/quarkusio/search.quarkus.io[application] backing the guide search of
link:https://quarkus.io/guides/[quarkus.io].

Does your application have full-text search capabilities and use Hibernate Search?
Do you need to perform reindexing of your data while keeping your application running and producing search results?
Look no further. In this post, we'll cover how you can approach this problem
and solve it in practice with a few low-level APIs provided by Hibernate Search.

The approach suggested in this post relies on the fact that Hibernate Search uses
link:{hibernate-search-docs-url}#backend-elasticsearch-indexlayout[aliased indexes]
and communicates with the actual index through a read or write alias, depending on the operation it needs to perform.
For example, a search operation is routed through the read index alias,
while an indexing operation is sent through the write index alias.

image::initial-app.png[]
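
To make this layout more tangible, here is a minimal sketch that inspects it through the Elasticsearch REST client (we will see how to obtain that client later in this post).
The `myindexedentity-*` names are purely hypothetical and assume the default index layout strategy:

[source, java]
----
// Ask Elasticsearch which aliases point to the index behind the read alias.
var request = new Request("GET", "/myindexedentity-read/_alias");
var response = client.performRequest(request);
// With the default layout strategy, the response body looks similar to:
// {
//   "myindexedentity-000001": {
//     "aliases": {
//       "myindexedentity-read": { ... },
//       "myindexedentity-write": { "is_write_index": true, ... }
//     }
//   }
// }
----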

NOTE: This approach is implemented and successfully used in our Quarkus application that backs the guide search of quarkus.io/guides.
You can see the complete implementation here:
link:https://github.com/quarkusio/search.quarkus.io/blob/d956b6a1341d8693fa1d6b7881f3840f48bdaacd/src/main/java/io/quarkus/search/app/indexing/Rollover.java#L44-L331[rollover implementation]
and link:https://github.com/quarkusio/search.quarkus.io/blob/d956b6a1341d8693fa1d6b7881f3840f48bdaacd/src/main/java/io/quarkus/search/app/indexing/IndexingService.java#L226-L244[rollover usage].

Since we want our application to keep serving search results and accepting document additions and updates,
we cannot perform a simple reindexing operation (purging all documents within an index and mass-indexing them back)
using a link:{hibernate-search-docs-url}#search-batchindex-massindexer[mass indexer]
or the recently added link:{quarkus-hibernate-search-docs-url}#management[Quarkus management endpoint],
as that would drop all existing documents from the index, and search operations would no longer be able to match them.
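
For reference, that simple in-place reindexing boils down to a call like the following sketch,
with a hypothetical `MyIndexedEntity` and an ORM `EntityManager`;
the purge at the start is exactly what would leave search queries without results while reindexing is in progress:

[source, java]
----
SearchSession searchSession = Search.session(entityManager); // obtain a Hibernate Search session from the EntityManager
searchSession.massIndexer(MyIndexedEntity.class)
        .purgeAllOnStart(true) // drops all existing documents before reindexing starts
        .startAndWait();
----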

Instead, we can create a new index with the same schema and route any write operations to it.

image::write-app.png[]

Since, at the moment, Hibernate Search does not provide a rollover feature out of the box,
we need to resort to lower-level APIs that give us access to the Elasticsearch client, and perform the required operations ourselves.
To do so, we follow a few simple steps:

1. Get the mapping information for the index we want to reindex using the schema manager.
+
[source, java]
====
----
@Inject
SearchMapping searchMapping; // <1>
// ...
searchMapping.scope(MyIndexedEntity.class).schemaManager() // <2>
        .exportExpectedSchema((backendName, indexName, export) -> { // <3>
            var createIndexRequestBody = export.extension(ElasticsearchExtension.get()).bodyParts().get(0); // <4>
            var mappings = createIndexRequestBody.getAsJsonObject("mappings"); // <5>
            var settings = createIndexRequestBody.getAsJsonObject("settings"); // <6>
        });
----
1. Inject `SearchMapping` somewhere in your app so that we can use it to access a schema manager.
2. Get a schema manager for the indexed entity we are interested in (`MyIndexedEntity`).
If all entities should be targeted, then `Object.class` can be used to create the scope.
3. Use the export schema API to access the mapping information.
4. Use the extension to get access to the Elasticsearch-specific `.bodyParts()` method, which returns
the JSON body of the HTTP request that would be used to create the index.
5. Get the mapping information for the particular index.
6. Get the settings for the particular index.
====
+
2. Get the reference to the Elasticsearch client, so we can perform API calls to the search backend cluster:
+
[source, java]
====
----
@Inject
SearchMapping searchMapping; // <1>
// ...
RestClient client = searchMapping.backend() // <2>
        .unwrap(ElasticsearchBackend.class) // <3>
        .client(RestClient.class); // <4>
----
1. Inject `SearchMapping` somewhere in your app so that we can use it to access the backend.
2. Access the backend from a search mapping instance.
3. Unwrap the backend to the `ElasticsearchBackend`, so that we can access backend-specific APIs.
4. Get a reference to the Elasticsearch REST client.
====
+
3. Create a new index using the OpenSearch/Elasticsearch rollover API,
which allows us to keep using the existing index for read operations
while write operations are sent to the new index:
+
[source, java]
====
----
@Inject
SearchMapping searchMapping; // <1>
// ...
SearchIndexedEntity<?> entity = searchMapping.indexedEntity(MyIndexedEntity.class);
var index = entity.indexManager().unwrap(ElasticsearchIndexManager.class).descriptor(); // <2>
var request = new Request("POST", "/" + index.writeName() + "/_rollover"); // <3>
var body = new JsonObject();
body.add("mappings", mappings);
body.add("settings", settings);
body.add("aliases", new JsonObject()); // <4>
request.setEntity(new StringEntity(gson.toJson(body), ContentType.APPLICATION_JSON));
var response = client.performRequest(request); // <5>
//...
----
1. Inject `SearchMapping` somewhere in your app so that we can use it to access the indexed entity and its index descriptor.
2. Get the index descriptor to get the aliases from it.
3. Start building the rollover request body using the write index alias from the index descriptor.
4. Note that we are including an empty "aliases" so that the aliases are not copied over to the new index,
except for the write alias.
We don't want the read alias to start pointing to the new index immediately.
5. Perform the rollover API request using the Elasticsearch REST client obtained in the previous step.
====

With this successfully completed, we can start populating our write index. Once we are done with indexing,
we can either commit or roll back, depending on the results:

image::after-indexing.png[]

Committing the index rollover means that we are happy with the results and ready to switch to the new index
for both reading and writing operations while removing the old one. To do that, we need to send a request to the cluster:

[source, java]
====
----
var client = ... // <1>
var request = new Request("POST", "_aliases"); // <2>
request.setEntity(new StringEntity("""
        {
          "actions": [
            {
              "add": { // <3>
                "index": "%s",
                "alias": "%s",
                "is_write_index": false
              }
            },
            {
              "remove_index": { // <4>
                "index": "%s"
              }
            }
          ]
        }
        """.formatted( newIndexName, readAliasName, oldIndexName ) // <5>
        , ContentType.APPLICATION_JSON));
var response = client.performRequest(request); // <6>
//...
----
1. Get access to the Elasticsearch REST client as described above.
2. Start creating an `_aliases` API request.
3. Add an action to update the index aliases to use the new index for both read and write operations.
Here, we must make the read alias point to the new index.
4. Add an action to remove the old index.
5. The names of the new/old index can be retrieved from the response of the initial `_rollover` API request,
while the aliases can be retrieved from the index descriptor.
6. Perform the `_aliases` API request using the Elasticsearch REST client.
====
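
For illustration, the names used in the request above (and in the rollback request below) could be obtained as in the following sketch,
assuming Gson is used for JSON parsing, `rolloverResponse` is the response obtained from the `_rollover` request in step 3,
and `index` is the descriptor obtained there:

[source, java]
----
var rolloverResponseBody = new Gson().fromJson(
        EntityUtils.toString(rolloverResponse.getEntity()), JsonObject.class); // parse the _rollover response body
String oldIndexName = rolloverResponseBody.get("old_index").getAsString();
String newIndexName = rolloverResponseBody.get("new_index").getAsString();
String readAliasName = index.readName(); // aliases come from the index descriptor
String writeAliasName = index.writeName();
----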

Otherwise, if we have encountered an error or decided for any other reason to stop the rollover, we can roll back to using
the initial index:

[source, java]
====
----
var client = ... // <1>
var request = new Request("POST", "_aliases"); // <2>
request.setEntity(new StringEntity("""
        {
          "actions": [
            {
              "add": { // <3>
                "index": "%s",
                "alias": "%s",
                "is_write_index": true
              }
            },
            {
              "remove_index": { // <4>
                "index": "%s"
              }
            }
          ]
        }
        """.formatted( oldIndexName, writeAliasName, newIndexName ) // <5>
        , ContentType.APPLICATION_JSON));
var response = client.performRequest(request); // <6>
//...
----
1. Get access to the Elasticsearch REST client as described above.
2. Start creating an `_aliases` API request.
3. Add an action to update the index aliases to use the old index for both read and write operations.
Here, we must make the write alias point back to the old index.
4. Add an action to remove the new index.
5. The names of the new/old index can be retrieved from the response of the initial `_rollover` API request,
while the aliases can be retrieved from the index descriptor.
6. Perform the `_aliases` API request using the Elasticsearch REST client.
====

NOTE: Keep in mind that in case of a rollback, your initial index may be out of sync if any write operations were performed
while the write alias was pointing to the new index.

With this knowledge, we can organize the rollover process as follows:

[source, java]
====
----
try (Rollover rollover = Rollover.start(searchMapping)) {
    // Perform the indexing operations ...
    rollover.commit();
}
----
====

The `Rollover` class itself looks as follows:

[source, java]
====
----
class Rollover implements Closeable {

    private final RestClient client;
    private boolean done;

    private Rollover(RestClient client, Response rolloverResponse) {
        this.client = client;
        // remember the old/new index names from the rollover response ...
    }

    public static Rollover start(SearchMapping searchMapping) {
        // initiate the rollover process by sending the _rollover request ...
        // ...
        return new Rollover( client, rolloverResponse ); // <1>
    }

    @Override
    public void close() {
        if ( !done ) { // <2>
            rollback();
        }
    }

    public void commit() {
        // send the `_aliases` request to switch to the *new* index
        // ...
        done = true;
    }

    public void rollback() {
        // send the `_aliases` request to switch to the *old* index
        // ...
        done = true;
    }
}
----
1. Keep the reference to the Elasticsearch REST client to perform API calls.
2. If we haven't successfully committed the rollover, it'll be rolled back on close.
====

Once again, for a complete working example of this rollover implementation, check out the
link:https://github.com/quarkusio/search.quarkus.io[search.quarkus.io on GitHub].

If you find this feature useful and would like to have it built into your Hibernate Search and Quarkus applications,
feel free to reach out to us, submit feature requests, and discuss your ideas and suggestions.

Stay tuned for more details in the coming weeks as we publish more blog posts
diving into other interesting implementation aspects of this application.
Happy searching and rolling over!