Skip to content

Commit

Permalink
Merge pull request #1898 from marko-bekhta/post/search-quarkus-io-rol…
Browse files Browse the repository at this point in the history
…lover

Post: quarkus.search.io series - rolling over
  • Loading branch information
yrodiere authored Apr 25, 2024
2 parents 0b9d4f8 + a7f909c commit cc0c6a1
Show file tree
Hide file tree
Showing 5 changed files with 298 additions and 0 deletions.
7 changes: 7 additions & 0 deletions _data/authors.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -541,3 +541,10 @@ iyanev:
emailhash: "05c3ef31aeba2d12e2eb26282752e654"
job_title: "Senior Software Engineer"
twitter: "iqnev"
markobekhta:
name: "Marko Bekhta"
email: "[email protected]"
emailhash: "2934f00ba9190bc06cf03fde5b50c61d"
job_title: "Engineer (Software)"
twitter: "that_java_guy"
bio: "Software Engineer at Red Hat and Hibernate team member."
291 changes: 291 additions & 0 deletions _posts/2024-04-25-search-indexing-rollover.adoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,291 @@
---
layout: post
title: 'Indexing rollover with Quarkus and Hibernate Search'
date: 2024-04-25
tags: hibernate search howto
synopsis: 'This is the first post in the series that dives into the implementation details of the search.quarkus.io application. Are you interested in near zero-downtime reindexing? Then this one is for you!'
author: markobekhta
---
:hibernate-search-version: 7.0
:imagesdir: /assets/images/posts/search-indexing-rollover
:hibernate-search-docs-url: https://docs.jboss.org/hibernate/search/{hibernate-search-version}/reference/en-US/html_single/
:quarkus-hibernate-search-docs-url: https://quarkus.io/guides/hibernate-search-orm-elasticsearch

This is the first post in the series diving into the implementation details of the
link:https://github.com/quarkusio/search.quarkus.io[application] backing the guide search of
link:https://quarkus.io/guides/[quarkus.io].

Does your application need full-text search capabilities? Do you need to keep your application running
and producing search results without any downtime, even when reindexing all your data?
Look no further. In this post, we'll cover how you can approach this problem
and solve it in practice with a few low-level APIs, provided you use Hibernate Search,
be it link:{quarkus-hibernate-search-docs-url}[on top of Hibernate ORM]
or https://quarkus.io/version/main/guides/hibernate-search-standalone-elasticsearch[in standalone mode].

The approach suggested in this post is based on the fact that Hibernate Search uses
link:{hibernate-search-docs-url}#backend-elasticsearch-indexlayout[aliased indexes],
and communicates with the actual index through a read/write alias, depending on the operation it needs to perform.
For example, a search operation will be routed to a read index alias,
while an indexing operation will be sent to a write index alias.

image::initial-app.png[]

NOTE: This approach is implemented and successfully used in our Quarkus application that backs the guides'
search of https://quarkus.io/guides/[quarkus.io/guides/].
You can see the complete implementation here:
link:https://github.com/quarkusio/search.quarkus.io/blob/d956b6a1341d8693fa1d6b7881f3840f48bdaacd/src/main/java/io/quarkus/search/app/indexing/Rollover.java#L44-L331[rollover implementation]
and link:https://github.com/quarkusio/search.quarkus.io/blob/d956b6a1341d8693fa1d6b7881f3840f48bdaacd/src/main/java/io/quarkus/search/app/indexing/IndexingService.java#L226-L244[rollover usage].

Applications using Hibernate Search can keep their search indexes up-to-date by updating the index gradually,
as the data on which the index documents are based is modified, providing a near real-time index synchronisation.

On the other hand, if the search requirements allow for a delay in synchronisation
or the data is updated only at certain times of day, the option of mass indexing can effectively keep the indexes up-to-date.
The link:{hibernate-search-docs-url}[Hibernate Search documentation] provides more information about these approaches
and other Hibernate Search capabilities.

The application discussed in this post is using the mass indexing approach.
This means that at certain events, e.g., when a new version of the application is deployed or a scheduled time is reached,
the application has to process the documentation guides and create search index documents from them.

Now, since we want our application to keep providing results to any search requests while we add/update documents to the indexes,
we cannot perform a simple reindexing operation
using a link:{hibernate-search-docs-url}#search-batchindex-massindexer[mass indexer],
or the recently added link:{quarkus-hibernate-search-docs-url}#management[management endpoint in Quarkus],
as these would drop all existing documents from the index before indexing them:
search operations would not be able to match them anymore until reindexing finishes.

Instead, we can create a new index with the same schema and route any write operations to it.

image::write-app.png[]

Since Hibernate Search does not provide the rollover feature out of the box (https://hibernate.atlassian.net/browse/HSEARCH-3499[yet])
we will need to resort to using the lower-level APIs to access the Elasticsearch client and perform the required operations ourselves.
To do so, we need to follow a few simple steps:

1. Get the mapping information for the index we want to reindex using the schema manager.
+
[source, java]
====
----
@Inject
SearchMapping searchMapping; // <1>
// ...
searchMapping.scope(MyIndexedEntity.class).schemaManager() // <2>
.exportExpectedSchema((backendName, indexName, export) -> { // <3>
var createIndexRequestBody = export.extension(ElasticsearchExtension.get())
.bodyParts().get(0); // <4>
var mappings = createIndexRequestBody.getAsJsonObject("mappings"); // <5>
var settings =createIndexRequestBody.getAsJsonObject("settings"); // <6>
});
----
1. Inject `SearchMapping` somewhere in your app so that we can use it to access a schema manager.
2. Get a schema manager for the indexed entity we are interested in (`MyIndexedEntity`).
If all entities should be targeted, then `Object.class` can be used to create the scope.
3. Use the export schema API to access the mapping information.
4. Use the extension to get access to the Elasticsearch-specific `.bodyParts()` method that returns
a JSON representing the JSON HTTP body needed to create the indexes.
5. Get the mapping information for the particular index.
6. Get the settings for the particular index.
====
+
2. Get the reference to the Elasticsearch client, so we can perform API calls to the search backend cluster:
+
[source, java]
====
----
@Inject
SearchMapping searchMapping; // <1>
// ...
RestClient client = searchMapping.backend() // <2>
.unwrap(ElasticsearchBackend.class) // <3>
.client(RestClient.class); // <4>
----
1. Inject `SearchMapping` somewhere in your app so that we can use it to access a schema manager.
2. Access the backend from a search mapping instance.
3. Unwrap the backend to the `ElasticsearchBackend`, so that we can access backend-specific APIs.
4. Get a reference to the Elasticsearch's rest client.
====
+
3. Create a new index using the OpenSearch/Elasticsearch rollover API
that would allow us to keep using the existing index for read operations,
while write operations will be sent to the new index:
+
[source, java]
====
----
@Inject
SearchMapping searchMapping; // <1>
// ...
SearchIndexedEntity<?> entity = searchMapping.indexedEntity(MyIndexedEntity.class);
var index = entity.indexManager().unwrap(ElasticsearchIndexManager.class).descriptor(); // <2>
var request = new Request("POST", "/" + index.writeName() + "/_rollover"); // <3>
var body = new JsonObject();
body.add("mappings", mappings);
body.add("settings", settings);
body.add("aliases", new JsonObject()); // <4>
request.setEntity(new StringEntity(gson.toJson(body), ContentType.APPLICATION_JSON));
var response = client.performRequest(request); // <5>
//...
----
1. Inject `SearchMapping` somewhere in your app so that we can use it to access a schema manager.
2. Get the index descriptor to get the aliases from it.
3. Start building the rollover request body using the write index alias from the index descriptor.
4. Note that we are including an empty "aliases" so that the aliases are not copied over to the new index,
except for the write alias (which is implicitly updated since the rollover request is targeting it directly).
We don't want the read alias to start pointing to the new index immediately.
5. Perform the rollover API request using the Elasticsearch REST client obtained in the previous step.
====

With this successfully completed, indexes are in the state we wanted:

image::write-app.png[]

We can start populating our write index without affecting search requests.

Once we are done with indexing, we can either commit or rollback depending on the results:

image::after-indexing.png[]

Committing the index rollover means that we are happy with the results and ready to switch to the new index
for both reading and writing operations while removing the old one. To do that, we need to send a request to the cluster:

[source, java]
====
----
var client = ... <1>
var request = new Request("POST", "_aliases"); // <2>
request.setEntity(new StringEntity("""
{
"actions": [
{
"add": { // <3>
"index": "%s",
"alias": "%s",
"is_write_index": false
},
"remove_index": { // <4>
"index": "%s"
}
}
]
}
""".formatted( newIndexName, readAliasName, oldIndexName ) // <5>
, ContentType.APPLICATION_JSON));
var response = client.performRequest(request); // <5>
//...
----
1. Get access to the Elasticsearch REST client as described above.
2. Start creating an `_aliases` API request.
3. Add an action to update the index aliases to use the new index for both read and write operations.
Here, we must make the read alias point to the new index.
4. Add an action to remove the old index.
5. The names of the new/old index can be retrieved from the response of the initial `_rollover` API request,
while the aliases can be retrieved from the index descriptor.
====

Otherwise, if we have encountered an error or decided for any other reason to stop the rollover, we can roll back to using
the initial index:

[source, java]
====
----
var client = ... <1>
var request = new Request("POST", "_aliases"); // <2>
request.setEntity(new StringEntity("""
{
"actions": [
{
"add": { // <3>
"index": "%s",
"alias": "%s",
"is_write_index": true
},
"remove_index": { // <4>
"index": "%s"
}
}
]
}
""".formatted( oldIndexName, writeAliasName, newIndexName ) // <5>
, ContentType.APPLICATION_JSON));
var response = client.performRequest(request); // <5>
//...
----
1. Get access to the Elasticsearch REST client as described above.
2. Start creating an `_aliases` API request.
3. Add an action to update the index aliases to use the old index for both read and write operations.
Here, we must make the write alias point back to the old index.
4. Add an action to remove the new index.
5. The names of the new/old index can be retrieved from the response of the initial `_rollover` API request,
while the aliases can be retrieved from the index descriptor.
====

NOTE: Keep in mind that in case of a rollback, your initial index may be out of sync if any write operations were performed
while the write alias was pointing to the new index.

With this knowledge, we can organize the rollover process as follows:
[source, java]
====
----
try (Rollover rollover = Rollover.start(searchMapping)) {
// Perform the indexing operations ...
rollover.commit();
}
----
====

Where the `Rollover` class will look as follows:

[source, java]
====
----
class Rollover implements Closeable {
public static Rollover start(SearchMapping searchMapping) {
// initiate the rollover process by sending the _rollover request ...
// ...
return new Rollover( client, rolloverResponse ); // <1>
}
@Override
public void close() {
if ( !done ) { // <2>
rollback();
}
}
public void commit() {
// send the `_aliases` request to switch to the *new* index
// ...
done = true;
}
public void rollback() {
// send the `_aliases` request to switch to the *old* index
// ...
done = true;
}
}
----
1. Keep the reference to the Elasticsearch REST client to perform API calls.
2. If we haven't successfully committed the rollover, it'll be rolled back on close.
====

Once again, for a complete working example of this rollover implementation, check out the
link:https://github.com/quarkusio/search.quarkus.io[search.quarkus.io on GitHub].

If you find this feature useful and would like to have it built-in into your Hibernate Search and Quarkus apps
feel free to reach out to us on the https://hibernate.atlassian.net/browse/HSEARCH-3499[pending feature requests]
to discuss your ideas and suggestions.

Stay tuned for more details in the coming weeks as we publish more blog posts
diving into other interesting implementation aspects of this application.
Happy searching and rolling over!
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

0 comments on commit cc0c6a1

Please sign in to comment.