From fe7672fa93f02369b74c024675b5be4c75070043 Mon Sep 17 00:00:00 2001 From: Quentin Coumes Date: Fri, 22 Mar 2024 09:24:30 +0100 Subject: [PATCH] doc: Add documentation for autosync and related models (#7, #45) --- docs/document.md | 67 +++++++++++++++++++++++++++++++++++++++++++----- docs/settings.md | 23 ++++++++++++++--- 2 files changed, 79 insertions(+), 11 deletions(-) diff --git a/docs/document.md b/docs/document.md index 0863cbe..c1f6f01 100644 --- a/docs/document.md +++ b/docs/document.md @@ -10,8 +10,16 @@ The `Django` subclass contains parameters related to Django's side of the docume into this list. See [Document Field Reference](fields.md) for how to manually define fields. * `queryset_pagination` (*optional*) - Size of the chunk when indexing, override [`OPENSEARCH_DSL_QUERYSET_PAGINATION`](settings.md#opensearch_dsl_queryset_pagination). -* `related_models` (*optional*) - List of related Django models. Specifies a relation between models that allows for - index updating based on these defined relationships. +* `ignore_signals` (*optional*) - If set to `True`, this Document will be ignored by the [auto-syncing](#autosync) + feature. Default to `False`. +* `related_models` (*optional*) - List of related Django models. Any change made to models in this list will trigger a + re-indexation of the related instances of the model associated with this Document. See [auto-syncing](#autosync) for + more information. +* `auto_refresh` (*optional*) - Whether to refresh the affected shards after performing the indexing operations. Default + is `False`. `True` makes the changes show up in search results immediately, but hurts cluster performance. + `"wait_for"` waits for a refresh. Requests take longer to return, but cluster performance doesn’t suffer. This + per-Document setting overrides [OPENSEARCH_DSL_AUTO_REFRESH](settings.md#opensearch_dsl_auto_refresh). + ```python class Country(models.Model): @@ -26,12 +34,10 @@ class CountryDocument(Document): class Django: model = Country queryset_pagination = 128 - fields = [ - 'name', - 'area', - 'population', - ] + fields = ['name', 'area', 'population'] related_models = [] + ignore_signals = True + auto_refresh = True id = fields.LongField() continent = fields.ObjectField(properties={ @@ -217,3 +223,50 @@ mapped with probably the wrong type. To apply changes done to your `Document` class to an existing OpenSearch index, you must call the [`manage.py opensearch index update `](management.md#index-subcommand) command. + + +## Autosync + +This package provide an auto-syncing feature. It will automatically update the index when a model is +created / saved / deleted. + +You can also update the index of related models by defining [`Document.Django.related_models`](document.md#django-subclass) +and the method `get_instances_from_related(self, related)`. This method take an instance of a related model and +should return the instances corresponding to the Document. For example, if you have a `Country` model with a `Continent` +and multiple `City`, you can define the following Document: + +```python +class ContinentInnerDocument(InnerDoc): + id = fields.LongField() + name = fields.KeywordField() + code = fields.KeywordField() + +class CityInnerDocument(InnerDoc): + id = fields.LongField() + name = fields.KeywordField() + postal_cod = fields.KeywordField() + +@registry.register_document +class CountryDocument(Document): + class Django: + model = Country + fields = ['name', 'area', 'population'] + related_models = [Continent, City] + + id = fields.LongField() + continent = fields.ObjectField(doc_class=ContinentInnerDocument) + cities = fields.NestedField(doc_class=CityInnerDocument) + + def get_instances_from_related(self, related: Continent | City) -> List[Country]: + if isinstance(related, Continent): + return related.countries.all() + return [related.country] # isinstance(related, City) +``` + + +Since this feature uses Django's signals, it has the same limitation and this feature only works +when the `save()` or `delete()` methods of your models are called. It does not work with most +bulk operation such as `queryset.bulk_create()`, `queryset.update()`, `queryset.delete()`... + +It is important to note that the autosync feature can have a significant impact on performance, especially used in +conjunction with related models. diff --git a/docs/settings.md b/docs/settings.md index 40e9a7c..5917c58 100644 --- a/docs/settings.md +++ b/docs/settings.md @@ -46,19 +46,34 @@ do not play well with this option. Default: `4096` -Size of the chunk used when indexing data. Can be overriden by setting `queryset_pagination` inside `Document`' +Size of the chunk used when indexing data. Can be overridden by setting `queryset_pagination` inside `Document`' s [`Django` subclass](document.md). + +## `OPENSEARCH_DSL_AUTOSYNC` + +Default: `True` + +Set to `False` to globally disable auto-syncing. + +See [Autosync](document.md#autosync) for more information. + +The autosync operations can be customized using [`OPENSEARCH_DSL_SIGNAL_PROCESSOR`](settings.md#opensearch_dsl_signal_processor) +setting. + ## `OPENSEARCH_DSL_SIGNAL_PROCESSOR` +Default: `django_opensearch_dsl.signals.RealTimeSignalProcessor`. + This (optional) setting controls what SignalProcessor class is used to handle Django’s signals and -keep the indices up-to-date. Default to `django_opensearch_dsl.signals.RealTimeSignalProcessor`. +keep the indices up-to-date. While some builtin choices are provided, you can also define your own +by subclassing `django_opensearch_dsl.signals.BaseSignalProcessor`. -Valid choices are: +Builtin choices are: * `django_opensearch_dsl.signals.RealTimeSignalProcessor` -Operation are processed synchronously as soon as the signal is emitted. +Operations are processed synchronously as soon as the signal is emitted. * `django_opensearch_dsl.signals.CelerySignalProcessor`