feat: replace Elasticsearch by Meilisearch
With this change, we get rid of Elasticsearch across all of Tutor.
Instead, we run Meilisearch, which is much more lightweight in terms of
memory usage. Obviously, this is a (very) breaking change. Indexing
commands will be run during init, such that search should work as
before.

After the edx-search PR is merged and the dependency is upgraded in
edx-platform, we should remove the manual `RUN pip install ...` command.
regisb committed Oct 30, 2024
1 parent 2cbe2f2 commit 032e632
Showing 31 changed files with 164 additions and 90 deletions.
1 change: 1 addition & 0 deletions changelog.d/20241017_125209_regis_meilisearch.md
@@ -0,0 +1 @@
- 💥[Feature] Replace Elasticsearch by Meilisearch. Elasticsearch was both a source of complexity and high resource usage. With this change, we no longer run Elasticsearch to perform common search queries across Open edX. This includes: course discovery, courseware search and studio search. Instead, we index all these documents in a Meilisearch instance, which is much more lightweight in terms of memory consumption. (by @regisb)
24 changes: 15 additions & 9 deletions docs/configuration.rst
@@ -40,7 +40,7 @@ With an up-to-date environment, Tutor is ready to launch an Open edX platform an
Individual service activation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- ``RUN_ELASTICSEARCH`` (default: ``true``)
- ``RUN_MEILISEARCH`` (default: ``true``)
- ``RUN_MONGODB`` (default: ``true``)
- ``RUN_MYSQL`` (default: ``true``)
- ``RUN_REDIS`` (default: ``true``)
@@ -71,9 +71,9 @@ This configuration parameter defines the name of the Docker image to run the dev

This configuration parameter defines which Caddy Docker image to use.

- ``DOCKER_IMAGE_ELASTICSEARCH`` (default: ``"docker.io/elasticsearch:7.17.9"``)
- ``DOCKER_IMAGE_MEILISEARCH`` (default: ``"docker.io/getmeili/meilisearch:v1.8.4"``)

This configuration parameter defines which Elasticsearch Docker image to use.
This configuration parameter defines which Meilisearch Docker image to use.

- ``DOCKER_IMAGE_MONGODB`` (default: ``"docker.io/mongo:7.0.7"``)

@@ -228,13 +228,19 @@ By default, a running Open edX platform deployed with Tutor includes all necessa
.. note::
    When configuring an external MySQL database, please make sure it is using version 8.4.

Elasticsearch
*************
Meilisearch
***********

- ``MEILISEARCH_URL`` (default: ``"http://meilisearch:7700"``): internal URL used for backend-to-backend communication.
- ``MEILISEARCH_PUBLIC_URL`` (default: ``"{% if ENABLE_HTTPS %}https{% else %}http{% endif %}://meilisearch.{{ LMS_HOST }}"``): external URL from which the frontend will access the Meilisearch instance.
- ``MEILISEARCH_INDEX_PREFIX`` (default: ``"tutor_"``)
- ``MEILISEARCH_MASTER_KEY`` (default: ``"{{ 24|random_string }}"``)
- ``MEILISEARCH_API_KEY_UID`` (default: ``"{{ 4|uuid }}"``): UID used to sign the API key.
- ``MEILISEARCH_API_KEY`` (default: ``"{{ MEILISEARCH_MASTER_KEY|uid_master_hash(MEILISEARCH_API_KEY_UID) }}"``)

To reset the Meilisearch API key, make sure to unset both the API key and its UID:

- ``ELASTICSEARCH_SCHEME`` (default: ``"http"``)
- ``ELASTICSEARCH_HOST`` (default: ``"elasticsearch"``)
- ``ELASTICSEARCH_PORT`` (default: ``9200``)
- ``ELASTICSEARCH_HEAP_SIZE`` (default: ``"1g"``)
tutor config save --unset MEILISEARCH_API_KEY_UID MEILISEARCH_API_KEY
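
The default ``MEILISEARCH_API_KEY`` is computed from the master key and the key UID, which is why both settings are unset together. The following is a minimal sketch of that derivation, assuming the scheme documented by Meilisearch (hex-encoded HMAC-SHA256 of the UID, keyed by the master key). The function name `derive_api_key` and the sample values are hypothetical; the actual `uid_master_hash` filter in Tutor is not shown in this diff and may differ in detail.

```python
import hashlib
import hmac
import uuid


def derive_api_key(master_key: str, key_uid: str) -> str:
    """Return the hex-encoded HMAC-SHA256 of the key UID, keyed by the master key.

    Assumption: this mirrors the derivation documented by Meilisearch.
    """
    return hmac.new(
        master_key.encode(), key_uid.encode(), hashlib.sha256
    ).hexdigest()


# Hypothetical values, for illustration only.
master_key = "not-a-real-master-key"
key_uid = str(uuid.uuid4())

api_key = derive_api_key(master_key, key_uid)
print(api_key)
```

Because the key is a pure function of the two inputs, regenerating only one of them would leave the stored API key out of sync with what Meilisearch expects.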

MongoDB
*******
2 changes: 1 addition & 1 deletion docs/tutorials/nightly.rst
@@ -58,7 +58,7 @@ When running Tutor Nightly, you usually do not want to override your existing Tu
Making changes to Tutor Nightly
-------------------------------

In general pull requests should be open on the "master" branch of Tutor: the "master" branch is automatically merged on the "nightly" branch at every commit, such that changes made to Tutor releases find their way to Tutor Nightly as soon as they are merged. However, sometimes you want to make changes to Tutor Nightly exclusively, and not to the Tutor releases. This might be the case for instance when upgrading the running version of a third-party service (for instance: Elasticsearch, MySQL), or when the master branch requires specific changes. In that case, you should follow the instructions from the :ref:`contributing` section of the docs, with the following differences:
In general pull requests should be open on the "master" branch of Tutor: the "master" branch is automatically merged on the "nightly" branch at every commit, such that changes made to Tutor releases find their way to Tutor Nightly as soon as they are merged. However, sometimes you want to make changes to Tutor Nightly exclusively, and not to the Tutor releases. This might be the case for instance when upgrading the running version of a third-party service (for instance: Meilisearch, MySQL), or when the master branch requires specific changes. In that case, you should follow the instructions from the :ref:`contributing` section of the docs, with the following differences:

- Open your pull request on top of the "nightly" branch instead of "master".
- Add a description of your changes by creating a changelog entry with `make changelog-entry`, as in the master branch.
2 changes: 1 addition & 1 deletion docs/tutorials/scale.rst
@@ -37,11 +37,11 @@ Offloading data storage

Aside from web workers, the most resource-intensive services are in the data persistence layer. They are, by decreasing resource usage:

- `Elasticsearch <https://www.elastic.co/elasticsearch/>`__: indexing of course contents and forum topics, mostly for search. Elasticsearch is never a source of truth in Open edX, and the data can thus be trashed and re-created safely.
- `MySQL <https://www.mysql.com>`__: structured, consistent data storage which is the default destination of all data.
- `MongoDB <https://www.mongodb.com>`__: structured storage of course data.
- `Redis <https://redis.io/>`__: caching and asynchronous task management.
- `MinIO <https://min.io>`__: S3-like object storage for user-uploaded files, which is enabled by the `tutor-minio <https://github.com/overhangio/tutor-minio>`__ plugin. It is possible to replace MinIO by direct filesystem storage (the default), but scaling will then become much more difficult down the road.
- `Meilisearch <https://www.meilisearch.com>`__: indexing of course contents and forum topics, mostly for search. Meilisearch is never a source of truth in Open edX, and the data can thus be trashed and re-created safely.

When attempting to scale a single-server deployment, we recommend starting by offloading some of these stateful data storage components, in the same order of priority. There are multiple benefits:

2 changes: 1 addition & 1 deletion tutor/commands/images.py
@@ -63,7 +63,7 @@ def _add_images_to_pull(
"""
    vendor_images = [
        ("caddy", "DOCKER_IMAGE_CADDY"),
        ("elasticsearch", "DOCKER_IMAGE_ELASTICSEARCH"),
        ("meilisearch", "DOCKER_IMAGE_MEILISEARCH"),
        ("mongodb", "DOCKER_IMAGE_MONGODB"),
        ("mysql", "DOCKER_IMAGE_MYSQL"),
        ("redis", "DOCKER_IMAGE_REDIS"),
4 changes: 4 additions & 0 deletions tutor/commands/jobs.py
@@ -42,6 +42,10 @@ def _add_core_init_tasks() -> None:
        hooks.Filters.CLI_DO_INIT_TASKS.add_item(
            ("mysql", env.read_core_template_file("jobs", "init", "mysql.sh"))
        )
    with hooks.Contexts.app("meilisearch").enter():
        hooks.Filters.CLI_DO_INIT_TASKS.add_item(
            ("lms", env.read_core_template_file("jobs", "init", "meilisearch.sh"))
        )
    with hooks.Contexts.app("lms").enter():
        hooks.Filters.CLI_DO_INIT_TASKS.add_item(
            (
2 changes: 1 addition & 1 deletion tutor/commands/k8s.py
@@ -390,7 +390,7 @@ def _start_base_deployments(_job_name: str, *_args: Any, **_kwargs: Any) -> None
"""
    config = tutor_config.load(context.root)
    wait_for_deployment_ready(config, "caddy")
    for name in ["elasticsearch", "mysql", "mongodb"]:
    for name in ["meilisearch", "mysql", "mongodb"]:
        if tutor_config.is_service_activated(config, name):
            wait_for_deployment_ready(config, name)

1 change: 0 additions & 1 deletion tutor/config.py
@@ -235,7 +235,6 @@ def upgrade_obsolete(config: Config) -> None:
for name in [
"ACTIVATE_LMS",
"ACTIVATE_CMS",
"ACTIVATE_ELASTICSEARCH",
"ACTIVATE_MONGODB",
"ACTIVATE_MYSQL",
"ACTIVATE_REDIS",
2 changes: 2 additions & 0 deletions tutor/env.py
@@ -56,6 +56,8 @@ def _prepare_environment() -> None:
("reverse_host", utils.reverse_host),
("rsa_import_key", utils.rsa_import_key),
("rsa_private_key", utils.rsa_private_key),
("uuid", utils.uuid),
("uid_master_hash", utils.uid_master_hash),
],
)
# Template variables
12 changes: 12 additions & 0 deletions tutor/plugins/openedx.py
@@ -1,5 +1,6 @@
from __future__ import annotations


import os
import re
import typing as t
@@ -28,6 +29,17 @@ def _edx_platform_public_hosts(
return hosts


@hooks.Filters.APP_PUBLIC_HOSTS.add()
def _meilisearch_public_hosts(
    hosts: list[str], context_name: t.Literal["local", "dev"]
) -> list[str]:
    if context_name == "dev":
        hosts.append("{{ MEILISEARCH_PUBLIC_URL.split('://')[1] }}:7700")
    else:
        hosts.append("{{ MEILISEARCH_PUBLIC_URL.split('://')[1] }}")
    return hosts
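
The filter above strips the scheme from ``MEILISEARCH_PUBLIC_URL`` and, in the "dev" context, appends port 7700. A small standalone sketch of the same string handling follows; `public_host` is a hypothetical helper written for illustration, and the URL is the default that results from ``LMS_HOST: "www.myopenedx.com"`` with HTTPS disabled.

```python
def public_host(public_url: str, context_name: str) -> str:
    """Strip the scheme from the URL and add the dev port in the "dev" context."""
    host = public_url.split("://")[1]
    if context_name == "dev":
        return f"{host}:7700"
    return host


# Default public URL when LMS_HOST is "www.myopenedx.com" and ENABLE_HTTPS is false.
url = "http://meilisearch.www.myopenedx.com"

print(public_host(url, "local"))  # meilisearch.www.myopenedx.com
print(public_host(url, "dev"))    # meilisearch.www.myopenedx.com:7700
```

Note that `split("://")[1]` raises an `IndexError` if the configured URL has no scheme, so the setting is expected to always include one.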


@hooks.Filters.IMAGES_BUILD_MOUNTS.add()
def _mount_edx_platform_build(
volumes: list[tuple[str, str]], path: str
6 changes: 6 additions & 0 deletions tutor/templates/apps/caddy/Caddyfile
@@ -82,4 +82,10 @@
}
}

{% if RUN_MEILISEARCH %}
{{ MEILISEARCH_PUBLIC_URL.split("://")[1] }}{$default_site_port} {
import proxy "meilisearch:7700"
}
{% endif %}

{{ patch("caddyfile") }}
1 change: 0 additions & 1 deletion tutor/templates/apps/openedx/config/cms.env.yml
@@ -9,7 +9,6 @@ FEATURES:
{{ patch("cms-env-features")|indent(2) }}
CERTIFICATES_HTML_VIEW: true
PREVIEW_LMS_BASE: "{{ PREVIEW_LMS_HOST }}"
ENABLE_COURSEWARE_INDEX: true
ENABLE_CSMH_EXTENDED: false
ENABLE_LEARNER_RECORDS: false
ENABLE_LIBRARY_INDEX: true
3 changes: 0 additions & 3 deletions tutor/templates/apps/openedx/config/lms.env.yml
@@ -9,10 +9,7 @@ FEATURES:
{{ patch("lms-env-features")|indent(2) }}
CERTIFICATES_HTML_VIEW: true
PREVIEW_LMS_BASE: "{{ PREVIEW_LMS_HOST }}"
ENABLE_COURSE_DISCOVERY: true
ENABLE_COURSEWARE_SEARCH: true
ENABLE_CSMH_EXTENDED: false
ENABLE_DASHBOARD_SEARCH: true
ENABLE_COMBINED_LOGIN_REGISTRATION: true
ENABLE_GRADE_DOWNLOADS: true
ENABLE_LEARNER_RECORDS: false
6 changes: 4 additions & 2 deletions tutor/templates/apps/openedx/settings/cms/development.py
@@ -2,20 +2,22 @@
import os
from cms.envs.devstack import *

{% include "apps/openedx/settings/partials/common_cms.py" %}

LMS_BASE = "{{ LMS_HOST }}:8000"
LMS_ROOT_URL = "http://" + LMS_BASE

CMS_BASE = "{{ CMS_HOST }}:8001"
CMS_ROOT_URL = "http://" + CMS_BASE

MEILISEARCH_PUBLIC_URL = "{{ MEILISEARCH_PUBLIC_URL }}:7700"

# Authentication
SOCIAL_AUTH_EDX_OAUTH2_KEY = "{{ CMS_OAUTH2_KEY_SSO_DEV }}"
SOCIAL_AUTH_EDX_OAUTH2_PUBLIC_URL_ROOT = LMS_ROOT_URL

FEATURES["PREVIEW_LMS_BASE"] = "{{ PREVIEW_LMS_HOST }}:8000"

{% include "apps/openedx/settings/partials/common_cms.py" %}

# Setup correct webpack configuration file for development
WEBPACK_CONFIG_PATH = "webpack.dev.config.js"

2 changes: 2 additions & 0 deletions tutor/templates/apps/openedx/settings/lms/development.py
@@ -15,6 +15,8 @@
CMS_ROOT_URL = "http://{}".format(CMS_BASE)
LOGIN_REDIRECT_WHITELIST.append(CMS_BASE)

MEILISEARCH_PUBLIC_URL = "{{ MEILISEARCH_PUBLIC_URL }}:7700"

# Session cookie
SESSION_COOKIE_DOMAIN = "{{ LMS_HOST }}"
SESSION_COOKIE_SECURE = False
14 changes: 8 additions & 6 deletions tutor/templates/apps/openedx/settings/partials/common_all.py
@@ -34,12 +34,14 @@
# Behave like memcache when it comes to connection errors
DJANGO_REDIS_IGNORE_EXCEPTIONS = True

# Elasticsearch connection parameters
ELASTIC_SEARCH_CONFIG = [{
{% if ELASTICSEARCH_SCHEME == "https" %}"use_ssl": True,{% endif %}
"host": "{{ ELASTICSEARCH_HOST }}",
"port": {{ ELASTICSEARCH_PORT }},
}]
# Meilisearch connection parameters
MEILISEARCH_ENABLED = True
MEILISEARCH_URL = "{{ MEILISEARCH_URL }}"
MEILISEARCH_PUBLIC_URL = "{{ MEILISEARCH_PUBLIC_URL }}"
MEILISEARCH_INDEX_PREFIX = "{{ MEILISEARCH_INDEX_PREFIX }}"
MEILISEARCH_API_KEY = "{{ MEILISEARCH_API_KEY }}"
MEILISEARCH_MASTER_KEY = "{{ MEILISEARCH_MASTER_KEY }}"
SEARCH_ENGINE = "search.meilisearch.MeilisearchEngine"
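
To illustrate how these settings fit together, here is a hedged sketch that builds (but does not send) a search request against the Meilisearch HTTP API, using the `POST /indexes/{index_uid}/search` route with a bearer token. The helper `build_search_request` and the index name `tutor_example_index` are hypothetical: the real index names are defined by edx-search and do not appear in this diff.

```python
import json
import urllib.request


def build_search_request(
    base_url: str, index_uid: str, api_key: str, query: str
) -> urllib.request.Request:
    """Build a POST request for the Meilisearch search route of one index."""
    url = f"{base_url.rstrip('/')}/indexes/{index_uid}/search"
    body = json.dumps({"q": query}).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical values: the prefix matches the MEILISEARCH_INDEX_PREFIX default,
# the rest of the index name and the API key are placeholders.
request = build_search_request(
    "http://meilisearch:7700", "tutor_example_index", "placeholder-api-key", "python"
)
print(request.get_method(), request.full_url)
```

The request would be sent with `urllib.request.urlopen(request)` from a container that can resolve the internal `meilisearch` host; from a browser, the public URL would be used instead.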

# Common cache config
CACHES = {
3 changes: 3 additions & 0 deletions tutor/templates/apps/openedx/settings/partials/common_cms.py
@@ -20,6 +20,9 @@
FRONTEND_LOGIN_URL = LMS_ROOT_URL + '/login'
FRONTEND_REGISTER_URL = LMS_ROOT_URL + '/register'

# Enable "reindex" button
FEATURES["ENABLE_COURSEWARE_INDEX"] = True

# Create folders if necessary
for folder in [LOG_DIR, MEDIA_ROOT, STATIC_ROOT, ORA2_FILEUPLOAD_ROOT]:
if not os.path.exists(folder):
5 changes: 5 additions & 0 deletions tutor/templates/apps/openedx/settings/partials/common_lms.py
@@ -37,6 +37,11 @@
"LOCATION": "staticfiles_lms",
}

# Enable search features
FEATURES["ENABLE_COURSE_DISCOVERY"] = True
FEATURES["ENABLE_COURSEWARE_SEARCH"] = True
FEATURES["ENABLE_DASHBOARD_SEARCH"] = True

# Create folders if necessary
for folder in [DATA_DIR, LOG_DIR, MEDIA_ROOT, STATIC_ROOT, ORA2_FILEUPLOAD_ROOT]:
if not os.path.exists(folder):
2 changes: 1 addition & 1 deletion tutor/templates/apps/permissions/setowners.sh
@@ -1,6 +1,6 @@
#! /bin/sh
setowner $OPENEDX_USER_ID /mounts/lms /mounts/cms /mounts/openedx
{% if RUN_ELASTICSEARCH %}setowner 1000 /mounts/elasticsearch{% endif %}
{% if RUN_MEILISEARCH %}setowner 1000 /mounts/meilisearch{% endif %}
{% if RUN_MONGODB %}setowner 999 /mounts/mongodb{% endif %}
{% if RUN_MYSQL %}setowner 999 /mounts/mysql{% endif %}
{% if RUN_REDIS %}setowner 1000 /mounts/redis{% endif %}
3 changes: 3 additions & 0 deletions tutor/templates/config/base.yml
@@ -2,6 +2,9 @@
CMS_OAUTH2_SECRET: "{{ 24|random_string }}"
ID: "{{ 24|random_string }}"
JWT_RSA_PRIVATE_KEY: "{{ 2048|rsa_private_key }}"
MEILISEARCH_MASTER_KEY: "{{ 24|random_string }}"
MEILISEARCH_API_KEY_UID: "{{ 4|uuid }}"
MEILISEARCH_API_KEY: "{{ MEILISEARCH_MASTER_KEY|uid_master_hash(MEILISEARCH_API_KEY_UID) }}"
MYSQL_ROOT_PASSWORD: "{{ 8|random_string }}"
OPENEDX_MYSQL_PASSWORD: "{{ 8|random_string }}"
OPENEDX_SECRET_KEY: "{{ 24|random_string }}"
13 changes: 6 additions & 7 deletions tutor/templates/config/defaults.yml
@@ -16,8 +16,8 @@ DOCKER_IMAGE_OPENEDX: "{{ DOCKER_REGISTRY }}overhangio/openedx:{{ TUTOR_VERSION
DOCKER_IMAGE_OPENEDX_DEV: "openedx-dev:{{ TUTOR_VERSION }}"
# https://hub.docker.com/_/caddy/tags
DOCKER_IMAGE_CADDY: "docker.io/caddy:2.7.4"
# https://hub.docker.com/_/elasticsearch/tags
DOCKER_IMAGE_ELASTICSEARCH: "docker.io/elasticsearch:7.17.13"
# https://hub.docker.com/r/getmeili/meilisearch/tags
DOCKER_IMAGE_MEILISEARCH: "docker.io/getmeili/meilisearch:v1.8.4"
# https://hub.docker.com/_/mongo/tags
DOCKER_IMAGE_MONGODB: "docker.io/mongo:7.0.7"
# https://hub.docker.com/_/mysql/tags
@@ -29,10 +29,6 @@ DOCKER_IMAGE_REDIS: "docker.io/redis:7.2.4"
DOCKER_IMAGE_SMTP: "docker.io/devture/exim-relay:4.96-r1-0"
EDX_PLATFORM_REPOSITORY: "https://github.com/openedx/edx-platform.git"
EDX_PLATFORM_VERSION: "{{ OPENEDX_COMMON_VERSION }}"
ELASTICSEARCH_HOST: "elasticsearch"
ELASTICSEARCH_PORT: 9200
ELASTICSEARCH_SCHEME: "http"
ELASTICSEARCH_HEAP_SIZE: 1g
ENABLE_HTTPS: false
ENABLE_WEB_PROXY: true
JWT_COMMON_AUDIENCE: "openedx"
@@ -42,6 +38,9 @@ K8S_NAMESPACE: "openedx"
LANGUAGE_CODE: "en"
LMS_HOST: "www.myopenedx.com"
LOCAL_PROJECT_NAME: "{{ TUTOR_APP }}_local"
MEILISEARCH_URL: "http://meilisearch:7700"
MEILISEARCH_PUBLIC_URL: "{% if ENABLE_HTTPS %}https{% else %}http{% endif %}://meilisearch.{{ LMS_HOST }}"
MEILISEARCH_INDEX_PREFIX: "tutor_"
MONGODB_AUTH_MECHANISM: ""
MONGODB_AUTH_SOURCE: "admin"
MONGODB_HOST: "mongodb"
@@ -73,7 +72,7 @@ REDIS_HOST: "redis"
REDIS_PORT: 6379
REDIS_USERNAME: ""
REDIS_PASSWORD: ""
RUN_ELASTICSEARCH: true
RUN_MEILISEARCH: true
RUN_MONGODB: true
RUN_MYSQL: true
RUN_REDIS: true
19 changes: 10 additions & 9 deletions tutor/templates/dev/docker-compose.yml
@@ -32,19 +32,20 @@ services:
    ports:
      - "8001:8000"

  {% if RUN_MEILISEARCH -%}
  meilisearch:
    ports:
      - "127.0.0.1:7700:7700"
    networks:
      default:
        aliases:
          - "{{ MEILISEARCH_PUBLIC_URL.split('://')[1] }}"
  {%- endif %}

  # Additional service for watching theme changes
  watchthemes:
    <<: *openedx-service
    command: npm run watch-sass
    restart: unless-stopped

  {% if RUN_ELASTICSEARCH and is_docker_rootless() %}
  elasticsearch:
    ulimits:
      memlock:
        # Fixes error setting rlimits for ready process in rootless docker
        soft: 0 # zero means "unset" in the memlock context
        hard: 0
  {% endif %}

  {{ patch("local-docker-compose-dev-services")|indent(2) }}
4 changes: 4 additions & 0 deletions tutor/templates/jobs/init/cms.sh
@@ -16,3 +16,7 @@ fi
# Create waffle switches to enable some features, if they have not been explicitly defined before
# Copy-paste of units in Studio (highly requested new feature, but defaults to off in Quince)
(./manage.py cms waffle_flag --list | grep contentstore.enable_copy_paste_units) || ./manage.py lms waffle_flag --create contentstore.enable_copy_paste_units --everyone

# Re-index studio and courseware content
./manage.py cms reindex_studio --experimental
./manage.py cms reindex_course --active
3 changes: 3 additions & 0 deletions tutor/templates/jobs/init/lms.sh
@@ -10,6 +10,9 @@ echo "Loading settings $DJANGO_SETTINGS_MODULE"

./manage.py lms migrate

# Create meilisearch indexes
./manage.py lms shell -c "import search.meilisearch; search.meilisearch.create_indexes()"

# Create oauth2 apps for CMS SSO
# https://github.com/openedx/edx-platform/blob/master/docs/guides/studio_oauth.rst
./manage.py lms manage_user cms cms@openedx --unusable-password
18 changes: 18 additions & 0 deletions tutor/templates/jobs/init/meilisearch.sh
@@ -0,0 +1,18 @@
# Get or create Meilisearch API key
python -c "
import meilisearch
client = meilisearch.Client('{{ MEILISEARCH_URL }}', '{{ MEILISEARCH_MASTER_KEY }}')
try:
    client.get_key('{{ MEILISEARCH_API_KEY_UID }}')
    print('Key already exists')
except meilisearch.errors.MeilisearchApiError:
    print('Key does not exist: creating...')
    client.create_key({
        'name': 'Open edX backend API key',
        'uid': '{{ MEILISEARCH_API_KEY_UID }}',
        'actions': ['*'],
        'indexes': ['{{ MEILISEARCH_INDEX_PREFIX }}*'],
        'expiresAt': None,
        'description': 'Use it for backend API calls -- Created by Tutor',
    })
"