Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: replace Elasticsearch by Meilisearch #1141

Merged
merged 1 commit into from
Nov 1, 2024
Merged

Conversation

regisb
Copy link
Contributor

@regisb regisb commented Oct 17, 2024

With this change, we get rid of Elasticsearch across all of Tutor. Instead, we run Meilisearch, which is much more lightweight in terms of memory usage. Obviously, this is a (very) breaking change. Indexing commands will be run during init, such that search should work as before.

After the edx-search PR is merged and the dependency is upgraded in edx-platform, we should remove the manual RUN pip install ... command.

tutor/env.py Outdated Show resolved Hide resolved
Copy link

@pomegranited pomegranited left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@regisb This is looking great -- just a couple of minor issues when I used this in my tutor dev env, along with these open PRs:

Am deploying a sandbox to test the k8s setup, will report back with any issues there.

tutor/templates/k8s/volumes.yml Show resolved Hide resolved
Comment on lines +21 to +26
./manage.py cms reindex_studio --experimental
./manage.py cms reindex_course --active

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should gate these too (though not sure about which variable to use to do this?)

Suggested change
./manage.py cms reindex_studio --experimental
./manage.py cms reindex_course --active
{% if ACTIVATE_MEILISEARCH %}
./manage.py cms reindex_studio --experimental
./manage.py cms reindex_course --active
{% endif %}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I don't think that's right. We assume that meilisearch is running, somewhere. Even if RUN_MEILISEARCH is false, it just means that meilisearch is running on a remote host. So we still need to run the migration commands.

Copy link

@pomegranited pomegranited Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, you're correct!

There's an issue with reindex_course --setup referencing elasticsearch directly (openedx/edx-platform#35743), but that's not Tutor's problem :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest we run ./manage.py lms shell -c "import search.meilisearch; search.meilisearch.create_indexes()" in init/lms.sh. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also keep in mind that until we resolve openedx/modular-learning#235 (which is in progress), we should avoid running ./manage.py cms reindex_studio --experimental on any instances with thousands of courses or more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for re-indexing courses, I haven't looked into it but it does sound like we're doing it more than is necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a function that would only create and configure the indexes (without deleting them first), such that we could call this function at every launch? Pretty much like a ./manage.py migrate command.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't exist yet, but we could add that. A --create-only flag would be trivial.

The edge case is when upgrading to a newer commit/release and the index parameters have changed; in that case, we need to replace the index and the new index will be empty unless we also re-index all the content. This could be done asynchronously though; it also doesn't replace the existing index until the reindexing is complete, so there's no downtime.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we do need a lightweight, idempotent index creation command that works in just all cases. This can take different forms: a single python function, a management command, or a set of migrations. If we can get this in Sumac then that's great, if not then we should probably think about it for later releases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a proposal to openedx/modular-learning#235 for a new --init flag that will achieve this.

Copy link

@pomegranited pomegranited left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 once comments 1 & 2 are addressed.

I tested this by deploying and testing a k8s sandbox configured with openedx/edx-platform#35689 . The sandbox demonstrates a functional meilisearch deployment, with published course content search indexed for students, and draft content for courses + libraries v2 indexed in Authoring MFE for Content Authors. Note that this deployment required fixes from overhangio/tutor-mfe#231 + openedx/edx-platform#35743.

However, I'm unable to create new posts using tutor-forum@v2 (overhangio/tutor-forum#48), so I wasn't able to test this with the new edx-search Meilisearch engine. Happy to leave the sandbox running to test updated versions of tutor-forum@v2 and https://github.com/edly-io/edx-platform.git@edly/forumv.

@regisb
Copy link
Contributor Author

regisb commented Oct 30, 2024

Again, thanks for the quick and thorough review Jill. Meilisearch is not supposed to work with forum v2 just yet, this would be pending some work on my side. I'd like to complete this PR before I move on to the integration of meilisearch in forum v2.

@regisb regisb force-pushed the regisb/meilisearch branch from 7e8570b to 032e632 Compare October 30, 2024 11:11
With this change, we get rid of Elasticsearch across all of Tutor.
Instead, we run Meilisearch, which is much more lightweight in terms of
memory usage. Obviously, this is a (very) breaking change. Indexing
commands will be run during init, such that search should work as
before.

After the edx-search PR is merged and the dependency is upgraded in
edx-platform, we should remove the manual `RUN pip install ...` command.
@regisb regisb force-pushed the regisb/meilisearch branch from c996f3b to ebc39e3 Compare November 1, 2024 09:40
@regisb regisb merged commit 1482a33 into sumac Nov 1, 2024
@regisb regisb deleted the regisb/meilisearch branch November 1, 2024 11:01
Faraz32123 pushed a commit to edly-io/tutor-discovery that referenced this pull request Nov 1, 2024
- As Tutor and Open edX have shifted to Meilisearch, and course-discovery still depends on Elasticsearch, running the Elasticsearch container with tutor-discovery will facilitate smoother operation for the course-discovery service.
- It's related PR from tutor: overhangio/tutor#1141.
Faraz32123 pushed a commit to edly-io/tutor-discovery that referenced this pull request Nov 1, 2024
- As Tutor and Open edX have shifted to Meilisearch, and course-discovery still depends on Elasticsearch, running the Elasticsearch container with tutor-discovery will facilitate smoother operation for the course-discovery service.
- It's related PR from tutor: overhangio/tutor#1141.
DawoudSheraz pushed a commit to overhangio/tutor-discovery that referenced this pull request Nov 1, 2024
- As Tutor and Open edX have shifted to Meilisearch, and course-discovery still depends on Elasticsearch, running the Elasticsearch container with tutor-discovery will facilitate smoother operation for the course-discovery service.
- It's related PR from tutor: overhangio/tutor#1141.

Co-authored-by: Muhammad Faraz  Maqsood <[email protected]>
Faraz32123 added a commit to overhangio/tutor-discovery that referenced this pull request Nov 15, 2024
- As Tutor and Open edX have shifted to Meilisearch, and course-discovery still depends on Elasticsearch, running the Elasticsearch container with tutor-discovery will facilitate smoother operation for the course-discovery service.
- It's related PR from tutor: overhangio/tutor#1141.

Co-authored-by: Muhammad Faraz  Maqsood <[email protected]>
DawoudSheraz pushed a commit to overhangio/tutor-discovery that referenced this pull request Nov 29, 2024
- As Tutor and Open edX have shifted to Meilisearch, and course-discovery still depends on Elasticsearch, running the Elasticsearch container with tutor-discovery will facilitate smoother operation for the course-discovery service.
- It's related PR from tutor: overhangio/tutor#1141.

Co-authored-by: Muhammad Faraz  Maqsood <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

4 participants