Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added indexes #425

Open
wants to merge 3 commits into
base: develop
Choose a base branch
from
Open

Added indexes #425

wants to merge 3 commits into from

Conversation

bio-boris
Copy link
Collaborator

@bio-boris bio-boris commented Nov 4, 2021

Description of PR purpose/changes

{"status": Status.queued.value, "queued": {"$lt": before_days}}

{"status": Status.created.value, "created": {"$lt": before_days}}

I don't think there's any indexes on status or the timestamp, so this is going to be a tablescan. I'd recommend making compound indices on state/queued and state/created

I'd just set them up in MongoUtil. That's more or less how I do it for all my servers, for example

{"status": "created", "_id": {"$lt": five_mins_ago}, "batch_job": {"$ne": True}}

{"status": Status.queued.value, "queued": {"$lt": before_days}}

Hmm I wonder which type of indexes are best, and if I need a different index on 'created' because it actually needs to know if its a batch job or not and the ID record not a field called created.

@bio-boris bio-boris requested a review from MrCreosote November 4, 2021 20:41
@@ -336,7 +336,11 @@ class Job(Document):
default=False
) # Marked true when all retry steps have completed

meta = {"collection": "ee2_jobs"}
# See https://docs.mongoengine.org/guide/defining-documents.html#indexes
# Hmm, are these indexes need to be + or - ?
Copy link
Member

@MrCreosote MrCreosote Nov 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depends on if you want sorted results - if not, it doesn't really matter. With the status/queued index, if you do an equality query on status and a range query on queued you can sort descending on the queue time without having to do an in-memory sort. Ascending sorts would be in memory.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, with the change now oldest first sorts can be done without an in-memory sort

meta = {"collection": "ee2_jobs"}
# See https://docs.mongoengine.org/guide/defining-documents.html#indexes
# Hmm, are these indexes need to be + or - ?
indexes = [("status", "batch_job"), ("status", "-queued")]
Copy link
Member

@MrCreosote MrCreosote Nov 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm - would it make sense to have indexes with the username, e.g. (user, status, -queued)? Does EE2 not do any user specific queries?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only for the job browser bff

        funcdef check_jobs_date_range_for_user(CheckJobsDateRangeParams params)
            returns (CheckJobsDateRangeResults) authentication required;
        funcdef check_jobs_date_range_for_all(CheckJobsDateRangeParams params)
            returns (CheckJobsDateRangeResults) authentication required;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I see that the original impetus was for the bad jobs reaper, which I assume doesn't care about the user name.

that being said, indexes targeted for those functions might not be a bad idea

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's another call that searches on clusterid instead of job id

job_id = self.sdkmr.get_mongo_util().get_job_batch_name(cluster_id=cluster_id)
j = self.sdkmr.get_mongo_util().get_job(job_id=job_id) # type: Job

meta = {"collection": "ee2_jobs"}
# See https://docs.mongoengine.org/guide/defining-documents.html#indexes
# Hmm, are these indexes need to be + or - ?
indexes = [("status", "batch_job"), ("status", "queued")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the query for the status/batch_job index?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@MrCreosote MrCreosote left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine to me assuming the indexes correctly handle the queries you expect

@bio-boris
Copy link
Collaborator Author

This looks fine to me assuming the indexes correctly handle the queries you expect

Do we figure this out using .explain() ?

@MrCreosote
Copy link
Member

yep

@MrCreosote
Copy link
Member

Well, that and just thinking through how the queries will use the indexes and how selective the indexes will be for the query

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants