-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added indexes #425
base: develop
Are you sure you want to change the base?
Added indexes #425
Conversation
@@ -336,7 +336,11 @@ class Job(Document): | |||
default=False | |||
) # Marked true when all retry steps have completed | |||
|
|||
meta = {"collection": "ee2_jobs"} | |||
# See https://docs.mongoengine.org/guide/defining-documents.html#indexes | |||
# Hmm, are these indexes need to be + or - ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depends on if you want sorted results - if not, it doesn't really matter. With the status/queued index, if you do an equality query on status and a range query on queued you can sort descending on the queue time without having to do an in-memory sort. Ascending sorts would be in memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, with the change now oldest first sorts can be done without an in-memory sort
meta = {"collection": "ee2_jobs"} | ||
# See https://docs.mongoengine.org/guide/defining-documents.html#indexes | ||
# Hmm, are these indexes need to be + or - ? | ||
indexes = [("status", "batch_job"), ("status", "-queued")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm - would it make sense to have indexes with the username, e.g. (user, status, -queued)
? Does EE2 not do any user specific queries?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only for the job browser bff
funcdef check_jobs_date_range_for_user(CheckJobsDateRangeParams params)
returns (CheckJobsDateRangeResults) authentication required;
funcdef check_jobs_date_range_for_all(CheckJobsDateRangeParams params)
returns (CheckJobsDateRangeResults) authentication required;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I see that the original impetus was for the bad jobs reaper, which I assume doesn't care about the user name.
that being said, indexes targeted for those functions might not be a bad idea
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's another call that searches on clusterid instead of job id
execution_engine2/lib/execution_engine2/sdk/EE2Status.py
Lines 52 to 53 in 1e268df
job_id = self.sdkmr.get_mongo_util().get_job_batch_name(cluster_id=cluster_id) | |
j = self.sdkmr.get_mongo_util().get_job(job_id=job_id) # type: Job |
meta = {"collection": "ee2_jobs"} | ||
# See https://docs.mongoengine.org/guide/defining-documents.html#indexes | ||
# Hmm, are these indexes need to be + or - ? | ||
indexes = [("status", "batch_job"), ("status", "queued")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the query for the status/batch_job index?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks fine to me assuming the indexes correctly handle the queries you expect
Do we figure this out using .explain() ? |
yep |
Well, that and just thinking through how the queries will use the indexes and how selective the indexes will be for the query |
Description of PR purpose/changes
{"status": Status.queued.value, "queued": {"$lt": before_days}}
{"status": Status.created.value, "created": {"$lt": before_days}}
I don't think there's any indexes on status or the timestamp, so this is going to be a tablescan. I'd recommend making compound indices on state/queued and state/created
I'd just set them up in MongoUtil. That's more or less how I do it for all my servers, for example
execution_engine2/bin/PurgeBadJobs.py
Line 102 in bc0644a
execution_engine2/bin/PurgeBadJobs.py
Line 81 in bc0644a
Hmm I wonder which type of indexes are best, and if I need a different index on 'created' because it actually needs to know if its a batch job or not and the ID record not a field called created.