
add index by user_id on classifications #56

Merged: 1 commit, merged Apr 2, 2024


yuenmichelle1 (Collaborator) commented Mar 28, 2024

This PR adds an index on user_id to classification_events. Currently, the table is indexed on event_time and on its composite primary key, which automatically creates an index on the primary key columns (event_time and classification_id).

classification_events was originally designed without a user_id index, mainly because:

  • ERAS's responsibility is to count classifications and comments (when and how many).
  • ERAS leaves per-user counting to the materialized view/continuous aggregate (DailyClassificationCountsPerUser____).
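To make the split of responsibilities concrete, here is a minimal sketch of the kind of rollup the per-user aggregate precomputes: classification counts grouped by day and user. It uses an in-memory SQLite table as a stand-in; the real table is a TimescaleDB hypertable in PostgreSQL, the column set is a hypothetical simplification, and the function name daily_counts_per_user is illustrative only, not the actual continuous aggregate definition.

```python
import sqlite3

# Hypothetical, simplified stand-in for classification_events.
# The real table is a TimescaleDB hypertable; SQLite is used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE classification_events (
        classification_id INTEGER NOT NULL,
        event_time TEXT NOT NULL,
        user_id INTEGER,
        PRIMARY KEY (event_time, classification_id)
    )
    """
)
conn.executemany(
    "INSERT INTO classification_events VALUES (?, ?, ?)",
    [
        (1, "2024-03-28 09:00:00", 10),
        (2, "2024-03-28 10:30:00", 10),
        (3, "2024-03-28 11:00:00", 20),
        (4, "2024-03-29 08:15:00", 10),
    ],
)


def daily_counts_per_user(conn):
    """Count classifications per (day, user_id): the rollup a per-user
    continuous aggregate would keep precomputed (sketch only)."""
    return conn.execute(
        """
        SELECT date(event_time) AS day, user_id, COUNT(*) AS classification_count
        FROM classification_events
        GROUP BY day, user_id
        ORDER BY day, user_id
        """
    ).fetchall()


print(daily_counts_per_user(conn))
# [('2024-03-28', 10, 2), ('2024-03-28', 20, 1), ('2024-03-29', 10, 1)]
```

Because the aggregate answers "how many per user per day" from precomputed rows, the base table never needed a user_id index for that purpose.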

As we look into the sync script (which syncs newly added group members and their classifications), we now care about users who are supposed to belong to a user_group but have not joined it, and whose classifications should be counted toward that user_group. We therefore need to search classification_events by user_id. That search is currently inefficient: with no index on user_id, it requires sequential scans of the whole hypertable to return results.
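The effect of the new index can be illustrated with a small sketch. It uses SQLite's EXPLAIN QUERY PLAN as a stand-in for PostgreSQL's EXPLAIN, with a hypothetical simplified schema (composite primary key on event_time and classification_id, plus an event_time index) and a Rails-style index name, index_classification_events_on_user_id, that is assumed rather than taken from this PR. Because neither existing index starts with user_id, a lookup by user_id must scan the table; once the user_id index exists, both a single-user lookup and a batch lookup (as a sync job might issue) can use it.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
# Hypothetical stand-in schema: composite primary key on (event_time,
# classification_id) plus a separate event_time index, as described above.
conn.execute(
    """
    CREATE TABLE classification_events (
        classification_id INTEGER NOT NULL,
        event_time TEXT NOT NULL,
        user_id INTEGER,
        PRIMARY KEY (event_time, classification_id)
    )
    """
)
conn.execute(
    "CREATE INDEX index_classification_events_on_event_time "
    "ON classification_events (event_time)"
)

# All events for one user: no existing index covers user_id, so this scans.
BY_USER = (
    "SELECT classification_id, event_time FROM classification_events "
    "WHERE user_id = ?"
)
before = query_plan(conn, BY_USER, (10,))

# Add the user_id index (index name is a hypothetical Rails-style choice).
conn.execute(
    "CREATE INDEX index_classification_events_on_user_id "
    "ON classification_events (user_id)"
)
after = query_plan(conn, BY_USER, (10,))

# A batch lookup for several users can use the same index.
user_ids = [10, 20, 30]
placeholders = ", ".join("?" for _ in user_ids)
BY_USERS = (
    "SELECT classification_id, event_time FROM classification_events "
    f"WHERE user_id IN ({placeholders})"
)
batch = query_plan(conn, BY_USERS, user_ids)

print(before)
print(after)
print(batch)
```

Depending on the SQLite version, the first plan reads something like SCAN classification_events, while the later plans report a SEARCH using index_classification_events_on_user_id. In PostgreSQL/TimescaleDB the analogous check is EXPLAIN on the real query, where the difference shows up as Seq Scan versus Index Scan or Bitmap Index Scan nodes on the chunks.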

NOTE
- Timescale currently does not support creating indexes concurrently, so we cannot avoid write locks on the table while this index is being built.

yuenmichelle1 requested review from zwolf and lcjohnso (and removed the request for lcjohnso) on March 28, 2024 19:26
Two review threads on db/schema.rb (both resolved)
yuenmichelle1 (Collaborator, Author) commented:

@zwolf @lcjohnso While we're here, do we need to add indexes to any other ERAS tables?

(My gut says no, but I'm also being conservative, since indexes take up storage space.)

@lcjohnso For perspective: sequential scans when searching by user_id will affect the sync job not only on its first run but on every subsequent run, until we optimize the query that searches classification events by specific user_ids. Hopefully this change provides that optimization.

yuenmichelle1 merged commit bf69893 into main on Apr 2, 2024 (4 checks passed)
yuenmichelle1 deleted the add-index-by-user_id-for-certain-hypertables branch on May 22, 2024 19:08