Add a column `participant` to the `room_memberships` table #18068

base: develop
Conversation
Requesting @anoadragon453's feedback on this as he has context.

The Delayed events Complement test failure looks like a flake?

@anoadragon453, just wondering if I could get your input on this, to ensure it aligns with what we discussed in #18040?
@H-Shay apologies for the slow turn-around on this one. I've been swamped with other work :(
I've made some large suggestions below, I'm afraid, but overall this is heading in a good direction!
    txn: LoggingTransaction, last_room_id: str
) -> Optional[str]:
    sql = """
        SELECT room_id from room_memberships WHERE room_id > ?
Note that `room_memberships` is a table that is always being appended to, and thus always changing under you. It is ordered by its `event_stream_ordering` column. So, the only way to traverse it while the system is running, without leaving gaps, is to iterate using the `event_stream_ordering` column.

Note: `room_memberships` only has rows deleted from it when a room is purged.
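A minimal sketch of the pagination pattern described above, using an in-memory SQLite table with an assumed, simplified schema (not Synapse's actual one). Because new rows always receive a higher `event_stream_ordering`, resuming from the last seen ordering can never skip rows:

```python
import sqlite3

# Assumed toy schema: orderings 1..10 across a few rooms/users.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE room_memberships "
    "(event_stream_ordering INTEGER PRIMARY KEY, room_id TEXT, user_id TEXT)"
)
conn.executemany(
    "INSERT INTO room_memberships VALUES (?, ?, ?)",
    [(i, f"!room{i % 3}", f"@user{i % 5}") for i in range(1, 11)],
)


def iterate_memberships(conn, batch_size=4):
    """Yield batches of rows, walking event_stream_ordering upwards."""
    last_ordering = 0
    while True:
        rows = conn.execute(
            "SELECT event_stream_ordering, room_id, user_id "
            "FROM room_memberships WHERE event_stream_ordering > ? "
            "ORDER BY event_stream_ordering LIMIT ?",
            (last_ordering, batch_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        # In a real background update this resume point would be persisted.
        last_ordering = rows[-1][0]


batches = list(iterate_memberships(conn))
total = sum(len(b) for b in batches)
```

Concurrent appends land strictly above `last_ordering`, so they are either picked up by a later batch or left for the next run of the job.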
Currently we have:
- a query to pull out a room ID
- a query that pulls out all users that have ever been joined to that room
- a query per-user per-room that updates all previous entries that match that user/room combination

I think we can instead do this in one query that processes a batch of `room_memberships` rows all at once. Instead of saving the current `room_id` for the batch job, start with the current maximum `event_stream_ordering` row and work backwards in batches of, say, 1000.

Constrain your query to the current `event_stream_ordering - BATCH_SIZE`. Then, within that range, UPDATE all rows based on data in the `events` table. Then save the new `event_stream_ordering - BATCH_SIZE` to your background job.

Now the table can continue to grow without things changing from underneath you, as historical data is only (rarely) deleted.
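The suggestion above can be sketched as follows. The schema and column names are simplified assumptions for illustration; the point is the single correlated UPDATE per batch, working backwards from the maximum `event_stream_ordering`:

```python
import sqlite3

# Assumed toy schema (not Synapse's actual tables).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE room_memberships (
        event_stream_ordering INTEGER PRIMARY KEY,
        room_id TEXT, user_id TEXT, participant INTEGER DEFAULT 0
    );
    CREATE TABLE events (room_id TEXT, sender TEXT, type TEXT);
""")
conn.executemany(
    "INSERT INTO room_memberships "
    "(event_stream_ordering, room_id, user_id) VALUES (?, ?, ?)",
    [(1, "!a", "@x"), (2, "!a", "@y"), (3, "!b", "@x")],
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("!a", "@x", "m.room.message"), ("!b", "@x", "m.room.topic")],
)

BATCH_SIZE = 1000
upper = conn.execute(
    "SELECT MAX(event_stream_ordering) FROM room_memberships"
).fetchone()[0]
while upper is not None and upper > 0:
    lower = upper - BATCH_SIZE
    # One UPDATE per batch: mark rows whose user has sent a qualifying
    # event into the same room.
    conn.execute(
        """
        UPDATE room_memberships SET participant = 1
        WHERE event_stream_ordering > ? AND event_stream_ordering <= ?
          AND EXISTS (
            SELECT 1 FROM events AS e
            WHERE e.room_id = room_memberships.room_id
              AND e.sender = room_memberships.user_id
              AND e.type IN ('m.room.message', 'm.room.encrypted')
          )
        """,
        (lower, upper),
    )
    upper = lower  # persisted as the background job's resume point

flags = [
    r[0]
    for r in conn.execute(
        "SELECT participant FROM room_memberships "
        "ORDER BY event_stream_ordering"
    )
]
```

Here `@x` in `!a` sent a `m.room.message`, so that row is marked; `@y` sent nothing, and `@x`'s `m.room.topic` in `!b` does not qualify, so those rows stay unmarked.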
Adds a column `participant` to the `room_memberships` table to track whether a user has participated in a room. Participation is defined as having sent a `m.room.message` or `m.room.encrypted` event into the room.

Related to #18040; the approach there to determine room participation was deemed too inefficient, and adding this column was the recommended remedy.