Continuation of #166.

I have decided on a different approach for storing raw message data with SQLite. Messages will be stored in bundles of, say, 100 messages per bundle, and the entire bundle will be compressed at once. This should lead to a massive reduction in redundancy, and the result could even be smaller than the current database format despite holding a lot more data.

This will be a completely separate database format, and possibly a separate app for now. Migrating between database formats might come later.

I will write separate comments for each part of the design.
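As a rough sanity check of the redundancy claim, the sketch below compares compressing each message individually against compressing a whole bundle at once. All field names and values here are made up for illustration; nothing about the real message shape or bundle size is implied.

```python
import json
import zlib

# Hypothetical batch of 100 structurally similar raw message objects.
messages = [
    {
        "id": str(1000 + i),
        "channel_id": "42",
        "author": {"id": "7", "username": "example"},
        "content": f"message number {i}",
        "timestamp": f"2023-01-01T00:00:{i % 60:02d}+00:00",
    }
    for i in range(100)
]

# Compressing each message on its own leaves the structure that is shared
# between messages uncompressed; compressing the bundle lets zlib exploit it.
individual = sum(len(zlib.compress(json.dumps(m).encode())) for m in messages)
bundled = len(zlib.compress(json.dumps(messages).encode()))
print(f"individual: {individual} bytes, bundled: {bundled} bytes")
```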
Storing Messages

The tracking endpoint will receive raw JSON messages, extract basic metadata, and push the messages into a queue. A background thread processes the queue, figures out which messages are new and which are edited, and periodically packs, compresses, and stores them in the database. The basic storage design comprises two tables: one that stores the compressed packs, and one that indexes individual messages and maps them to their pack.

The big question is how the packing should work. One option is to wait for either a certain number of messages or a certain amount of time, and then simply create a pack from those, but that could lead to fragmentation and bad data locality. The expectation is that messages in the same channel that were posted around the same time will be accessed together, so they should be in the same pack. A more advanced approach could search existing packs and insert new messages into them, but that would add a lot of complexity (especially if multiple instances were writing to the same database). The best balance might be to only allow new packs to be created while processing the queue, but have a "vacuum" process that could be initiated manually (or automatically) to find bad packs and optimize them. This would require a lock either over the entire database, or, if a pack was for example restricted to one channel, over all packs for that channel.

There also needs to be a way to tell whether the database already has the most recent version of a message. Storing the timestamp of the last edit and making it part of the primary key would allow keeping multiple versions of a message, which would also solve another long-standing issue.

To summarize, this is how I want the initial implementation to work (a code sketch follows the list):

- The tracking endpoint receives raw JSON messages, extracts basic metadata, and pushes them into a queue.
- A background thread processes the queue and determines which messages are new and which are edited versions.
- Once either a message count or a time threshold is reached, the queued messages are packed, compressed, and stored as a new pack.
- The last-edit timestamp is part of the message primary key, so multiple versions of a message can be kept.
- New packs are only created while processing the queue; existing packs are never modified during normal operation.
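A minimal sketch of how this could fit together, assuming a two-table layout and zlib compression. The table names, message fields ("id", "edited_at") and thresholds are all hypothetical, not the actual implementation:

```python
import json
import sqlite3
import zlib

SCHEMA = """
CREATE TABLE IF NOT EXISTS packs (
    pack_id    INTEGER PRIMARY KEY AUTOINCREMENT,
    channel_id INTEGER NOT NULL,
    data       BLOB NOT NULL            -- zlib-compressed JSON array of raw messages
);
CREATE TABLE IF NOT EXISTS messages (
    message_id INTEGER NOT NULL,
    edited_at  INTEGER NOT NULL,        -- last-edit timestamp, part of the primary key
    pack_id    INTEGER NOT NULL REFERENCES packs(pack_id),
    PRIMARY KEY (message_id, edited_at) -- one row per version of a message
);
"""

MAX_MESSAGES = 100    # flush once the batch reaches this size...
MAX_AGE_SECONDS = 30  # ...or once the oldest queued message is this old

def open_db(path: str) -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db

def flush_pack(db: sqlite3.Connection, channel_id: int, batch: list[dict]) -> None:
    """Compress one batch of raw JSON messages into a new pack and index it."""
    blob = zlib.compress(json.dumps(batch).encode("utf-8"))
    pack_id = db.execute(
        "INSERT INTO packs (channel_id, data) VALUES (?, ?)", (channel_id, blob)
    ).lastrowid
    # INSERT OR IGNORE: a (message_id, edited_at) pair that already exists is not
    # new, while an edit arrives with a newer edited_at and gets its own row.
    db.executemany(
        "INSERT OR IGNORE INTO messages (message_id, edited_at, pack_id) VALUES (?, ?, ?)",
        [(int(m["id"]), int(m.get("edited_at", 0)), pack_id) for m in batch],
    )
    db.commit()
```

Grouping queued messages per channel before calling the flush would give the per-channel data locality described above.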
Once this works, I will analyze efficiency and fragmentation. A vacuum process can be implemented later.
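For reference, a vacuum pass over a single channel could look roughly like this, continuing the hypothetical schema above. A real version would keep pack sizes bounded instead of merging everything into one pack, and would hold the per-channel lock mentioned earlier:

```python
import json
import sqlite3
import zlib

def vacuum_channel(db: sqlite3.Connection, channel_id: int) -> None:
    """Merge all packs of one channel into a single pack (per-channel lock assumed)."""
    rows = db.execute(
        "SELECT pack_id, data FROM packs WHERE channel_id = ? ORDER BY pack_id",
        (channel_id,),
    ).fetchall()
    if len(rows) < 2:
        return  # nothing to merge

    merged: list[dict] = []
    for _, blob in rows:
        merged.extend(json.loads(zlib.decompress(blob)))

    # Write the merged pack, repoint the message index, drop the old packs.
    new_pack = db.execute(
        "INSERT INTO packs (channel_id, data) VALUES (?, ?)",
        (channel_id, zlib.compress(json.dumps(merged).encode("utf-8"))),
    ).lastrowid
    old_ids = [r[0] for r in rows]
    placeholders = ",".join("?" * len(old_ids))
    db.execute(
        f"UPDATE messages SET pack_id = ? WHERE pack_id IN ({placeholders})",
        [new_pack, *old_ids],
    )
    db.execute(f"DELETE FROM packs WHERE pack_id IN ({placeholders})", old_ids)
    db.commit()
```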
Database Imports / Exports

Since the packing and compression will make it impossible to use SQL directly on the database file to parse the message data, I would like to support configurable exports. At minimum, there should be a way to import and export an uncompressed SQLite database, which would have a single table with the raw JSON data. Modern SQLite supports JSON operators, so that should be flexible enough for any kind of data manipulation and analysis (see the query sketch below). In the future, additional formats could be supported. Ideas:
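For example, assuming the export produces a single messages table with a json column holding each raw message (the table and column names are illustrative, and the $.author.username path assumes the raw message layout keeps author data nested), per-author message counts could be computed like this:

```python
import sqlite3

db = sqlite3.connect("export.sqlite")  # hypothetical export file

# Count messages per author using SQLite's built-in json_extract().
rows = db.execute(
    """
    SELECT json_extract(json, '$.author.username') AS author, COUNT(*) AS n
    FROM messages
    GROUP BY author
    ORDER BY n DESC
    """
).fetchall()
for author, count in rows:
    print(author, count)
```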
- Filters, for people who don't want to export the entire database, since the uncompressed exports will be much larger (a rough export sketch follows this list).
- Metadata about servers, channels, etc., which the database will need to store so the UI can display proper names and not just IDs.
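A channel filter during export could look roughly like the following, reusing the hypothetical pack schema from the storage sketch; the single-table export layout matches the query example above:

```python
import json
import sqlite3
import zlib

def export_channel(db: sqlite3.Connection, out_path: str, channel_id: int) -> None:
    """Decompress all packs of one channel into an uncompressed SQLite export."""
    out = sqlite3.connect(out_path)
    out.execute("CREATE TABLE IF NOT EXISTS messages (json TEXT NOT NULL)")
    for (blob,) in db.execute(
        "SELECT data FROM packs WHERE channel_id = ?", (channel_id,)
    ):
        out.executemany(
            "INSERT INTO messages (json) VALUES (?)",
            [(json.dumps(m),) for m in json.loads(zlib.decompress(blob))],
        )
    out.commit()
    out.close()
```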