[refactor] Restructuring cache #84

curz46 · 2019-07-30T13:21:19Z

Motivation

A recent bug fix to Alchemy's caching mechanism reveals a significant flaw: it doesn't handle concurrent changes well. Right now, a monkey patch blocks all events until READY's handler has finished executing, but a better approach would be to make cache operations atomic (i.e. making the interleaving described in [bug | high priority] READY vs GUILD_CREATE race condition makes guild cache forever unavailable #79 impossible).
Private channels are currently supported quite awkwardly, through a separate PrivChannels cache and separate handling for guild channels and private channels. For this reason, operating generically on the channel of a message (e.g. in Cogs) is difficult.
Access time is inconsistent: getting a role from the cache requires finding the guild's data and searching its properties, when it should be a simple id lookup.

Goals

Avoid redundancy
Allow concurrent reads
In-built locking/atomic operations

Current Structure

Guilds (guild cache): a GenServer containing guild data, guild channels, roles and members.
PrivChannels (private channel cache): an ETS table storing private channels.
User (user cache): a GenServer storing users.

The data stored by the cache is the parsed JSON, rather than the structs the user deals with.

Proposed Structure

(partially through discussion with @cronokirby and others, partially my own thoughts)

GuildCache in ETS, with all redundant data stripped away. For example, roles contains a list of snowflakes, not a list of role objects.
UserCache in ETS.
RoleCache in ETS. Struct has guild_id added.
ChannelCache in ETS. Struct has guild_id added, if the channel belongs to a guild.
MemberCache in ETS. We need to be careful here, since a member's unique key is the combination of guild_id and user_id.
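A minimal sketch of what one ETS table per entity could look like, assuming hypothetical `Sketch.*` module names, `:sketch_*` table names and struct fields (none of these are Alchemy's actual definitions). Roles are keyed by their own snowflake, so a role lookup is a single key access; members use the composite `{guild_id, user_id}` tuple as their key. In real code the tables would need to be created by a long-lived process, since an ETS table is deleted when its owning process exits.

```elixir
# Hypothetical structs for illustration only; not Alchemy's real definitions.
defmodule Sketch.Role do
  defstruct [:id, :name, :guild_id]
end

defmodule Sketch.Member do
  defstruct [:guild_id, :user_id, :nick, roles: []]
end

defmodule Sketch.Cache do
  @roles :sketch_roles
  @members :sketch_members

  # One named, public ETS set per entity type; read_concurrency: true
  # tunes the table for many concurrent readers.
  def init do
    for table <- [@roles, @members] do
      :ets.new(table, [:set, :public, :named_table, read_concurrency: true])
    end

    :ok
  end

  # Roles are keyed by their own snowflake: lookup is one key access,
  # no need to go through the guild first.
  def put_role(%Sketch.Role{id: id} = role), do: :ets.insert(@roles, {id, role})

  def get_role(id) do
    case :ets.lookup(@roles, id) do
      [{^id, role}] -> {:ok, role}
      [] -> :error
    end
  end

  # Members are keyed by the composite {guild_id, user_id} tuple.
  def put_member(%Sketch.Member{guild_id: g, user_id: u} = member),
    do: :ets.insert(@members, {{g, u}, member})

  def get_member(guild_id, user_id) do
    case :ets.lookup(@members, {guild_id, user_id}) do
      [{_key, member}] -> {:ok, member}
      [] -> :error
    end
  end

  # All members of one guild, matching on the first element of the key.
  def guild_members(guild_id) do
    @members
    |> :ets.match_object({{guild_id, :_}, :_})
    |> Enum.map(fn {_key, member} -> member end)
  end
end
```

Because the same user appears once per guild they are in, the composite key keeps those entries distinct while still allowing a per-guild scan through `guild_members/1`.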

Notes

ETS completely replaces the use of GenServers, providing concurrent reads and in-built locking.
Cache data is stored as structs, not raw JSON. Less work is done when fetching from the cache, and the implementation is easier, since there's no need to remember whether we're working with string keys or atom keys.
Linked to [feature | breaking] Smart cache for Client.get* methods #76.
You could argue there's no point storing MemberCache separately from UserCache. I think it makes things easier to reason about.


OvermindDL1 · 2019-07-30T16:15:03Z

ETS doesn't actually implement locking, but mnesia does, so this seems like a case for mnesia instead. In addition, if it's needed (not for my bot), the mnesia database can be replicated onto multiple nodes. mnesia just wraps ETS (and DETS too if you configure it to write to disk, though not in this case), so you get all of ETS's features plus a few others like locking, transactions, etc.
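A minimal sketch of the transaction point, assuming RAM-only mnesia tables on a single node; the `Sketch.MnesiaCache` module, the `:sketch_guild` / `:sketch_role` tables and the `{id, data}` record shape are hypothetical. All writes inside one `:mnesia.transaction/1` either commit together or not at all, which is the property needed when one event inserts several things.

```elixir
defmodule Sketch.MnesiaCache do
  # Starts mnesia on the local node. Without a disc schema, tables
  # created here default to ram_copies.
  def init do
    :ok = :mnesia.start()
    {:atomic, :ok} = :mnesia.create_table(:sketch_guild, attributes: [:id, :data])
    {:atomic, :ok} = :mnesia.create_table(:sketch_role, attributes: [:id, :data])
    :ok
  end

  # Writes a guild and all of its roles in one transaction: other
  # transactions see either none of these writes or all of them.
  def insert_guild(guild_id, guild_data, roles) do
    :mnesia.transaction(fn ->
      :ok = :mnesia.write({:sketch_guild, guild_id, guild_data})

      Enum.each(roles, fn {role_id, role_data} ->
        :ok = :mnesia.write({:sketch_role, role_id, role_data})
      end)

      :ok
    end)
  end

  # Returns {:atomic, records}, where records is [] if the role is missing.
  def get_role(role_id) do
    :mnesia.transaction(fn -> :mnesia.read(:sketch_role, role_id) end)
  end
end
```

If the function passed to `:mnesia.transaction/1` calls `:mnesia.abort/1` (or raises), the transaction returns `{:aborted, reason}` and none of its earlier writes are visible afterwards.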

cronokirby · 2019-07-30T23:49:21Z

We should aim to normalize things, to avoid storing redundant structs in the cache and to make accessing individual structs easy. That is to say, we should strip the objects contained in complex objects like Guilds out of them, and leave only their snowflake / id in their place. Each object type can have its own ets / mnesia table.
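A minimal sketch of that normalization step, assuming it runs on the parsed, string-keyed guild payload before anything is written to the cache; `Sketch.Normalize.split_guild/1` is a hypothetical name, and only roles and channels are handled here. The guild keeps just the ids of its children, and each child gains a `guild_id` so it can live in its own table.

```elixir
defmodule Sketch.Normalize do
  # Takes a parsed guild payload (string keys, as decoded from JSON) and
  # returns the guild with nested roles/channels replaced by their ids,
  # plus flat lists of those roles/channels, each tagged with guild_id.
  def split_guild(%{"id" => guild_id} = payload) do
    roles = Map.get(payload, "roles", [])
    channels = Map.get(payload, "channels", [])

    guild =
      payload
      |> Map.put("roles", Enum.map(roles, & &1["id"]))
      |> Map.put("channels", Enum.map(channels, & &1["id"]))

    %{
      guild: guild,
      roles: Enum.map(roles, &Map.put(&1, "guild_id", guild_id)),
      channels: Enum.map(channels, &Map.put(&1, "guild_id", guild_id))
    }
  end
end
```

Each list in the result maps directly onto one table, so the inserts for a single GUILD_CREATE could then be wrapped in one transaction.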

As far as locking / transactions goes, we need to be able to make sure that when handling an event which requires inserting multiple things, we do so in an atomic way, to avoid running into interleaving issues like we had previously.

OvermindDL1 · 2019-07-31T16:23:05Z

As far as locking / transactions goes, we need to be able to make sure that when handling an event which requires inserting multiple things, we do so in an atomic way, to avoid running into interleaving issues like we had previously.

Definitely mnesia over ETS.

Actually, at this point I'm leaning toward using Cachex; it has transaction support, distributed caches, etc.

cronokirby · 2019-08-01T21:57:53Z

I've created a Milestone / Project to track issues related to this effort:
https://github.com/cronokirby/alchemy/projects

I don't expect this to come as one PR, so I've created the new-cache branch. As we slowly work on this branch to replace the cache, let's create issues tagged with the new-cache label, and keep track of them under the project board.

I think Cachex should be used, as it provides a very solid cache implementation and lets us focus on the Discord-specific logic. It also has great support for transactions and grouping queries, which will be very useful for us.

The first step at this point would be to split the cache work into small issues. A good place to start would be replacing one of the smaller ets tables or processes with Cachex and adding a good test suite. This would serve as an example for how to do the rest of the cache. Replacing the guild cache should probably come last, as it relies on the rest of the cache working well.

cronokirby · 2019-08-01T21:58:27Z

Oh, and I've labelled this issue as discussion. Let's keep this issue as a centralized place to track work on this project :)

cronokirby added the enhancement label Jul 30, 2019

cronokirby added the refactoring Making the codebase cleaner label Jul 31, 2019

cronokirby added the discussion label Aug 1, 2019
