Skip to content
This repository has been archived by the owner on Feb 7, 2023. It is now read-only.

[refactor] Restructuring cache #84

Open
curz46 opened this issue Jul 30, 2019 · 5 comments
Open

[refactor] Restructuring cache #84

curz46 opened this issue Jul 30, 2019 · 5 comments
Labels
discussion enhancement refactoring Making the codebase cleaner

Comments

@curz46
Copy link
Contributor

curz46 commented Jul 30, 2019

Motivation

  • A recent bug fix to Alchemy's caching mechanism reveals a significant flaw - it doesn't handle concurrent changes very well. Right now, a monkey patch is in place that blocks all events until READY's handler has finished execution, but a more favored approach would be making operations to the cache atomic (i.e. impossible for interleaving described in [bug | high priority] READY vs GUILD_CREATE race condition makes guild cache forever unavailable #79 to occur)
  • Private channels are currently supported quite awkwardly by a PrivChannels cache and a separation of how guild channels and private channels are handled. Operating generically on the channel of a message (e.g. in Cogs), for this reason, is difficult.
  • Access time is inconsistent - getting a role from cache requires searching for the guild data and accessing its properties where it should just be a simple id lookup.

Goals

  1. Avoid redundancy
  2. Allow concurrent reads
  3. In-built locking/atomic operations

Current Structure

  • Guilds (guild cache) GenServer contains guild data, guild channels, roles and members.
  • PrivChannels (private channel cache) ETS stores private channels.
  • User (user cache) GenServer stores users.

The data stored by the cache is the parsed JSON, rather than the structs the user deals with.

Proposed Structure

(partially through discussion with @cronokirby and others, partially my own thoughts)

  • GuildCache in ETS, with all redundant data stripped away. For example, roles contains a list of snowflakes, not a list of role objects.
  • UserCache in ETS.
  • RoleCache in ETS. Struct has guild_id added.
  • ChannelCache in ETS. Struct has guild_id added, if the channel belongs to a guild.
  • MemberCache in ETS. Need to be careful here since a member's unique key is the composition of guild_id and user_id.

Notes

  • ETS completely replaces usage of GenServers to provide concurrent reads and in-built locking.
  • Cache data is stored as structs, not the raw JSON data. Less work done to fetch from cache and implementation is easier, since there's no need to remember if we're working with string keys or atom keys.
  • Linked to [feature | breaking] Smart cache for Client.get* methods #76.
  • You could argue there's no point storing MemberCache separate from UserCache. I think it makes things easier to reason about.
@OvermindDL1
Copy link

ETS doesn't actually implement locking, however mnesia does, so this actually seems like a case for mnesia instead. In addition, if it's needed (not for my bot), the mnesia database can be replicated onto multiple nodes. mnesia just wraps ETS (and DETS too if you set it to write anything to disk, though not in this case) so you get all of ETS features, in addition to a few others like locking, transactions, etc...

@cronokirby
Copy link
Owner

We should aim to normalize things, to avoid storing redundant structs in the cache, and to make accessing individual structs easy. That is to say, we should strip complex objects like Guilds from the objects they contain, and instead just leave their snowflake / id. Each object can have its own ets / mnesia table.

As far as locking / transactions goes, we need to be able to make sure that when handling an event which requires inserting multiple things, we do so in an atomic way, to avoid running into interleaving issues like we had previously.

@cronokirby cronokirby added the refactoring Making the codebase cleaner label Jul 31, 2019
@OvermindDL1
Copy link

As far as locking / transactions goes, we need to be able to make sure that when handling an event which requires inserting multiple things, we do so in an atomic way, to avoid running into interleaving issues like we had previously.

Definitely mnesia over ETS.

Actually at this point I'm leaning to using Cachex, it has transaction support, distributed caches, etc...

@cronokirby
Copy link
Owner

I've created a Milestone / Project to track issues related to this effort:
https://github.com/cronokirby/alchemy/projects

I don't expect this to come as one PR, so I've created the new-cache branch. As we slowly work on this branch to replace the cache, let's create issues tagged with the new-cache label, and keep track of them under the project board.

I think Cachex should be used, as it provides a very solid implementation of a Cache, and lets us focus on the discord specific logic. It also has great support for transactions and grouping queries, which will be very useful for us.

The first things to do at this point would be to split up the Cache work into small issues. A good place to start would be replacing one of the smaller ets tables or processes using Cachex, and adding a good test suite. This would serve as an example for how to do the rest of the cache. Replacing the guild cache is probably the last thing to do, as it relies on the rest of the cache working well.

@cronokirby
Copy link
Owner

Oh, and I've labelled this issue as discussion. Let's keep this issue as a centralized place to track work on this project :)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
discussion enhancement refactoring Making the codebase cleaner
Projects
None yet
Development

No branches or pull requests

3 participants