
[distributed store] use a single map Vec<u8> -> Counters map #340

Merged: chirino merged 2 commits into Kuadrant:main from distributed-bytes-keys on May 23, 2024

Conversation

@chirino (Contributor) commented May 22, 2024:


By keying using Vec<u8> we reduce how often we need to encode/decode the counter keys.
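A minimal sketch of the idea, with hypothetical types (Counters here is a stand-in, not Limitador's actual value type): the single map is keyed on the already-encoded bytes of the counter key, so the key is encoded once per update and reused as-is for lookups and replication.

use std::collections::HashMap;

// Hypothetical stand-in for the distributed counter state.
struct Counters {
    value: u64,
    expires_at: u64,
}

// One map keyed on the encoded counter key: the Vec<u8> produced when an
// update is created (or received from a peer) is used directly, instead of
// decoding and re-encoding the key on every operation.
struct Store {
    limits: HashMap<Vec<u8>, Counters>,
}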

Signed-off-by: Hiram Chirino <[email protected]>
@chirino force-pushed the distributed-bytes-keys branch from 4ecdf0d to 3895c8a on May 22, 2024 15:01
@chirino requested a review from alexsnaps on May 23, 2024 11:59
if let Some(limited) = process_counter(counter, value.read(), delta) {
    if !load_counters {
        return Ok(limited);
    }
    // we need to take the slow path since we need to mutate the limits map.
}
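A hedged sketch (not Limitador's actual code) of the fast-path/slow-path split the comment refers to: counters hold atomics, so an existing entry can be bumped under the shared read lock, and only inserting a missing counter takes the exclusive write lock that mutates the limits map.

use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::RwLock;

fn incr(limits: &RwLock<HashMap<Vec<u8>, AtomicU64>>, key: &[u8], delta: u64) {
    // fast path: the counter already exists, update it without mutating the map
    if let Some(counter) = limits.read().unwrap().get(key) {
        counter.fetch_add(delta, Ordering::SeqCst);
        return;
    }
    // slow path: take the write lock to insert the new counter
    limits
        .write()
        .unwrap()
        .entry(key.to_vec())
        .or_insert_with(|| AtomicU64::new(0))
        .fetch_add(delta, Ordering::SeqCst);
}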
@alexsnaps (Member) commented:
So there is an interesting property of non-qualified counters (i.e. ones where no variable qualifies the counter): for all the limits we know about, there is a finite set of counters. Given that, we chose to always keep a counter for these limits present in our in-memory storage, to avoid ever having to mutate the data structure that contains them (i.e. it's "cheaper" to keep an expired counter around than to update the data structure to remove it, only to add it back later).
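A rough sketch of that property (hypothetical types; assumes a limit with no qualifying variables maps to exactly one counter):

use std::collections::HashMap;

struct Limit {
    id: String,
    variables: Vec<String>, // empty => non-qualified
}

// The counters for non-qualified limits form a finite set known up front, so
// they can all be inserted at startup; an expired counter is later reset in
// place instead of being removed and re-added, leaving the map untouched.
fn preload(limits: &[Limit]) -> HashMap<String, u64> {
    limits
        .iter()
        .filter(|l| l.variables.is_empty())
        .map(|l| (l.id.clone(), 0))
        .collect()
}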

I understand this isn't what's happening here, but I'm just sharing it and thinking about what it would mean for the distributed case, i.e. if an instance within a cluster only cares about counters it knows about from its limits.yaml ... 🤔 Should that be shared with other instances, so they can filter what they share with these other instances? Or should we make it the concern of the receiver to drop things it doesn't care about (trading off bandwidth to other instances...)?

Anyways, a bit of a random thought, but one I thought I'd share...

@chirino (author) commented:

This seems like the kind of optimization where it would be hard to get the balance right.

@@ -356,29 +268,47 @@ impl CrInMemoryStorage {
struct CounterKey {
    namespace: Namespace,
    seconds: u64,
    max_value: u64,
@alexsnaps (Member) commented:

That's a somewhat radical change... I think you "only" do this to be able to implement From<CounterKey> for Counter, which requires building the corresponding Limit, right? It also means that someone changing the max_value of one of their Limits would reset all their counters for that Limit, which we learned is not what users expect.

Now, back to my point above about who gets to decide which counter values an instance gets to see or stores locally: what would it mean if cluster A has a max_value of 10, while cluster B has one of 100? Should we consider the counter to be the same, but reaching the threshold at different values in each cluster? Should they now be different counters? But then how would someone alter a limit's threshold, locally or across all sharing instances?
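To illustrate the concern (a hypothetical sketch, not the actual CounterKey definition): with max_value baked into the key, raising a limit's threshold yields a different key, so lookups miss the old counter and the count effectively resets.

#[derive(Hash, PartialEq, Eq, Debug)]
struct CounterKey {
    namespace: String,
    seconds: u64,
    max_value: u64,
}

fn main() {
    let before = CounterKey { namespace: "ns".into(), seconds: 60, max_value: 10 };
    let after = CounterKey { namespace: "ns".into(), seconds: 60, max_value: 100 };
    // Different keys: the counter stored under `before` is orphaned once the
    // limit's max_value changes, which users perceive as a reset.
    assert_ne!(before, after);
}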

@chirino (author) commented May 23, 2024:

Should we push the max_value into the value portion of the map entry? Maybe we'd have to make it a (max_value, changed_at) tuple type of thing, so that the last distributed update of it wins?

@chirino (author) commented:

Maybe a changed_at is not needed, as max_value is a limits configuration value, so that change should be rolled out to all peers at the same time.
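A sketch of what that could look like (hypothetical types): the key carries only the counter's identity, and the threshold travels with the stored value, so a configuration change updates the entry in place instead of minting a new key.

#[derive(Hash, PartialEq, Eq)]
struct CounterKey {
    namespace: String,
    seconds: u64,
    // no max_value: changing the threshold keeps the same key
}

struct CounterValue {
    max_value: u64, // refreshed from the limits configuration on each update
    value: u64,
    expires_at: u64,
}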

self.broker.publish(CounterUpdate {
-   key,
+   key: store_key,
@alexsnaps (Member) commented:

nitpick: it seems we use the same name for two things... or I guess in this case, two representations of the same thing, is that right? I'd either just call it key (and value) everywhere, or, if there are places where the two names clash, first try to get rid of the clash, or else name these things differently, but consistently.

This might be a one-off and a really minor thing, but I like keeping an eye out for these patterns when I write such code, to avoid them leaking all over.

@chirino (author) commented:

Ah sorry, I had done a refactor renaming CrCounterValue to StoreValue to make it clearer that it is the value type the distributed store works with, and then reverted it to keep the PR diff from getting too large. I forgot to revert the variable names too.

@chirino merged commit 8ce12c8 into Kuadrant:main on May 23, 2024. 8 checks passed.