feat: implement dynamic pruning strategy #1295
Conversation
It seems like this might be relevant... Taking into account #1228, and that (IMO) we will sooner rather than later want to support this along with a "max db" size (i.e. if a user has 100 GB of disk space they want to contribute, we'll actually spin up 10 nodes of 10 GB each under the hood). Once that's implemented, we'll have a consistent max db size, which means we really don't have to worry about improving the pruning mechanism to support large dbs and can optimize db performance against a fixed target (e.g. 10 GB of storage). Assuming my description is the direction we're heading, does this PR still make sense?
I don't think this is how the architecture will actually work out. I think that under the hood it will be a single 100 GB database that is represented in the network across 10 different node-ids.
Together with what Piper said, I would add that there is benefit for smaller dbs as well (as we can purge more there). In general, I would say it's better to monitor and adjust in any scenario. For example:
Force-pushed from 4ac42b6 to e3b1755
LGTM! Mostly just nit comments that you're welcome to ignore.
I apparently have a strong opinion about the ideal way to calculate the percentage update of the pruning number (more below). But in the end, it's not a blocker because even if the number isn't ideal, I don't foresee any catastrophe. So don't let me stop you from merging it if you disagree about the percentage update 😆
```rust
self.radius = Distance::ZERO;
self.metrics.report_radius(self.radius);
```
Tiniest of nits: maybe it's worth skipping reporting the radius if it was already 0.
I'm confused by the "if it was already 0" part. This function is called only once, during initialization, so the radius wasn't anything useful at that point (it is MAX by default).
I also believe it's still useful to report it at least once, so we have at least some data in grafana. Or do you deliberately not want to report it to grafana?
```diff
 if !self.pruning_strategy.should_prune(&self.usage_stats) {
     warn!(Db = %self.config.content_type,
-        "Pruning requested but we are below target capacity. Skipping");
+        "Pruning requested but not needed. Skipping");
```
Is this ever showing up? It seems strange that the call to prune only happens if should_prune() is true, and then prune immediately checks it again inside.
It shouldn't show up, and with the current code it can't (as you said, should_prune is always called right before prune).
I put it there in case of some refactoring or other changes to the code, in which case we would see it in the logs (hence the warn level, indicating that something went wrong).
```rust
self.metrics.stop_process_timer(delete_timer);
self.pruning_strategy
    .observe_pruning_duration(pruning_start_time.elapsed());
```
Seems like a bummer to have prune run its own Instant timer when we're already measuring the time in the metrics. I suppose if stop_process_timer returned the duration it observed (it looks like it's an f64 in seconds, but we could convert it to a Duration), then observe_pruning_duration could reuse it. Do you agree that would be preferable? If you don't feel like doing that, I might do it after this merges, just so it's ready the next time we want to reuse the timing.
I was thinking about the same thing. But I didn't want to complicate this PR any further. I'm fine with doing it myself in the followup PR.
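For illustration, a minimal sketch of the reuse being discussed (the helper name and signature here are assumptions, not the current trin API):

```rust
use std::time::Duration;

/// Sketch only: if `stop_process_timer` handed back the seconds it observed
/// (the metrics layer records an f64 in seconds), the caller could turn that
/// into a `Duration` instead of running a second `Instant` timer in `prune`.
fn observed_seconds_to_duration(seconds: f64) -> Duration {
    Duration::from_secs_f64(seconds)
}

fn main() {
    // Pretend the metrics layer reported 0.75s for the delete step.
    let elapsed = observed_seconds_to_duration(0.75);
    assert_eq!(elapsed, Duration::from_millis(750));
    // The strategy could then consume the same value, e.g.:
    // self.pruning_strategy.observe_pruning_duration(elapsed);
}
```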
```rust
    max_pruning_count_change_fraction: f64,
    optimal_pruning_duration_range: Range<Duration>,
) -> Self {
    if !(0.0..1.0).contains(&target_capacity_fraction) {
```
Nit: I suppose 1.0 is a valid choice here, which Range seems to exclude. Due to database compression, there was even a time when values >1.0 might have been valid; I haven't observed how SQL performs since the switchover. Anyway, it all seems mostly moot since it's hard-coded for now, but I just wanted to note that I think it would be fine to allow inputs up to 2.0 or so. It would just be about letting people play with configuration values eventually, and I don't think it's such a massive footgun that we need to panic to keep them from trying larger values.
While coding I definitely had the mental model that target_capacity should be lower than storage_capacity.
Maybe the code would work even if that's not the case, but I would say it doesn't make much sense, because both storage capacity and target capacity represent the raw content size (content_id.len() + content_key.len() + content_value.len()), or at least an approximation of it (there is some extra storage used for indexes and other smaller columns). Irrelevant, but I don't think there is any compression happening at the moment.
With that being said, I will allow the value 1.0. Once we have a real use case for values higher than 1, we can come back to it (in which case we would probably have to rewrite some of the logic).
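A minimal sketch of the relaxed check being discussed (assuming the same panic-on-invalid style as the snippet above; the message wording is illustrative):

```rust
/// Allow exactly 1.0 by validating against an inclusive range.
fn validate_target_capacity_fraction(target_capacity_fraction: f64) {
    if !(0.0..=1.0).contains(&target_capacity_fraction) {
        panic!(
            "target_capacity_fraction must be within [0.0, 1.0], got: {target_capacity_fraction}"
        );
    }
}

fn main() {
    validate_target_capacity_fraction(1.0); // now accepted
}
```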
```rust
            target_capacity_fraction
        )
    }
    if !(0.0..1.0).contains(&max_pruning_count_change_fraction) {
```
Similarly, it's not totally obvious to me that 1 is a hard upper limit here. If someone wanted to play with 2x the pruning count change, my intuition isn't to panic to prevent them from playing with it.
Edit: I see where the number came from now, but I stand by the above comment when paired with the suggestion below.
I actually wanted increasing the max_pruning_count to be harder than decreasing it, because the downside of it being too low is not so bad (we prune fewer items more frequently), while the downside of it being too high is that the db is blocked for longer.
I added a better explanation to the max_pruning_count_change_fraction field.
However, if the behavior you expected is the more common way of doing it, I'm willing to change it in the followup PR.
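To make the asymmetry concrete, a small illustrative function (not the PR's exact code; the names mirror the fields discussed above):

```rust
/// Multiplicative update of the pruning count as described above: grow by
/// (1 + fraction) when pruning was comfortably fast, shrink by (1 - fraction)
/// when it was too slow. With fraction = 0.9, growing is ~2x but shrinking is
/// ~10x, so recovering from one decrease takes several increases.
fn adjust_max_pruning_count(current: u64, change_fraction: f64, too_slow: bool) -> u64 {
    let factor = if too_slow {
        1.0 - change_fraction
    } else {
        1.0 + change_fraction
    };
    ((current as f64) * factor).round().max(1.0) as u64
}

fn main() {
    assert_eq!(adjust_max_pruning_count(100, 0.9, false), 190); // fast run: grow
    assert_eq!(adjust_max_pruning_count(100, 0.9, true), 10); // slow run: shrink
}
```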
```rust
        Db = %self.config.content_type,
        "Pruning was too slow. Decreasing max_pruning_count",
    );
    1. - pruning_config.max_pruning_count_change_fraction
```
Ah, I see now where the upper limit of 1 came from. My intuition before getting to this code was that a change fraction of 1.0 would double on the upside and cut in half on the downside. Those seem like the more symmetric options to me (i.e. one increase and one decrease takes you back to where you started).
In this approach, choosing a value of 0.9 nearly doubles on the upside but drops to 1/10th on the downside, which is aggressively dropping but not that aggressively climbing (it takes more than 3 upside adjustments to make up for 1 downside adjustment). So if we want one configuration number to govern both directions, my intuition says that this line should be:
```diff
- 1. - pruning_config.max_pruning_count_change_fraction
+ 1. / (1. + pruning_config.max_pruning_count_change_fraction)
```
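For comparison, a tiny worked example of how the two decrease rules compose with the increase (illustrative numbers only, using fraction = 0.9 as in the comment above):

```rust
fn main() {
    let fraction: f64 = 0.9;
    let up = 1.0 + fraction; // both variants nearly double on the upside: 1.9

    let down_current = 1.0 - fraction; // current PR code: cut to a tenth (0.1)
    let down_symmetric = 1.0 / (1.0 + fraction); // suggestion: ~0.526

    // With the symmetric rule, one increase followed by one decrease is a no-op...
    assert!((up * down_symmetric - 1.0).abs() < 1e-9);
    // ...while with the current rule the count ends up at 19% of where it started.
    assert!((up * down_current - 0.19).abs() < 1e-9);
}
```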
⛵
@carver I do disagree with the proposed change to the pruning-count update, but not strongly :). Willing to discuss it more before merging.
Force-pushed from bff7f5d to 08d1c69
I'm going to merge this and do a followup PR if needed.
What was wrong?
The current pruning logic for IdIndexedV1Store is to delete up to 5% of storage, but no more than 100 elements. However, even 100 elements is too slow on big databases (e.g. the 35 GB ones currently in use): it can take up to a second and sometimes even longer. This wouldn't be such a big problem if it didn't block other reads/writes.
How was it fixed?
The idea is to delete a dynamic number of items. The PruningConfig allows us to set the parameters for this dynamic strategy.
Also added logic for handling zero storage capacity: prune the db (if non-empty) and set the radius to zero.
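For orientation, a rough sketch of the configuration surface discussed in the review (the field names come from the review comments; the exact struct, defaults, and doc wording are assumptions, not the merged code):

```rust
use std::ops::Range;
use std::time::Duration;

/// Sketch of the pruning knobs discussed in this PR; the real struct in trin
/// may differ in fields, defaults, and validation.
pub struct PruningConfig {
    /// Fraction of the storage capacity to prune down to
    /// (e.g. stop pruning once usage falls below this share of capacity).
    pub target_capacity_fraction: f64,
    /// Maximum relative change applied to the dynamic pruning count after a
    /// pruning run (grow it when pruning was fast, shrink it when too slow).
    pub max_pruning_count_change_fraction: f64,
    /// If a pruning run finishes within this window, the pruning count is
    /// considered well-tuned and left as is.
    pub optimal_pruning_duration_range: Range<Duration>,
}
```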
To-Do