-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Validate datasource retention rules: Reject rules that fully contain subsequent rules' interval #15015
Conversation
2d15660
to
2f3b80b
Compare
server/src/test/java/org/apache/druid/server/http/RulesResourceTest.java
Fixed
Show fixed
Hide fixed
This would prevent users from shooting themselves in the foot. For example, if an interval in a rule is "large enough" and covers an interval in subsequent rule(s), then those rules will never run. Implement guard rails for rules in the rule chain.
2f3b80b
to
f72eda0
Compare
@abhishekrb19 , it seems like a nice idea to have these checks while updating retention rules. If there is a cluster that already defines rules that would fail the check, would that cluster break after this change? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't done a full review of the validation logic yet. Left some comments on the structure.
...rc/main/java/org/apache/druid/server/coordinator/rules/ForeverBroadcastDistributionRule.java
Outdated
Show resolved
Hide resolved
...src/main/java/org/apache/druid/server/coordinator/rules/PeriodBroadcastDistributionRule.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/druid/server/coordinator/rules/Rule.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/druid/server/coordinator/rules/Rules.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/druid/server/coordinator/rules/Rules.java
Show resolved
Hide resolved
server/src/main/java/org/apache/druid/server/http/RulesResource.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/druid/server/http/RulesResource.java
Outdated
Show resolved
Hide resolved
Thanks for the review, @kfaraz. Re:
No, this is an API-only validation change, so any existing rules in a cluster will remain as-is. |
server/src/main/java/org/apache/druid/server/coordinator/rules/Rule.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/druid/server/coordinator/rules/Rule.java
Outdated
Show resolved
Hide resolved
final Rule currRule = rules.get(i); | ||
final Rule nextRule = rules.get(i + 1); | ||
final Interval currInterval = currRule.getEligibleInterval(now); | ||
final Interval nextInterval = nextRule.getEligibleInterval(now); | ||
if (currInterval.contains(nextInterval)) { | ||
// If the current rule has an eternity interval, it covers everything following it. | ||
// Or if the current rule still covers the next rule at the current interval boundaries, then the | ||
// next rule will never fire at any time, so throw an exception. | ||
if (Intervals.ETERNITY.equals(currInterval) || | ||
(currRule.getEligibleInterval(currInterval.getStart()).contains(nextRule.getEligibleInterval(currInterval.getStart())) | ||
&& currRule.getEligibleInterval(currInterval.getEnd()).contains(nextRule.getEligibleInterval(currInterval.getEnd())))) { | ||
throw InvalidInput.exception( | ||
"Rule[%s] has an interval that contains interval for rule[%s]. The interval[%s] also covers interval[%s].", | ||
currRule, | ||
nextRule, | ||
currInterval, | ||
nextInterval |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the behaviour in the following situation:
Rule#1 - Covers the interval 2020-2021
Rule#2 - Covers the interval 2021-2022
Rule#3 - Covers the interval 2020-2022
In this case validateRules
would not throw an exception, however, from my understanding, Rule#3 would never trigger.
If that's the case, we should handle these cases as well (probably by maintaining a set of intervals that the rules seen so far cover and checking if the next rule isn't completely overshadowed by the interval set seen so far)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds like a nice check to have as well. Let me think about it more and adjust as needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LakshSingla, actually, Druid currently does a fully contains interval check for drop rules, so rule #3 is valid with dropByInterval
. So we can't really guarantee the overlapping condition in a type agnostic way (which apply to load rules, so we would probably need a separate contract)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am sorry, I didn't get why that is the case. Rule#1 spans from [2020-2021). rule #2 spans from [2021-2022). Therefore, collectively, rule#1 and rule#2 span from [2020-2022). Is there any instant in the timeline that would cause rule#3 to fire without triggering either rule#1 or rule#2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my understanding, rule #3 is valid for the time interval if it existed in isolation. However, when present along with other rules, the precedence of those takes place, causing rule#3 to be effectively useless, which is what this check exists to detect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree. Rule #3 would never be applied on any segment irrespective of the rule types.
While evaluating retention rules, as soon as a rule applies to a segment, we break and don't look at the other rules.
Also, in general, I am not entirely sure of the validation logic here.
- For one, the set thing suggested by @LakshSingla makes sense. But we would have to ensure that the
fully covered
check is done correctly.- Additive logic (cleaner to implement): Say your set contains intervals
A [4, 10)
andB [15, 20)
and a new intervalC [8, 17)
comes in. Adding this new interval C should merge all of A, B and C into one interval[4, 17)
. - Subtractive logic: To check if an interval has any part which is not fully covered by the preceding intervals, remove all parts of it that overlap with intervals already encountered. If you are left with an empty interval, the rule is not valid.
- Additive logic (cleaner to implement): Say your set contains intervals
- Secondly, all of this is easy to follow when we are dealing with
ForeverXyzRule
orIntervalXyzRule
(rule with fixed interval). ForPeriodXyzRule
, I am not entirely sure of the boundaries that would need to be checked. The proposed change does compute theeligibleInterval
at both start and end of thecurrInterval
but I need to give it some more thought. - There could even be other types of rules in the future, but I guess we need not worry about them now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, let's consider a segment with an interval 2020/2022
and the following drop rules in order:
dropByInterval('2020/2021')
dropByInterval('2021/2022')
dropByInterval('2020/2022')
For the segment in question, only rule 3 will match as the complete interval is contained in the drop interval.
On the other hand, if we consider loadByInterval
rules for the same intervals, you're right that all the rules will apply for the segment as load rules will fire as long as there is at least an overlap.
Hope that clarifies why we cannot detect the overlapping case without baking in some type awareness, but let me know if I am missing something!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get the incorrect assumption I made in my original example. This digression occurs from the semantics of how intervals are applied in load rules v/s in drop rules. In load rules, we only need for partial overlap for the rule to fire, while in the drop rule, we need a complete overlap for the rule to fire
However when we are doing a contains
check in this logic, we are inherently assuming something about the structure of the rules.
Consider the following case:
The system has 2 types of segments, one ranging from 2020-2021 and the other from 2020-2022. If the user wants to drop the segments ranging from 2020-2021 while preserving the other segments, how does the user craft intuitive rules for the same without causing the newly added check to fail? I only came up with the following, however, that fails the check.
- Drop 2020-2021
- Load 2020-2022
I think this ties back to the common use case that we are trying to prevent the user from doing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re the use case above - I'd write the same rules as you've listed in the example. The first rule's interval 2020/2021 does not contain the second rule's interval 2020/2022. So, I'm not sure why the validation check would flag this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kfaraz - please see my responses to your comments:
we would have to ensure that the fully covered check is done correctly.
To check if an interval fully contains another, the proposed algorithm has a nested loop that goes through the sequence of rules to find if a larger interval contains another subsequent interval down in the list; O(n^2) algorithmic runtime, but practical runtime should be trivial though.
I suppose there are alternate approaches like condensing intervals - the additive and subtractive logic you're describing. My first thought is that this could work well for the overlapping case (which isn't the intent of this patch).
Re this:
Secondly, all of this is easy to follow when we are dealing with ForeverXyzRule or IntervalXyzRule (rule with fixed interval). For PeriodXyzRule, I am not entirely sure of the boundaries that would need to be checked. The proposed change does compute the eligibleInterval at both start and end of the currInterval but I need to give it some more thought.
The boundary check primarily exists to allow the case where a rule will fire at some point. Unlike the forever and interval rules, period rules factor in the reference timestamp thereby yielding distinct intervals at the boundaries. Something like:
PeriodRuleR3.eligibleInterval(now) = IntervalR3
PeriodRuleR3.eligibleInterval(1yAgo) = IntervalR4
The boundaries check is essentially to verify that a legit rule will fire at some point and allow such a rule. For example, consider the rules [loadByPeriod(P10Y)
, loadByInterval('2020/2025')
]:
- With the boundary check (before and after 10 years from now), the first rule will not contain the second interval. So the validation check will not flag the rule set. The second rule will indeed fire at some point in the future.
- Without the boundary check, the first rule will fully cover the second rule's interval, incorrectly flagging the second rule.
I realize I need to update the javadocs. :)
Co-authored-by: Laksh Singla <[email protected]>
server/src/main/java/org/apache/druid/server/coordinator/rules/PeriodDropBeforeRule.java
Outdated
Show resolved
Hide resolved
@Override | ||
public Interval getEligibleInterval(DateTime referenceTimestamp) | ||
{ | ||
return new Interval(referenceTimestamp.minus(period), referenceTimestamp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return new Interval(referenceTimestamp.minus(period), referenceTimestamp); | |
return new Interval(period, referenceTimestamp); |
server/src/main/java/org/apache/druid/server/coordinator/rules/PeriodLoadRule.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/druid/server/coordinator/rules/Rule.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/druid/server/coordinator/rules/Rules.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/apache/druid/server/http/RulesResourceTest.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/druid/server/coordinator/rules/Rule.java
Outdated
Show resolved
Hide resolved
final Rule currRule = rules.get(i); | ||
final Rule nextRule = rules.get(i + 1); | ||
final Interval currInterval = currRule.getEligibleInterval(now); | ||
final Interval nextInterval = nextRule.getEligibleInterval(now); | ||
if (currInterval.contains(nextInterval)) { | ||
// If the current rule has an eternity interval, it covers everything following it. | ||
// Or if the current rule still covers the next rule at the current interval boundaries, then the | ||
// next rule will never fire at any time, so throw an exception. | ||
if (Intervals.ETERNITY.equals(currInterval) || | ||
(currRule.getEligibleInterval(currInterval.getStart()).contains(nextRule.getEligibleInterval(currInterval.getStart())) | ||
&& currRule.getEligibleInterval(currInterval.getEnd()).contains(nextRule.getEligibleInterval(currInterval.getEnd())))) { | ||
throw InvalidInput.exception( | ||
"Rule[%s] has an interval that contains interval for rule[%s]. The interval[%s] also covers interval[%s].", | ||
currRule, | ||
nextRule, | ||
currInterval, | ||
nextInterval |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree. Rule #3 would never be applied on any segment irrespective of the rule types.
While evaluating retention rules, as soon as a rule applies to a segment, we break and don't look at the other rules.
Also, in general, I am not entirely sure of the validation logic here.
- For one, the set thing suggested by @LakshSingla makes sense. But we would have to ensure that the
fully covered
check is done correctly.- Additive logic (cleaner to implement): Say your set contains intervals
A [4, 10)
andB [15, 20)
and a new intervalC [8, 17)
comes in. Adding this new interval C should merge all of A, B and C into one interval[4, 17)
. - Subtractive logic: To check if an interval has any part which is not fully covered by the preceding intervals, remove all parts of it that overlap with intervals already encountered. If you are left with an empty interval, the rule is not valid.
- Additive logic (cleaner to implement): Say your set contains intervals
- Secondly, all of this is easy to follow when we are dealing with
ForeverXyzRule
orIntervalXyzRule
(rule with fixed interval). ForPeriodXyzRule
, I am not entirely sure of the boundaries that would need to be checked. The proposed change does compute theeligibleInterval
at both start and end of thecurrInterval
but I need to give it some more thought. - There could even be other types of rules in the future, but I guess we need not worry about them now.
...src/main/java/org/apache/druid/server/coordinator/rules/PeriodBroadcastDistributionRule.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/druid/server/coordinator/rules/PeriodDropRule.java
Outdated
Show resolved
Hide resolved
To allow for the introduction of new rules and cases where the user does want to add these absurd rules that have been supported till now, WDYT of having a URL parameter |
@LakshSingla, re:
hmm, a parameter to bypass API validation seems a bit hacky to me. If users want to "temporarily" set something for testing or so, they can set effective rules and then always retrieve the "actual" rules from the audit history from the console/API. I think that should suffice? |
ce89122
to
966c902
Compare
This pull request has been marked as stale due to 60 days of inactivity. |
This pull request/issue has been closed due to lack of activity. If you think that |
Druid's retention rules are evaluated in the order in which they are set. Unfortunately, a mistake in the order of rules by an operator can lead to unintentional data loss, and if it goes unnoticed and auto-kill is enabled, data could be permanently deleted.
To address this issue, a guard rail is added in the rules API. This guard rail aims to identify rules that fully contains the interval for any subsequent rules A rule is considered good as long as it will run at some point (and doesn't fully hide other rules in the chain) - it's ensured by checking at the interval boundaries as well.
Some examples of bad rules that will be disallowed (see unit tests for more examples):
loadByPeriod(P6M)
,loadByPeriod(P3M)
]loadByInterval(2020-01-01/2050-01-01)
,loadByInterval(2021-01-01/2023-01-01)
]dropBeforeByPeriod(P3M)
,dropBeforeByPeriod(P6M)
]loadForever
,loadByPeriod(P1M)
]loadForever
,dropForever
]Examples of rules that will be allowed because they will be evaluated at some point:
loadByPeriod(P5Y)
,loadByInterval(2021-01-01/2023-01-01)
]; the loadByInterval rule will be evaluated at some point.loadByPeriod(P1Y)
,loadByPeriod(P2Y)
]For testing purposes, if a user wants to intentionally set a rule that fully contains other rules, they can do it temporarily and then fetch the correct set of rules from the rules audit history after their testing is done.
Release note
Datasource rules API will reject the rules payload if a rule fully contains subsequent rules' interval.
Key changed/added classes in this PR
Rule.java
Rules.java
RulesTest.java
RulesEligibleIntervalTest.java
This PR has: