Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Proposal] Rule Matching #16888

Open
kaushalmahi12 opened this issue Dec 20, 2024 · 5 comments
Open

[Proposal] Rule Matching #16888

kaushalmahi12 opened this issue Dec 20, 2024 · 5 comments
Assignees
Labels
Meta Meta issue, not directly linked to a PR

Comments

@kaushalmahi12
Copy link
Contributor

kaushalmahi12 commented Dec 20, 2024

Author: Kaushal Kumar
Pre-read for this issue: #16797

Please describe the end goal of this project

This document covers the Rule Matching for incoming search requests in depth. After reading this document, reader will have sound understanding of the following

  1. How the request attributes are extracted ?
  2. How exactly does the matching algorithm works on the extracted attributes ?
  3. What are the new constructs and how they work with each other ?

Assumptions

  • Rule attributes (cardinality) will be limited in nature at feature level and aggregated count.
    • When creating Rules the control will first go to the feature to do the
      • Cardinality validation
      • Attributes validation
  • Each feature will define their own attributes which the feature will be responsible for validating.

New Rule insertion process

Rule schema

{
    
   "attribute1": ["value*"],
   "attribute2": ["value*"],
   "label": "fjagjag9243421_425285",
   "updatedAt": "12-03-2024T18:00:23Z",
   "feature": "WLM"
}

New rule insertion works pretty much how insertion in compressed trie works. But given each attribute can take N(this will be limited e,g; 5 at max) number there is a chance that two or more rules could share one of the values in their respective attribute values list.
In this case whichever rule was created or modified last will override the label for that attribute in the trie.

Attribute extraction process

Given that the Rule attributes are specific to a feature, feature will provide a construct to extract the attributes. For example WLM feature would use threadContext to retrieve the authN info and request itself to retrieve the indices.

Rule Matching Process

In a nutshell the rule matching will be per attribute trie (prefix tree) based matching. Hence ultimately we will be dealing with a list of values per attribute. How this per attribute based list is calculated, we will understand that with the help of some examples
RulesMatching

lets say above picture captures a subpart of the in-memory rules Trie. Now following scenarios can occur while calculating possible labels for an attribute value.

Scenario - 1

When the search for the attribute value ends on a node which has a label

search_value: log_q3_2024

In this case the resulting label would be abc.

Scenario - 2

When the search ends on a node which doesn’t have a label then there are following sub scenarios

  • There is an empty prefix match for this value i,e the search ends at the root node which holds the empty value ( e,g; order_q3_2024)
    • In this case we assign nothing and downstream systems can treat this as default behavior
  • There is a partial match, then it should return top 10 results from the subtree. The order is decided by depth of the node.
    • for example for search string logs_q2_2024 , in this case it would return [abc, pqr]

Now the above run down shows how the labels are decided per attribute value but given the request carries multiple values and rule also contains multiple attributes, how to calculate the final value. Now we will select the one which is occurring in all attribute based return label lists .

Lets say for a feature based label evaluation we are considering two attributes and corresponding labels as per the above algorithm are [abc, xyz] and [abc,pqr] now the resulting label here will be abc for this request.

Even with multiple lists following scenarios can occur

  1. All lists have nothing in common ==>> In this case the request will be treated with a default label
  2. All lists have only one value in common ==>> This will be the resultant label for the request
  3. All lists have multiple values in common ==>> In this case to decide the resulting label. we will consider the depth as the tie breaker i,e; label with the min depth will be selected.

Low Level Design

Following UML and interaction diagrams shows the new constructs, their relationships and how they interact with each other.
RuleMatchingUML
RulesSequenceDiagram
InMemoryRuleEngine will save the label in ThreadContext which downstream transport actions can consume.

Supporting References

Issues

#16797

Related component

Search

@kaushalmahi12
Copy link
Contributor Author

@reta @msfroh @jainankitk
Can you review this and provide your suggestions ?

@msfroh
Copy link
Collaborator

msfroh commented Jan 3, 2025

How the request attributes are extracted ?

I still don't understand how attributes are extracted from a SearchRequest. What are the attributes of a SearchRequest?

@msfroh
Copy link
Collaborator

msfroh commented Jan 3, 2025

This proposal is very abstract. I would like some concrete examples to understand what's going on.

@kaushalmahi12
Copy link
Contributor Author

kaushalmahi12 commented Jan 3, 2025

@msfroh Thanks for reviewing this!

I still don't understand how attributes are extracted from a SearchRequest. What are the attributes of a SearchRequest?

Since it will be hard to contrive all the features and their feature specific attributes that they will use, I have kept that attribute extraction part abstract. But for WLM we will be using user principle and indices to determine a label, we can use SearchRequest#indices method to get the request indices and ThreadContext#getHeader to get the user principle info.

I would like some concrete examples to understand what's going on.

Can you provide more details about it ? I am not sure whether you got a chance to review #16797 . But if the mentioned issue still doesn't fill the gap, I can answer here.

@dblock dblock removed the untriaged label Jan 6, 2025
@dblock
Copy link
Member

dblock commented Jan 6, 2025

[Catch All Triage - 1, 2, 3, 4, 5, 6]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Meta Meta issue, not directly linked to a PR
Projects
Status: New
Development

No branches or pull requests

3 participants