-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Delete -naive flag and disallow lookup actions in rules #461
base: main
Are you sure you want to change the base?
Conversation
CodSpeed Performance ReportMerging #461 will degrade performances by 95.33%Comparing Summary
Benchmarks breakdown
|
// for the later atoms, we consider everything | ||
let mut timestamp_ranges = | ||
vec![0..u32::MAX; cq.query.funcs().collect::<Vec<_>>().len()]; | ||
if do_seminaive { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we still want to keep the -naive
flag as well as the code here, so the user can still do naive evaluation (useful for debugging, also have a different semantics than semi-naive for "unsafe" egglog).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree that we should probably keep naive evaluation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should keep it. It is unhelpful and adds complexity to the later passes as they need to support the naive semantics correctly. I am also against keeping it as a use-at-your-own-risk feature.
For Egglog users, if you don't use delete, semi-naive and naive are indistinguishable, so it is unhelpful for debugging. If you use delete, then you care much about performance, and there's no point in using naive. Even when you debug with unsafe features, you should probably debug the semi-naive case instead because that's what you want.
For Egglog developers, I see some value in being a sanity check for ensuring semi-naive is implemented correctly. But we are not doing this now, and it can also be done through stronger end-to-end test cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was convincing to me. I've never personally used it before
@yihozhang what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What complexity does this add to later passes? I thought the only difference between seminaive and naive is that in the seminaive case, we split the original query into many small queries depending on the timestamps (i.e., what this code snippet does).
I strongly recommend that we keep the naive evaluation. We can view semi-naive as an optimization of the naive evaluation, and this optimization is not always semantic-preserving, when given bizarre programs that violate certain assumptions. Examples include
- rules that use
extract
/ user-defined primitives - rules where the merge function is not associative or idempotent
I'm also not confident that our semi-naive is implemented correctly- do we really update timestamp every time we update the table? I just looked at table.rs and it seems we don't update the timestamp for at least get_mut
. The naive evaluation serves as a ground truth for this purpose. Personally, when I am debugging a primitive I wrote, the first thing I do is to disable semi-naive evaluation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we keep the naive flag, either we need to split the latter passes into two, which is unlikely, or each piece of downstream code must support both naive and semi-naive. I am skeptical about the claim that semi-naive code would just work for naive. For one thing, I don't see how semi-naive can be implemented as pure syntactic rewrites. As you pointed out, something more needs to happen to the timestamps in the semi-naive case. And yet, the naive flag is not used anywhere else in the codebase.
There are cases where the two give different semantics. However, the naive semantics is not more helpful to the users in those cases because they still need semi-naive to work in the end.
For your last point: Firstly, you still need to debug your new primitive for semi-naive. Secondly, I will only trust naive evaluation as a ground truth if it is well supported with a clear separation between the two semantics. Relying on your program to be tested to produce the test output is a terrible idea to me.
However, I do think this discussion raised a significant concern about the correctness of Egglog. We should investigate the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conclusion: Keep
- Not comprising the comfort of
-naive
for a smaller core - Too much effort to actually implement
-naive
, we settle for the timestamp hack - Reconsider when merging the new backend
I'm a little worried about the time improvements, especially for lambda... That one is so dramatic I worry that maybe the semantics of the example changed? Seeing all the changes, I also worry about the degradation for UX, it seems just more unwieldy with this change. I know you said that automated desuguring had some issues, but I am wondering if that could be used to at least addressost of these cases? Where there particular issues with it for some cases or just in general? |
src/typechecking.rs
Outdated
//Disallowing Let/Set actions to look up non-constructor functions in rules | ||
for action in head.iter() { | ||
match action { | ||
GenericAction::Let(_, _, Expr::Call(_, symbol, _)) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you need to check if this is a function vs a constructor call here?
tests/eggcc-extraction.egg
Outdated
((set (ival lhs) (IntI n n)))) | ||
(rule ((= lhs (Node (PureOp (Const (IntT) (const) (Num n))))) | ||
(= nval (IntI n n))) | ||
((set (ival lhs) nval))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IntI
is a constructor, not a function
So this isn't a lookup and doesn't need to be changed
Bumping the PR for review. Now, it properly deals with Issue #420 by enforcing a write-only RHS for rules through a new type-checking pass that checks every subexpression. This restriction forbids looking up functions even with a default value. This breaks existing tests primarily due to the pervasive use of I also added a few more test cases for this new pass. I'm working on #421 to remove default values and #422 to distinguish constructors and functions. Due to the dependencies, I recommend merging after those two are done in this branch. Known issues:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job on this PR! I had some concerns that need to be addressed before we merge this- the main one has to do with relation
desugaring
// for the later atoms, we consider everything | ||
let mut timestamp_ranges = | ||
vec![0..u32::MAX; cq.query.funcs().collect::<Vec<_>>().len()]; | ||
if do_seminaive { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree that we should probably keep naive evaluation
} | ||
} | ||
|
||
fn check_lookup_actions(actions: &ResolvedActions) -> Result<(), TypeError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could use map_exprs
and/or visit_exprs
here to make this check easier
@@ -20,21 +20,21 @@ | |||
(relation neq (Math Math)) | |||
|
|||
(rule ((neq x y)) | |||
((neq y x))) | |||
((set (neq y x) ()))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see, now relations have to be set. My opinion is that we shouldn't make users use set
for relations.
I think the long-term plan is for us to desugar relations to be constructors so people don't have to do this. You could either implement that plan in this PR, or desugar relations to a set
for now and do that in another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I'll implement the desugaring in this PR. My impression is that it is still up to discussion what it should be desugared into. Performance overhead aside, desugaring into a constructor might not be ideal because that makes it more expressive than needed. For instance, now you can union two edges in the relation or have a relation as the codomain of a function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point- we would have to disallow unioning two relation terms.
You can't have a relation as the codomain of a function because the sort would be fresh (no way to get a handle on it)
Relation desugaring aside, I don't think we want to make this breaking change when we plan to fix it right away. I guess that means this PR is blocked by solving the relation issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make functions whose output is Unit
a special case: Such functions have default value ()
and an implicit merge function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that would work. I think the plan is to get rid of default values altogether. Desugared or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant getting rid of the default
field/keyword, but handle unit specially (similar to how EqSort is handled specially).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conclusions:
- Conceptually: 3 function subtypes
- Constructor | Relation | Custom
- Operates differently
- Syntax
- Keeping the status quo
- Because there is not enough reason to change it
- The decision is final for the June vision
- Adding constructor keyword
- Only relation if declared as relation
- Only constructor if declared as constructor
- Keeping the status quo
- Semantics/implementation
- Everything still uses the same function data structure under the hood
- Only relation and constructor are allowed on RHS
- It adds the relation/creates the term
- Union requires both arguments to be the same EqSort
- Set requires the head to be a custom function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great PR!
// for the later atoms, we consider everything | ||
let mut timestamp_ranges = | ||
vec![0..u32::MAX; cq.query.funcs().collect::<Vec<_>>().len()]; | ||
if do_seminaive { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What complexity does this add to later passes? I thought the only difference between seminaive and naive is that in the seminaive case, we split the original query into many small queries depending on the timestamps (i.e., what this code snippet does).
I strongly recommend that we keep the naive evaluation. We can view semi-naive as an optimization of the naive evaluation, and this optimization is not always semantic-preserving, when given bizarre programs that violate certain assumptions. Examples include
- rules that use
extract
/ user-defined primitives - rules where the merge function is not associative or idempotent
I'm also not confident that our semi-naive is implemented correctly- do we really update timestamp every time we update the table? I just looked at table.rs and it seems we don't update the timestamp for at least get_mut
. The naive evaluation serves as a ground truth for this purpose. Personally, when I am debugging a primitive I wrote, the first thing I do is to disable semi-naive evaluation.
@@ -20,21 +20,21 @@ | |||
(relation neq (Math Math)) | |||
|
|||
(rule ((neq x y)) | |||
((neq y x))) | |||
((set (neq y x) ()))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make functions whose output is Unit
a special case: Such functions have default value ()
and an implicit merge function.
This PR fixes Issue #420. Lookup actions in rules will now cause a type error
LookupInRuleDisallowed
.Move specifically, this PR:
Removes
-naive
flag and related desugaring code due to being replaced by this change.Fixes 'fail' failing due to not being identified as global in the
remove_global
rewrite pass.Adds new positive and negative tests for this type error.
Rewrites the existing tests for compatibility with the new type error.