Delete -naive flag and disallow lookup actions in rules #461

FTRobbin · 2024-11-06T23:35:56Z

This PR fixes Issue #420. Lookup actions in rules will now cause a type error LookupInRuleDisallowed.

More specifically, this PR:

Removes the -naive flag and the related desugaring code, which this change replaces.
Fixes 'fail' failing due to not being identified as global in the remove_global rewrite pass.
Adds new positive and negative tests for this type error.
Rewrites the existing tests for compatibility with the new type error.

codspeed-hq · 2024-11-06T23:38:41Z

CodSpeed Performance Report

Merging #461 will degrade performances by 95.33%

Comparing haobinni-0904 (8a75e7e) with haobinni-0904 (dc42cd3)

Summary

❌ 1 regressions
✅ 8 untouched benchmarks
🆕 1 new benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| | Benchmark | `haobinni-0904` | `haobinni-0904` | Change |
|---|---|---|---|---|
| ❌ | `eggcc-extraction` | 103.7 ms | 2,221.7 ms | -95.33% |
| 🆕 | `looking_up_nonconstructor_in_rewrite_good` | N/A | 465.1 µs | N/A |

yihozhang · 2024-11-06T23:52:32Z

src/gj.rs

            // for the later atoms, we consider everything
            let mut timestamp_ranges =
                vec![0..u32::MAX; cq.query.funcs().collect::<Vec<_>>().len()];
-            if do_seminaive {


I believe we still want to keep the -naive flag as well as the code here, so the user can still do naive evaluation (useful for debugging; it also has different semantics from semi-naive for "unsafe" egglog).

Yeah, I agree that we should probably keep naive evaluation

I don't think we should keep it. It is unhelpful and adds complexity to the later passes as they need to support the naive semantics correctly. I am also against keeping it as a use-at-your-own-risk feature.

For Egglog users, if you don't use delete, semi-naive and naive are indistinguishable, so it is unhelpful for debugging. If you use delete, then you care much about performance, and there's no point in using naive. Even when you debug with unsafe features, you should probably debug the semi-naive case instead because that's what you want.

For Egglog developers, I see some value in being a sanity check for ensuring semi-naive is implemented correctly. But we are not doing this now, and it can also be done through stronger end-to-end test cases.

That was convincing to me. I've never personally used it before
@yihozhang what do you think?

What complexity does this add to later passes? I thought the only difference between seminaive and naive is that in the seminaive case, we split the original query into many small queries depending on the timestamps (i.e., what this code snippet does).
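To make the query-splitting description concrete, here is a minimal Rust sketch. It is not egglog's code: it computes transitive closure over a made-up `edge` set, once naively and once with an explicit delta set standing in for the timestamp ranges used in gj.rs. The names `Pair`, `naive`, and `semi_naive` are invented for illustration.

```rust
use std::collections::BTreeSet;

type Pair = (u32, u32);

/// Naive evaluation: every iteration joins the whole `path` relation with `edge`.
fn naive(edges: &BTreeSet<Pair>) -> BTreeSet<Pair> {
    let mut path = edges.clone();
    loop {
        let mut derived = BTreeSet::new();
        for &(a, b) in &path {
            for &(c, d) in edges {
                if b == c {
                    derived.insert((a, d));
                }
            }
        }
        let before = path.len();
        path.extend(derived);
        if path.len() == before {
            return path; // fixpoint: nothing new was derived
        }
    }
}

/// Semi-naive evaluation: only tuples added in the previous iteration
/// (the delta) take part in the join.
fn semi_naive(edges: &BTreeSet<Pair>) -> BTreeSet<Pair> {
    let mut path = edges.clone();
    let mut delta = edges.clone();
    while !delta.is_empty() {
        let mut next = BTreeSet::new();
        for &(a, b) in &delta {
            for &(c, d) in edges {
                if b == c && !path.contains(&(a, d)) {
                    next.insert((a, d));
                }
            }
        }
        path.extend(next.iter().copied());
        delta = next;
    }
    path
}

fn main() {
    let edges = BTreeSet::from([(1, 2), (2, 3), (3, 4)]);
    assert_eq!(naive(&edges), semi_naive(&edges));
    println!("{} path tuples", semi_naive(&edges).len());
}
```

For a monotone rule like this one the two strategies reach the same fixpoint; the semi-naive version just avoids re-joining old tuples.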

I strongly recommend that we keep the naive evaluation. We can view semi-naive as an optimization of the naive evaluation, and this optimization is not always semantic-preserving, when given bizarre programs that violate certain assumptions. Examples include

rules that use extract / user-defined primitives

rules where the merge function is not associative or idempotent

I'm also not confident that our semi-naive is implemented correctly- do we really update timestamp every time we update the table? I just looked at table.rs and it seems we don't update the timestamp for at least get_mut. The naive evaluation serves as a ground truth for this purpose. Personally, when I am debugging a primitive I wrote, the first thing I do is to disable semi-naive evaluation.
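The point about non-idempotent merge functions can be illustrated with a toy model. This is a sketch under heavy assumptions, not egglog's table implementation: a single rule sets `count(x) = 1` for every `x` in a fixed `seen` list, and the hypothetical merge function `merge_add` sums old and new values. All names here are made up.

```rust
use std::collections::BTreeMap;

/// Hypothetical merge function: associative, but not idempotent.
fn merge_add(old: i64, new: i64) -> i64 {
    old + new
}

/// Applies `set count(x) = 1` for each match, resolving conflicts with `merge_add`.
fn apply(count: &mut BTreeMap<u32, i64>, matches: &[u32]) {
    for &x in matches {
        count
            .entry(x)
            .and_modify(|v| *v = merge_add(*v, 1))
            .or_insert(1);
    }
}

/// Naive: every tuple in `seen` matches again in every iteration.
fn run_naive(seen: &[u32], iterations: usize) -> BTreeMap<u32, i64> {
    let mut count = BTreeMap::new();
    for _ in 0..iterations {
        apply(&mut count, seen);
    }
    count
}

/// Semi-naive: only tuples that are new since the last iteration match.
fn run_semi_naive(seen: &[u32], iterations: usize) -> BTreeMap<u32, i64> {
    let mut count = BTreeMap::new();
    let mut delta: Vec<u32> = seen.to_vec();
    for _ in 0..iterations {
        apply(&mut count, &delta);
        delta.clear(); // `seen` never grows, so later deltas are empty
    }
    count
}

fn main() {
    let seen = [7];
    println!("naive:      {:?}", run_naive(&seen, 3));
    println!("semi-naive: {:?}", run_semi_naive(&seen, 3));
}
```

With an idempotent merge such as `max`, both runs would agree; with a sum they diverge because naive evaluation re-fires the rule on old tuples.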

If we keep the naive flag, either we need to split the latter passes into two, which is unlikely, or each piece of downstream code must support both naive and semi-naive. I am skeptical about the claim that semi-naive code would just work for naive. For one thing, I don't see how semi-naive can be implemented as pure syntactic rewrites. As you pointed out, something more needs to happen to the timestamps in the semi-naive case. And yet, the naive flag is not used anywhere else in the codebase.

There are cases where the two give different semantics. However, the naive semantics is not more helpful to the users in those cases because they still need semi-naive to work in the end.

For your last point: Firstly, you still need to debug your new primitive for semi-naive. Secondly, I will only trust naive evaluation as a ground truth if it is well supported with a clear separation between the two semantics. Relying on your program to be tested to produce the test output is a terrible idea to me.

However, I do think this discussion raised a significant concern about the correctness of Egglog. We should investigate the issue.

Conclusion: Keep

Not compromising the comfort of -naive for a smaller core

Too much effort to actually implement -naive, we settle for the timestamp hack

Reconsider when merging the new backend

saulshanabrook · 2024-11-07T15:26:01Z

I'm a little worried about the time improvements, especially for lambda... That one is so dramatic I worry that maybe the semantics of the example changed?

Seeing all the changes, I also worry about the degradation for UX, it seems just more unwieldy with this change.

I know you said that automated desugaring had some issues, but I am wondering if that could be used to at least address most of these cases? Were there particular issues with it for some cases or just in general?

oflatt · 2024-11-19T21:49:52Z

src/typechecking.rs

+        //Disallowing Let/Set actions to look up non-constructor functions in rules
+        for action in head.iter() {
+            match action {
+                GenericAction::Let(_, _, Expr::Call(_, symbol, _)) => {


Don't you need to check if this is a function vs a constructor call here?

oflatt · 2024-11-19T21:50:25Z

tests/eggcc-extraction.egg

-      ((set (ival lhs) (IntI n n))))
+(rule ((= lhs (Node (PureOp (Const (IntT) (const) (Num n)))))
+       (= nval (IntI n n)))
+      ((set (ival lhs) nval)))


IntI is a constructor, not a function
So this isn't a lookup and doesn't need to be changed

FTRobbin · 2024-11-24T00:35:25Z

Bumping the PR for review. Now, it properly deals with Issue #420 by enforcing a write-only RHS for rules through a new type-checking pass that checks every subexpression.
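A check that visits every subexpression could look roughly like the following Rust sketch. It does not use egglog's real AST: `Expr`, `Action`, `Kind`, and the helper functions are simplified stand-ins, and only the error name `LookupInRuleDisallowed` comes from this PR. Names missing from the `kinds` map (for example primitives) are assumed to be allowed.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Constructor,
    Function,
}

// Simplified stand-ins for the real expression and action types.
enum Expr {
    Var(String),
    Call(String, Vec<Expr>),
}

enum Action {
    Let(String, Expr),
    Set(String, Vec<Expr>, Expr),
}

#[derive(Debug, PartialEq)]
struct LookupInRuleDisallowed(String);

/// Rejects any call to a non-constructor function, at any nesting depth.
fn check_expr(expr: &Expr, kinds: &HashMap<String, Kind>) -> Result<(), LookupInRuleDisallowed> {
    match expr {
        Expr::Var(_) => Ok(()),
        Expr::Call(name, args) => {
            if kinds.get(name) == Some(&Kind::Function) {
                return Err(LookupInRuleDisallowed(name.clone()));
            }
            args.iter().try_for_each(|arg| check_expr(arg, kinds))
        }
    }
}

/// The head of a `set` is written, not read, so only its arguments and value are checked.
fn check_actions(actions: &[Action], kinds: &HashMap<String, Kind>) -> Result<(), LookupInRuleDisallowed> {
    for action in actions {
        match action {
            Action::Let(_, expr) => check_expr(expr, kinds)?,
            Action::Set(_, args, value) => {
                for expr in args.iter().chain(std::iter::once(value)) {
                    check_expr(expr, kinds)?;
                }
            }
        }
    }
    Ok(())
}

fn var(name: &str) -> Expr {
    Expr::Var(name.to_string())
}

fn call(name: &str, args: Vec<Expr>) -> Expr {
    Expr::Call(name.to_string(), args)
}

fn main() {
    let mut kinds = HashMap::new();
    kinds.insert("IntI".to_string(), Kind::Constructor);
    kinds.insert("ival".to_string(), Kind::Function);

    // (set (ival lhs) (IntI n n)): IntI is a constructor, so this passes.
    let ok = vec![Action::Set("ival".to_string(), vec![var("lhs")], call("IntI", vec![var("n"), var("n")]))];
    // (let v (ival lhs)): reads a function table, so this is rejected.
    let bad = vec![Action::Let("v".to_string(), call("ival", vec![var("lhs")]))];

    println!("{:?}", check_actions(&ok, &kinds));
    println!("{:?}", check_actions(&bad, &kinds));
}
```

Under these assumptions the first rule body from tests/eggcc-extraction.egg is accepted unchanged, which matches the review remark that `IntI` is a constructor rather than a lookup.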

This restriction forbids looking up functions even with a default value. This breaks existing tests primarily due to the pervasive use of relation. I have manually rewritten the tests to use set instead.

I also added a few more test cases for this new pass.

I'm working on #421 to remove default values and #422 to distinguish constructors and functions. Due to the dependencies, I recommend merging after those two are done in this branch.

Known issues:

merge functions can cause implicit read
primitives can have side effects (uncertain)

oflatt

Great job on this PR! I had some concerns that need to be addressed before we merge this- the main one has to do with relation desugaring

oflatt · 2024-11-25T19:56:56Z

src/gj.rs

            // for the later atoms, we consider everything
            let mut timestamp_ranges =
                vec![0..u32::MAX; cq.query.funcs().collect::<Vec<_>>().len()];
-            if do_seminaive {


Yeah, I agree that we should probably keep naive evaluation

oflatt · 2024-11-25T19:59:26Z

src/typechecking.rs

+        }
+    }
+
+    fn check_lookup_actions(actions: &ResolvedActions) -> Result<(), TypeError> {


You could use map_exprs and/or visit_exprs here to make this check easier

oflatt · 2024-11-25T20:01:34Z

tests/array.egg

@@ -20,21 +20,21 @@
 (relation neq (Math Math))

 (rule ((neq x y))
-      ((neq y x)))
+      ((set (neq y x) ())))


Oh I see, now relations have to be set. My opinion is that we shouldn't make users use set for relations.

I think the long-term plan is for us to desugar relations to be constructors so people don't have to do this. You could either implement that plan in this PR, or desugar relations to a set for now and do that in another PR.

I don't think I'll implement the desugaring in this PR. My impression is that it is still up for discussion what it should be desugared into. Performance overhead aside, desugaring into a constructor might not be ideal because that makes it more expressive than needed. For instance, now you can union two edges in the relation or have a relation as the codomain of a function.

That's a good point- we would have to disallow unioning two relation terms.
You can't have a relation as the codomain of a function because the sort would be fresh (no way to get a handle on it)

Relation desugaring aside, I don't think we want to make this breaking change when we plan to fix it right away. I guess that means this PR is blocked by solving the relation issue

Let's make functions whose output is Unit a special case: Such functions have default value () and an implicit merge function.

I don't think that would work. I think the plan is to get rid of default values altogether. Desugared or not.

I meant getting rid of the default field/keyword, but handle unit specially (similar to how EqSort is handled specially).

Conclusions:

Conceptually: 3 function subtypes

Constructor | Relation | Custom

Operates differently

Syntax

Keeping the status quo

Because there is not enough reason to change it

The decision is final for the June vision

Adding constructor keyword

Only relation if declared as relation

Only constructor if declared as constructor

Semantics/implementation

Everything still uses the same function data structure under the hood

Only relation and constructor are allowed on RHS

It adds the relation/creates the term

Union requires both arguments to be the same EqSort

Set requires the head to be a custom function
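The conclusions above can be read as a small set of rules over action heads. The Rust sketch below is one possible encoding, not the implementation that was merged: `Subtype`, `Action`, and `check_action` are hypothetical names, and the union check assumes both sort names already refer to EqSorts.

```rust
use std::collections::HashMap;

/// The three conceptual function subtypes from the discussion.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Subtype {
    Constructor,
    Relation,
    Custom,
}

enum Action<'a> {
    /// `(f args...)` used directly as an action.
    Call(&'a str),
    /// `(set (f args...) value)`.
    Set(&'a str),
    /// `(union a b)`, carrying the sort names of both sides.
    Union(&'a str, &'a str),
}

fn check_action(action: &Action, decls: &HashMap<&str, Subtype>) -> Result<(), String> {
    match *action {
        // Only relations and constructors may appear bare on the RHS.
        Action::Call(f) => match decls.get(f) {
            Some(Subtype::Constructor) | Some(Subtype::Relation) => Ok(()),
            Some(Subtype::Custom) => Err(format!("{f} is a custom function; use set")),
            None => Err(format!("{f} is not declared")),
        },
        // `set` requires a custom function as its head.
        Action::Set(f) => match decls.get(f) {
            Some(Subtype::Custom) => Ok(()),
            Some(_) => Err(format!("{f} is not a custom function")),
            None => Err(format!("{f} is not declared")),
        },
        // `union` requires both sides to have the same sort.
        Action::Union(a, b) if a == b => Ok(()),
        Action::Union(a, b) => Err(format!("cannot union {a} with {b}")),
    }
}

fn main() {
    let decls = HashMap::from([
        ("Add", Subtype::Constructor),
        ("neq", Subtype::Relation),
        ("ival", Subtype::Custom),
    ]);
    // (neq y x) stays valid as a bare action; (set (neq y x) ()) does not.
    println!("{:?}", check_action(&Action::Call("neq"), &decls));
    println!("{:?}", check_action(&Action::Set("neq"), &decls));
}
```

Read this way, the tests/array.egg rule keeps its original `(neq y x)` form, which is the status-quo syntax the conclusions settle on.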

yihozhang

Great PR!

yihozhang · 2024-11-25T23:01:47Z

src/gj.rs

            // for the later atoms, we consider everything
            let mut timestamp_ranges =
                vec![0..u32::MAX; cq.query.funcs().collect::<Vec<_>>().len()];
-            if do_seminaive {


What complexity does this add to later passes? I thought the only difference between seminaive and naive is that in the seminaive case, we split the original query into many small queries depending on the timestamps (i.e., what this code snippet does).

I strongly recommend that we keep the naive evaluation. We can view semi-naive as an optimization of the naive evaluation, and this optimization is not always semantic-preserving, when given bizarre programs that violate certain assumptions. Examples include

rules that use extract / user-defined primitives

rules where the merge function is not associative or idempotent

I'm also not confident that our semi-naive is implemented correctly- do we really update timestamp every time we update the table? I just looked at table.rs and it seems we don't update the timestamp for at least get_mut. The naive evaluation serves as a ground truth for this purpose. Personally, when I am debugging a primitive I wrote, the first thing I do is to disable semi-naive evaluation.

yihozhang · 2024-11-25T23:19:18Z

tests/array.egg

@@ -20,21 +20,21 @@
 (relation neq (Math Math))

 (rule ((neq x y))
-      ((neq y x)))
+      ((set (neq y x) ())))


Let's make functions whose output is Unit a special case: Such functions have default value () and an implicit merge function.

FTRobbin added 7 commits October 23, 2024 12:48

Get rid of semi-naive flag

74999fb

Global lookup tests

3e55b0d

Merge branch 'main' of github.com:egraphs-good/egglog into haobinni-0904

82994eb

Add fail corner case to remove_global

dc69b30

Starting to rewrite tests

dee21e8

Merge branch 'main' of github.com:egraphs-good/egglog into haobinni-0904

9e10fac

Rewrote all failed tests

35e8532

FTRobbin requested a review from a team as a code owner November 6, 2024 23:35

FTRobbin requested review from mwillsey and removed request for a team November 6, 2024 23:35

FTRobbin added 2 commits November 6, 2024 15:42

Minor

a593155

Minor

dc42cd3

yihozhang reviewed Nov 6, 2024

FTRobbin added 2 commits November 15, 2024 13:09

Revert "Rewrote all failed tests"

2573722

This reverts commit 35e8532.

Merge branch 'main' of github.com:egraphs-good/egglog into haobinni-0904

c484c92

oflatt reviewed Nov 19, 2024

FTRobbin added 11 commits November 22, 2024 10:24

New typechecking pass forbidding lookups

204dd5c

Merge branch 'main' of github.com:egraphs-good/egglog into haobinni-0904

6c8cfbc

Fix array.egg

3ff74c7

Fix combined_nested.egg

3efe333

Fix cykjson.egg

94a64c0

Fix cyk.egg

a18063e

Revert previous fixes to tests

2abdd10

Fixing eggcc-extraction.egg in progress

a040ce9

Fix eggcc-extraction.egg

63fcbff

Fix fusion.egg

4cdeebe

FIx herbie.egg

59334fb

FTRobbin added 27 commits November 22, 2024 12:04

Fix path.egg

dbbf4c8

Fix integer_math.egg

4e34bc6

Fix interval.egg

bc92bad

Fix list.egg

ce7294b

Fix math.egg

d8378a7

Fix path-union.egg

79781cc

Fix pathproof.egg

30fdddd

Fix points-to.egg

693ff70

Fix prims.egg

b6a2466

Fix python_arry_optimize.egg

14aa00d

Fix repro-desugar-143.egg

a9c9969

Fix python_array_optimize.egg again

6c315eb

Fix repro-querybug.egg

97ecc71

Fix repro-unsound.egg

181cd97

Fix rw-analysis.egg

0c0af7a

Fix schedule-demo.egg

755cf92

Fix stratified.egg

078feb0

Fix stresstest_large_expr.egg

34a094c

Fix test-combined.egg

bb09c17

Fix test-combined-steps.egg

ead38d8

Fix tricky-type-checking.egg

a3a7f3c

Fix typeinfer.egg

645d3e3

Fix until.egg

b3768b3

Add negative cases for new typechecking pass

5b95c79

Add more negative cases

fd20ebf

Add negative and positive cases for rewrite

ac48f48

Minor

8a75e7e

oflatt requested changes Nov 25, 2024

yihozhang requested changes Nov 25, 2024
