Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lean compiler can worsen the asymptotic running time of programs #5908

Closed
lyphyser opened this issue Oct 31, 2024 · 2 comments
Closed

Lean compiler can worsen the asymptotic running time of programs #5908

lyphyser opened this issue Oct 31, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@lyphyser
Copy link

lyphyser commented Oct 31, 2024

It has come to my attention that Lean's compiler apparently can make the asymptotic running time of a program worse, in particular applying this transformation (from batteries#1007):

[let x := precomputed; fun y => f x y] will be eta-expanded to fun y => (let x := precomputed; fun y => f x y) y and simplified to fun y => f precomputed y, which means it will not be precomputed at all [here precomputed is an expression that depends on function arguments]

This means e.g. that if the returned function is called O(n) times and precomputed does O(n) work, then the program will now have O(n^2) running time, while it would have O(n) running time with a naive unoptimized compilation strategy where precomputed is only computed once and then stored on the stack slot for the "precomputed" local variable (and then later put in the closure environment).

This is something that is in my opinion a serious bug and not acceptable behavior, since it makes the program potentially unusable (something that in O(n) time runs in a second could take days or even years with O(n^2) running time) and is something that a programmer would never expect, since a production compiler would be extremely careful to never do something like this.

The compiler should be fixed so that it never replaces a reference to a "let" value with the "let" expression itself when there are multiple references or when the references are inside a lambda (or inside a loop if the compiler IR uses loops), unless that expression is constant-time (which e.g. means that when fully elaborated it doesn't invoke recursors for any non-finite inductive) and preferably also short and simple (e.g. not more than K IR or machine instructions with K <= 10).

Furthermore, when simplifying an application of a lambda to the expression itself with variable replacements, the compiler must (at least conceptually) create let bindings instead of directly replacing variables and only simplify them if the condition in the previous paragraph holds. For example, (fun x => fun y => x) expr must be optimized to (let x = expr; fun y => x), and not (fun y => expr), and the let must not be optimized away, unless the lambda is also optimized away beforehand.

To be more explicit, note that while mathematically those transformations would be valid in typed lambda calculus, when compiling code it is definitely not OK to replace a function with any extensionally equivalent function, since that replacement function may be unusably slow or even just worse than the one the programmer specified. Instead, the replacement can only be performed if the compiler is sure that asymptotic running time is equal or better, and should only be performed if the compiler also has a good-faith belief that the constant factors will probably be equal or smaller on the target architecture/CPU (i.e. the replacement is probably an optimization and not a pessimization)

@nomeata
Copy link
Collaborator

nomeata commented Oct 31, 2024

Thanks for your bug report, and the elaborate explanation of what a compiler should and should not do, very helpful to those who may not be aware of these issues.

The report would be more useful with a MWE.

Furthermore we are aware of a number of issues with the current code generator, which is slated to be replaced in the future, and thus frozen. Did you look through existing bug reports for duplicates, in particular those labeled “depends on new code generator”?

@nomeata nomeata closed this as not planned Won't fix, can't repro, duplicate, stale Nov 1, 2024
@lyphyser
Copy link
Author

lyphyser commented Nov 1, 2024

If Lean is to be usable as a general-purpose programming language, issues like these need to be fixed timely (or, for this particular issue, not introduced in the first place).

Not fixing serious issues because you are expecting to rewrite the component only makes sense if the new code generator is being finished shortly; since it seems it's merely planned, it makes sense to first fix the current system.

Note that usually rewriting from scratch is considered bad practice (it throws away all the work and there is no guarantee that new system is actually better than the old one), so as you start actually working on the new code generator you might realize it's better to refactor the existing one, and thus fixing the current generator is not necessarily going to be wasted work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants