Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Let blocks #2010

Merged
merged 19 commits into from
Sep 10, 2024
Merged

Let blocks #2010

merged 19 commits into from
Sep 10, 2024

Conversation

jneem
Copy link
Member

@jneem jneem commented Jul 25, 2024

This implements let blocks, but only for "plain" bindings that don't do any destructuring.

I looked into handling destructuring. As long as we don't need to allow recursive bindings, I think destructuring can be handled by desugaring into a match block that matches an array of all the right hand sides. I.e.,

let
  a = 1,
  { foo } = { foo = 2 }
in ...

becomes

[1, { foo = 2 }] |> match { [a, { foo }] => ... }

Recursive let blocks seem to be rather more tricky with the current pattern compilation strategy.

@github-actions github-actions bot temporarily deployed to pull request July 25, 2024 15:15 Inactive
@github-actions github-actions bot temporarily deployed to pull request July 25, 2024 15:40 Inactive
@github-actions github-actions bot temporarily deployed to pull request July 25, 2024 15:54 Inactive
@github-actions github-actions bot temporarily deployed to pull request July 25, 2024 16:19 Inactive
@github-actions github-actions bot temporarily deployed to pull request July 26, 2024 06:08 Inactive
@jneem jneem marked this pull request as ready for review July 26, 2024 06:15
@jneem jneem requested a review from yannham July 26, 2024 06:15
Copy link
Member

@yannham yannham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for getting to ball rolling on this long awaited feature 🙂 as far as I can tell the code looks good, but I have a couple reservations at the time being at the design level:

  • We're trading the let-binding structure for something more complicated (a hashmap), even for a single-var let-binding, which I suppose is the most common case. Do we have an idea of the cost of that (maybe that will be easier to answer thanks to your other PR on continuous benchmarks!). Subsidiary question: do we need a hashmap here? I would rather expect a Vec, given that we are going to iterate through them and insert them in the environment.
  • I think we should first try to pause and think about the following possible combinations: normal let block as implemented in this PR, let rec block (but still without pattern), let blocks with patterns and let rec blocks with patterns. In particular, what's the semantics and the possible desugaring/compilation of those. And more importantly, is each combination meaningful? Or can we get away with restricting the possibilities? I'm fine restricting let-blocks, but I don't think the fact that it wouldn't work well with how we compile pattern currently is a good motivation, without having either a plan B for how it could work in the future, or a good reason to rule let rec blocks with patterns because they're expected to be much less useful.

@jneem
Copy link
Member Author

jneem commented Jul 30, 2024

We're trading the let-binding structure for something more complicated

Oh, good point. I think a Vec should be fine, and probably even a SmallVec or similar. I'll benchmark the options.

I think it isn't too difficult to support let blocks with patterns, so let me try to do that. If that works, the forbidden combination would be rec + pattern, which is a bit arbitrary but at least it's easy to explain. Ultimately, it would be nice to just not have any restrictions at all. I don't think it's hard to fix it, it will just take some work. The main issue I ran into is that pattern compilation for eval doesn't make it easy to figure out what the bound names are, and therefore we can't insert them into the recursive environment. But the logic for statically finding out the bound names is already in the typechecker, so it should be mostly a matter of reorganizing things...

@github-actions github-actions bot temporarily deployed to pull request July 31, 2024 11:08 Inactive
@jneem
Copy link
Member Author

jneem commented Jul 31, 2024

Ok, I've added support for patterns in let blocks.

I also measured the performance. It looks like the IndexMap hurt performance a bit, and changing it to SmallVec is as fast as the original. It's a little hard to be sure, since the benchmarks are pretty noisy.

@github-actions github-actions bot temporarily deployed to pull request July 31, 2024 11:31 Inactive
@yannham
Copy link
Member

yannham commented Jul 31, 2024

So, the semantics of let pat1 = val1, ..., patn = valn in body is let [pat1, ..., patn] = [val1, ..., valn]. I think that's fair, although probably a bit wasteful as an actual desugaring. But it's ok for a new feature. It's also likely very easy to compile to a sequence of patterns directly in a more efficient manner without allocating, building and destructuring an array (I'm thinking implementing a variant of Compile for something wrapping [Pattern] and taking a sequence of arguments, or something like that).

The main issue I ran into is that pattern compilation for eval doesn't make it easy to figure out what the bound names are, and therefore we can't insert them into the recursive environment. But the logic for statically finding out the bound names is already in the typechecker, so it should be mostly a matter of reorganizing things...

I think pattern compilation is already doing something close to what you want. It basically compiles a pattern - given the term to match on - to an expression that evaluates to null or to a record of the bindings introduced by the pattern. The final step is to call %pattern_branch% or sth that evaluates the branch in an environment augmented with the bindings. We can change this last step to something more involved. But I fear this is going to be more complicated, because before we can bind any of the pattern variable in the environment, we first have to evaluate part of the matched value. I suspect we'll need something similar to what we do for recursive records, that is patching the environment afterwards to "tie the knot" properly.

In general, for recursive let-blocks with patterns, both the desugaring and the possible compilation aren't entirely crystal clear to me right now. Looking at a single recursive destructuring let could be interesting, something like let {x=a, y=b} = ...whatever uses a and b.... in body. Once again, before even being able to build the recursive environment for a and b, you first need to evaluate partially the body first. Maybe we can fill the environment with "bombs" initially, like InfinteRecursion errors (because obviously if we need a before we can even bind a, this isn't going to work). But then, we will bind a to a closure with those bombs in the environment, and we'll then need to patch them later with the actual recursive environment. As far as I can tell, this requires a bit more machinery that what's currently available. Or at least to repurpose part of the recursive records' machinery to that end.

@jneem
Copy link
Member Author

jneem commented Jul 31, 2024

What about let rec {x=a, y = {z = c}} = foo in bar becomes

let rec
  %0 = foo,
  a = %0.x,
  %1 = %0.y,
  c = %1.z
in
bar

I think this extends to handle arrays pretty easily. I'm less sure about constants and ADTs. Maybe those could be handled by extending the "primitive" (i.e. non-pattern) let term to allow constants or patterns of the form 'Foo x

@yannham
Copy link
Member

yannham commented Aug 1, 2024

What about let rec {x=a, y = {z = c}} = foo in bar becomes

let rec
 %0 = foo,
 a = %0.x,
 %1 = %0.y,
 c = %1.z
in
bar

So, if we have {x=[{y=[c]}]]}, we'll need to have c as a top-level binding I guess. It seems it's like pattern matching except that you bubble up all the bound variables as a top-level let-binding.

I think this illustrates my point: we now have to (re?)define the semantics of patterns for both pattern matching and destructuring separately, and also probably have separate implementations, as we used to, beyond the fact that I trust you can find a reasonable semantics for recursive destructuring. But as before, they might diverge or have different evaluation strategy, which can be surprising. It's not necessarily a no-go, but it's not very satisfying. I just wondered if could unify both in one core compilation scheme (even if there recursive side is actually never used in actual match expression but only in destructuring).

I think what the pattern matching compilation is doing is pretty close in spirit already. It just does that in pure Nickel code, by putting stuff in a record in several steps instead of bubbling up all the bindings at the top-level. One possibility is to make pattern compilation more like what you propose. Another is to do what I propose, and patch the result at the end. The issue with the latter is in case of infinite recursion (the current pattern matching compilation would cause an unbound variable error instead), but I think putting "bombs" in the environment and shadowing them at the end once we can build the recursive environment is actually maybe not that bad.

Copy link
Member

@yannham yannham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a more thorough review. It looks good, I'm just requesting changes to block merging because I think we shouldn't include this in 1.8, which is too close.

I think it could have more tests, in particular for typechecking recursive let-blocks or let-blocks with patterns.

It's also a bit annoying that we often have to do two traversals of the bindings just for the recursive case, as mentioned in a comment. But I guess we can live with it.

core/src/parser/utils.rs Outdated Show resolved Hide resolved
core/src/parser/utils.rs Show resolved Hide resolved
core/src/pretty.rs Show resolved Hide resolved
core/src/transform/desugar_destructuring.rs Outdated Show resolved Hide resolved
core/src/transform/free_vars.rs Outdated Show resolved Hide resolved
lsp/nls/src/position.rs Outdated Show resolved Hide resolved

for (_, val) in bindings {
self.fill(val, if attrs.rec { &new_env } else { env });
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably not going to have any import consequence, but it's a bit annoying that we have to do those double iterations each time just in case the let is recursive. One possibility would be to distinguish both cases in the match, and have a different implementation, which leads obviously to a bit of code duplication. Or adds if wherever needed (to do everything in the first for and elide the second in the non-recursive case), but I fear this will make the code not very readable. So I don't have a great idea nor a strong opinion. It's just... itchy 😛

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I definitely agree. This was the clearest version I could come up with, even though it isn't the most efficient.

@github-actions github-actions bot temporarily deployed to pull request August 5, 2024 10:20 Inactive
Copy link
Contributor

github-actions bot commented Aug 5, 2024

🐰Bencher

ReportTue, September 10, 2024 at 14:44:20 UTC
Projectnickel
Branch2010/merge
Testbedubuntu-latest

⚠️ WARNING: The following Measure does not have a Threshold. Without a Threshold, no Alerts will ever be generated!

  • Latency (latency)

Click here to create a new Threshold
For more information, see the Threshold documentation.
To only post results if a Threshold exists, set the --ci-only-thresholds CLI flag.

Click to view all benchmark results
BenchmarkLatencyLatency Results
nanoseconds (ns)
fibonacci 10➖ (view plot)498,870.00
pidigits 100➖ (view plot)3,224,700.00
product 30➖ (view plot)817,630.00
scalar 10➖ (view plot)1,497,000.00
sum 30➖ (view plot)803,420.00

Bencher - Continuous Benchmarking
View Public Perf Page
Docs | Repo | Chat | Help

@jneem
Copy link
Member Author

jneem commented Aug 6, 2024

Yeah, that's fair. It does bother me a little that there are some easy cases that we don't cover. Like, I can write

let rec
  a = { foo = 3 }
  b = a.foo + 1
in b

but then if I want to refactor it to

let rec
  { foo } = { foo = 3 }
  b = foo + 1
in b

then I need to break up the let block into a sequence of lets. But I agree, if we don't have good semantics in general then it doesn't make sense to add it.

@jneem
Copy link
Member Author

jneem commented Aug 6, 2024

Ok, I tried writing tests and there are a bunch of problems with typechecking recursive let blocks. We'll need to do something more like what recursive records do, but I need to understand that first...

@jneem jneem marked this pull request as draft August 6, 2024 04:37
@yannham
Copy link
Member

yannham commented Aug 6, 2024

Yeah, that's fair. It does bother me a little that there are some easy cases that we don't cover. Like, I can write

One possibility would be to rule out more complicated cases, because it's not obvious that they are really useful in practice. So, for example, we could require that the right hand side of a destructuring let rec must evaluate to a WHNF without needing the recursive bindings. Then we could do the poisoning approach without caring about the order.

Otherwise, I reckon the right semantics for recursive destructuring is probably the old one where it's just desugared into a sequence of lazy let-bindings. However, I think there were non trivial issues around default values, lazyness (something like let 'Foo 5 = value in ... would be transparent because the pattern doesn't bind anything, while today it's desugared to value |> match { 'Foo 5 => ...} which does check that value is 'Foo 5), and error reporting. I just wonder if recursive destructuring is useful enough to 1) go back to this more complicated situation with discrepancies between pattern matching and destructuring, and more line of code in the implementation 2) breaking backward compatibility (again)

@github-actions github-actions bot temporarily deployed to pull request August 7, 2024 09:36 Inactive
@github-actions github-actions bot temporarily deployed to pull request August 7, 2024 10:27 Inactive
@jneem jneem marked this pull request as ready for review August 7, 2024 10:45
Copy link
Member

@yannham yannham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add some tests for the LSP as well? Also, we need to update the documentation as well, at least the syntax.

Because of the doc thing, and that I don't want to ship a last-minute new feature in the next version, I'm going to request changes, but it's just for the sake of delaying the merging of this PR.

Otherwise, we might have a look at what other languages do - I realized, while discussing with other people from the group, that Haskell does have something called lazy patterns that might be the semantics we want for destructuring. And they have a form of destructuring (although it's just desugared to pattern matching, so I suspect it's not going to support any form of recursivity). I don't know much about Scala, etc. But there might be inspiration there.

core/src/typecheck/mod.rs Show resolved Hide resolved
@github-actions github-actions bot temporarily deployed to pull request August 8, 2024 03:21 Inactive
@jneem jneem mentioned this pull request Aug 28, 2024
@jneem jneem mentioned this pull request Sep 6, 2024
Copy link
Member

@yannham yannham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beside the last remarks, now that we have good options for the recursive case and that 1.8 has been published, I think we can proceed.

Left for future work I guess:

  • Update Topiary to format let-blocks properly
  • Implement pattern compilation for [Pattern], generating a bunch of variables statically instead of needing to allocate a Nickel array to immediately destructure it
  • Update the various Nickel code (stdlib, examples, etc.) to take advantage of let blocks, to lead by example

cli/tests/snapshot/inputs/eval/let_block_rec.ncl Outdated Show resolved Hide resolved
/// - `(["a", "foo"], "foo", empty field)` for the `foo` variable
/// - `(["a", "bar"], "z", field with Number contract)` for the `z` variable
/// - `(["d"], "e", empty field)` for the `e` variable
fn bindings(&self) -> Vec<(Vec<LocIdent>, LocIdent, Field)>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was the code of this module taken from the LSP or something? I feel like I've seen this code somewhere else before. If yes, has it been removed from the LSP?But maybe it's memories from my previous review 🤷‍♂️

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it actually originated in core, but at some point you moved it to the LSP. I did indeed forget to delete the duplicate copy, so I've done that now.

@github-actions github-actions bot temporarily deployed to pull request September 10, 2024 14:37 Inactive
@jneem
Copy link
Member Author

jneem commented Sep 10, 2024

The good news is that the wasteful array desugaring is already gone in #2031. I got let blocks supported in tree-sitter-nickel a while back, so I can look at supporting that in topiary too

@jneem jneem enabled auto-merge September 10, 2024 14:40
@jneem jneem added this pull request to the merge queue Sep 10, 2024
Merged via the queue into master with commit 58412e8 Sep 10, 2024
7 checks passed
@jneem jneem deleted the let-blocks branch September 10, 2024 15:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants