free-identifier=? treats uninterned symbols as if interned for the fallback comparison #902

Open
dpk opened this issue Jan 17, 2025 · 7 comments

dpk commented Jan 17, 2025

Although datum->syntax appears to preserve the uninternedness of identifiers’ symbolic names, free-identifier=? ignores it:

> (free-identifier=? (datum->syntax #'bar (string->uninterned-symbol "hello"))
                     (datum->syntax #'foo (string->uninterned-symbol "hello")))
#t
> (eq? (syntax->datum (datum->syntax #'foo (string->uninterned-symbol "hello"))) 'hello)
#f

This is rather unintuitive. Symbols never compare by their textual name anywhere else, always by their location in the store. This goes against the raison d’être of the fallback case for free-identifier=?, which is that the two identifiers would have the same binding if defined at the top level. It also likely prevents or makes significantly trickier the use of uninterned symbols as a mechanism for communication between macro expansions, though I haven’t actually experimented with that yet. (This issue was discovered in the course of a discussion about potentially introducing uninterned symbols to R7RS large, where they would mostly be available for this purpose.)
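A minimal Chez Scheme sketch of the point about symbol identity. Two uninterned symbols created from the same string share a printed name but are distinct objects, so eq? tells them apart. This assumes a Chez Scheme version that provides string->uninterned-symbol, as the REPL session above does.

```scheme
(import (chezscheme))

;; Two uninterned symbols built from the same string.
(define u1 (string->uninterned-symbol "hello"))
(define u2 (string->uninterned-symbol "hello"))

;; Identity comparison distinguishes them: prints #f.
(pretty-print (eq? u1 u2))

;; Only their textual names coincide: prints #t.
(pretty-print (string=? (symbol->string u1) (symbol->string u2)))

;; Neither is eq? to the interned symbol hello: prints #f.
(pretty-print (eq? u1 'hello))
```

Comparing by identity is what one would expect the fallback case of free-identifier=? to mirror.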

Contributor

mnieper commented Jan 19, 2025

This has little to do with the semantics of free-identifier=? and much more to do with the semantics of the top level. In the example given, Chez's top level does not compare the identifiers by name but by their actual top-level bindings (and top-level identifiers are bound to a location corresponding to their written name).

Run the example in an R6RS program, and you'll see that it works.

Contributor

mflatt commented Jan 19, 2025

The current behavior does seem wrong. I've looked into this some, but not yet enough to propose a change.

Contributor

mnieper commented Jan 19, 2025

> The current behavior does seem wrong. I've looked into this some, but not yet enough to propose a change.

The current behaviour of free-identifier=? is correct. The R6RS program

(import (chezscheme))

(pretty-print
  (free-identifier=?
    (datum->syntax #'bar (string->uninterned-symbol "hello"))
    (datum->syntax #'bar (string->uninterned-symbol "hello"))))

prints #f as expected by @dpk and as needed for @dpk's use case.

When run at the top level, i.e., when running the program

(import (chezscheme))

(pretty-print
  (eval
    '(free-identifier=?
       (datum->syntax #'bar (string->uninterned-symbol "hello"))
       (datum->syntax #'bar (string->uninterned-symbol "hello")))
    (interaction-environment)))

we get #t (as observed in @dpk's post), but this is also the correct answer for free-identifier=?, because of the way the interaction environment works:

The program

(import (chezscheme))

(define id1 (string->uninterned-symbol "hello"))
(define id2 (string->uninterned-symbol "hello"))
(set-top-level-value! id1 "hi" (interaction-environment))
(pretty-print (top-level-value id2))

prints "hi" because id1 and id2 are bound by their printable name in the interaction environment.

This is why I wrote above that free-identifier=? works exactly as expected and that @dpk's example code is misleading: what it really exercises is how the interaction environment works.

Thus, if any behaviour is to be changed (I am not sure whether it should), it would be the behaviour of the interaction environment.

Contributor

mnieper commented Jan 19, 2025

The interaction environment (and other environments) use the result of gensym->unique-string as the key for their (conceptual) binding table; see https://github.com/cisco/ChezScheme/blob/main/s/syntax.ss#L455.

@dpk is using the procedure string->uninterned-symbol, which sets the "unique string" directly; had she used gensym with just the pretty name "hello", everything would have worked.

In fact, the program

(import (chezscheme))

(pretty-print
  (eval
    '(free-identifier=?
       (datum->syntax #'bar (gensym "hello"))
       (datum->syntax #'bar (gensym "hello")))
    (interaction-environment)))

prints #f.
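The difference can be inspected directly. The sketch below uses only procedures already named in this thread plus symbol->string. It creates two gensyms that share only the pretty name "hello": symbol->string returns the same pretty name for both, while gensym->unique-string, which the comment above identifies as the binding-table key, differs between them.

```scheme
(import (chezscheme))

;; Two gensyms that share only their pretty name.
(define g1 (gensym "hello"))
(define g2 (gensym "hello"))

;; symbol->string yields the pretty name, identical for both: prints #t.
(pretty-print (string=? (symbol->string g1) (symbol->string g2)))

;; The unique strings are generated separately for each gensym: prints #f.
(pretty-print (string=? (gensym->unique-string g1)
                        (gensym->unique-string g2)))
```

Because the keys differ, the two gensyms cannot end up sharing a top-level binding the way the two uninterned "hello" symbols do.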

Contributor

mnieper commented Jan 19, 2025

> [...] It also likely prevents or makes significantly trickier the use of uninterned symbols as a mechanism for communication between macro expansions, though I haven’t actually experimented with that yet. (This issue was discovered in the course of a discussion about potentially introducing uninterned symbols to R7RS large, where they would mostly be available for this purpose.)

Let me finally point out that this proposed use of gensyms leads to, at least, unintuitive macros:

Please read secret-id in the following program as a gensym.

(import (chezscheme))

(define-syntax put
  (lambda (stx)
    (syntax-case stx ()
      [(k x)
       (with-implicit (k secret-id)
         #'(define secret-id x))])))

(define-syntax get
  (lambda (stx)
    (syntax-case stx ()
      [(k)
       (with-implicit (k secret-id)
         #'secret-id)])))

(let ()
  (put 42)
  (pretty-print (get)))

As expected, the program prints 42 and, if I understand you correctly, demonstrates the "macro communication mechanism" you are referring to.

However, it is not transparent to syntactic abstraction. For whatever reason, the user might want to introduce a convenience macro that ultimately expands to the first of the two macros above:

(let-syntax
  ([my-put
    (syntax-rules ()
      [(_ x) (put x)])])
  (my-put 42)
  (pretty-print (get)))

And this macro breaks. (Incidentally, this particular example wouldn't have broken if get had been wrapped instead of put, but this is due to the peculiarities of internal defines, which relate to the asymmetry of the relation "binding one identifier binds the other".)

Another problem with the approach of "invisible identifiers" is that it breaks down when the user uses the import-only form as in the following snippet:

(module env (pretty-print get))
(let ()
  (put 42)
  (import-only env)
  (pretty-print (get)))

Intuitively, it should work but doesn't. But even if it shouldn't work, the user has no way to repair it because the secret identifier is, well, secret and cannot be put into the module exports.

Syntax parameters, on the other hand, do not have any of these problems, but their effect is more global (and, as with any form of dynamic binding, one has to program carefully).

Author

dpk commented Jan 19, 2025

> And this macro breaks.

What you refer to as ‘broken’ is the intended behaviour of my use case.

Contributor

mnieper commented Jan 19, 2025

>> And this macro breaks.
>
> What you refer to as ‘broken’ is the intended behaviour of my use case.

I understand that this can be the intended behaviour in some cases (i.e. when a put/get pair is independently introduced by two unrelated macro pairs). The intention of my example was to show that what can be the intended behaviour for some use cases can be quite unintuitive in other cases (and can lead to code where syntactic abstraction doesn't work anymore).

Note that

(let-syntax
  ([my-get
    (syntax-rules ()
      [(_) (get)])])
  (put 42)
  (pretty-print (my-get)))

does not give an error, so whether the macro "breaks" or not depends on whether get or put is wrapped. Is this really the intended behaviour?

Maybe it is (or the example does not apply to your use case), but in any case, we should probably move this discussion somewhere else as it is not directly related to the reported issue. As far as the reported issue is concerned, it seems to me that it can be closed. According to the documentation, "uninterned symbols" (those produced by string->uninterned-symbol), which, by the way, are not gensyms, seem to be a leftover from earlier Chez versions. When you use the "modern" gensym instead, you don't have to make sure to give different unique names for each invocation; when you give the same unique name, you get the same gensym. In other words, gensyms are interned but have all the semantics of uninterned symbols that you want.
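A minimal sketch of the last point, following the description in this comment: passing an explicit unique name as gensym's second argument should give back the same gensym on every call, while omitting it yields a fresh gensym each time. The unique name "my-lib-secret-id" is an arbitrary, hypothetical choice for illustration.

```scheme
(import (chezscheme))

;; Same pretty name and same explicit (hypothetical) unique name on both calls.
(define a (gensym "secret-id" "my-lib-secret-id"))
(define b (gensym "secret-id" "my-lib-secret-id"))

;; Per the comment above, both calls should denote the same gensym.
(pretty-print (eq? a b))

;; Without an explicit unique name, each call yields a fresh gensym: prints #f.
(pretty-print (eq? (gensym "secret-id") (gensym "secret-id")))
```

In this reading, a library that wants a stable "secret" identifier across expansions would fix the unique name, and code that wants a private one would let gensym generate it.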
