Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pull in updated version of SANY to support Unicode specs #2995

Open
ahelwer opened this issue Sep 19, 2024 · 26 comments
Open

Pull in updated version of SANY to support Unicode specs #2995

ahelwer opened this issue Sep 19, 2024 · 26 comments
Assignees
Labels
feature A new feature or functionality

Comments

@ahelwer
Copy link

ahelwer commented Sep 19, 2024

Now that SANY supports parsing Unicode TLA+ specs, it should be fairly straightforward to support Unicode here by updating to a newer version of SANY. Possible bugs include if your translation code looks at actual string representation (for example, "\\E" instead of TLAplusParserConstants.EXISTS) which was the case in SANY itself.

@ahelwer ahelwer added the feature A new feature or functionality label Sep 19, 2024
@konnov konnov self-assigned this Sep 20, 2024
@konnov
Copy link
Collaborator

konnov commented Sep 23, 2024

Possible bugs include if your translation code looks at actual string representation (for example, "\\E" instead of TLAplusParserConstants.EXISTS) which was the case in SANY itself.

Oh, we have a bunch of those :)

val simpleOpcodeToTlaOper: Map[String, TlaOper] = Map(
("=", TlaOper.eq),
("/=", TlaOper.ne),
("'", TlaActionOper.prime),
("\\lnot", TlaBoolOper.not),
("\\lor", TlaBoolOper.or),
("\\land", TlaBoolOper.and),
("\\equiv", TlaBoolOper.equiv),
("=>", TlaBoolOper.implies),
("SUBSET", TlaSetOper.powerset),
("UNION", TlaSetOper.union),
("DOMAIN", TlaFunOper.domain),
("\\subseteq", TlaSetOper.subseteq),
("\\in", TlaSetOper.in),
("\\notin", TlaSetOper.notin),
("\\", TlaSetOper.setminus),
("\\intersect", TlaSetOper.cap),
("\\union", TlaSetOper.cup),
("$CartesianProd", TlaSetOper.times),
("~>", TlaTempOper.leadsTo),
("[]", TlaTempOper.box),
("<>", TlaTempOper.diamond),
("ENABLED", TlaActionOper.enabled),
("UNCHANGED", TlaActionOper.unchanged),
("\\cdot", TlaActionOper.composition),
("-+->", TlaTempOper.guarantees),
("$AngleAct", TlaActionOper.nostutter),
("$SquareAct", TlaActionOper.stutter),
("$Tuple", TlaFunOper.tuple),
("$Pair", TlaFunOper.tuple),
("$ConjList", TlaBoolOper.and),
("$DisjList", TlaBoolOper.or),
("$Seq", TlaFunOper.tuple),
("$FcnApply", TlaFunOper.app),
("$IfThenElse", TlaControlOper.ifThenElse),
("$RcdSelect", TlaFunOper.app),
("$SetEnumerate", TlaSetOper.enumSet),
("$SetOfFcns", TlaSetOper.funSet),
("$SF", TlaTempOper.strongFairness),
("$WF", TlaTempOper.weakFairness),
)

@konnov
Copy link
Collaborator

konnov commented Sep 23, 2024

instead of TLAplusParserConstants.EXISTS

Do you know how stable this table of constants going to be? It's autogenerated. I assume that the names will not change most likely?

@konnov
Copy link
Collaborator

konnov commented Sep 23, 2024

@ahelwer do I understand it right that the unicode operators have their own names such as op_infix_neq_uc? I wonder where to get the actual mapping from without guessing.

@ahelwer
Copy link
Author

ahelwer commented Sep 23, 2024

Do you know how stable this table of constants going to be? It's autogenerated. I assume that the names will not change most likely?

It has been stable for about 10 years as far as I know, no plans to change it in the future.

do I understand it right that the unicode operators have their own names such as op_infix_neq_uc

No, they do not. They are all wrapped up into the same TLAplusParserConstants token IDs.

@ahelwer
Copy link
Author

ahelwer commented Sep 23, 2024

If you want a quick shortcut you could also just add all the different text-based unicode variants, defined here: https://github.com/tlaplus/tlaplus-standard/blob/main/unicode/tla-unicode.csv

Although this might make it more work if any of these mappings are ever changed.

@konnov
Copy link
Collaborator

konnov commented Sep 27, 2024

I find TLAplusParserConstants quite confusing. It contains an array called tokenImage:

https://github.com/tlaplus/tlaplus/blob/f021674983390022709009474d39021fe16e0ee1/tlatools/org.lamport.tlatools/src/tla2sany/parser/TLAplusParserConstants.java#L300

I wrote a simple Python script to do lookups in this table (see below). Surprisingly, the table contains codes both for /= and #. I guess, one of them is getting normalized somewhere in SANY?

The Python script find-token.py:

#!/usr/bin/env python

import sys

pattern = sys.argv[1]

all_tokens = ["<EOF>", "<BEGIN_MODULE>", "\"--->\"", "<_BM1>", "<token of kind 4>", "<CASE0>", "<CASE1>", "<CASE1b>", "<CASE1c>", "<CASE2>", "<CASE2b>", "<CASE2c>", "<CASE3>", "<CASE6>", "<CASE6b>", "<CASE6c>", "<CASEN>", "<LETTER>", "<DIGIT>", "<NUMBER>", "<_BM2>", "<token of kind 21>", "\" \"", "\"\\t\"", "\"\\n\"", "\"\\r\"", "<token of kind 26>", "\"\\\\*\"", "<token of kind 28>", "\"*)\"", "\"*)\"", "<token of kind 31>", "<token of kind 32>", "<token of kind 33>", "<_BM0>", "<SEPARATOR>", "<END_MODULE>", "<ACTION>", "\"ASSUME\"", "\"[]ASSUME\"", "<ASSUMPTION>", "\"CASE\"", "\"CHOOSE\"", "<CONSTANT>", "\"ELSE\"", "\"EXCEPT\"", "<EXISTS>", "\"EXTENDS\"", "<FORALL>", "\"IF\"", "\"INSTANCE\"", "\"LET\"", "\"IN\"", "\"LOCAL\"", "\"MODULE\"", "\"NEW\"", "\"OTHER\"", "<PROPOSITION>", "\"SF_\"", "\"\\\\EE\"", "\"\\\\AA\"", "\"THEN\"", "\"BY\"", "\"ONLY\"", "\"DEFINE\"", "<DF>", "\"THEOREM\"", "\"USE\"", "\"HIDE\"", "\"HAVE\"", "\"OBVIOUS\"", "\"OMITTED\"", "\"LAMBDA\"", "\"TAKE\"", "\"PROOF\"", "\"PROVE\"", "\"[]PROVE\"", "\"QED\"", "\"RECURSIVE\"", "\"STATE\"", "<TEMPORAL>", "\"PICK\"", "\"WITNESS\"", "\"SUFFICES\"", "<VARIABLE>", "\"WF_\"", "\"WITH\"", "\",\"", "\":\"", "\"::\"", "\".\"", "\"_\"", "\"==\"", "\"(\"", "\")\"", "\"-|-\"", "\"[\"", "\"]_\"", "\"]\"", "\"{|\"", "\"|}\"", "\"{\"", "\"}\"", "\"<<\"", "\">>_\"", "\">>\"", "\"!\"", "\"->\"", "\"<-\"", "\"|->\"", "<NUMBER_LITERAL>", "<STRING_LITERAL>", "<BAND>", "<BOR>", "\"\\'\"", "\"^+\"", "\"^*\"", "\"^#\"", "\"-.\"", "\"\\\\lnot\"", "\"\\\\neg\"", "\"~\"", "\"[]\"", "\"<>\"", "\"ENABLED\"", "\"UNCHANGED\"", "\"SUBSET\"", "\"UNION\"", "\"DOMAIN\"", "\"//\"", "\"/\\\\\"", "\"/=\"", "\"/\"", "\"\\\\/\"", "\"\\\\approx\"", "\"\\\\asymp\"", "\"\\\\bigcirc\"", "\"\\\\bullet\"", "\"\\\\cap\"", "\"\\\\cdot\"", "\"\\\\circ\"", "\"\\\\cong\"", "\"\\\\cup\"", "\"\\\\div\"", "\"\\\\doteq\"", "\"\\\\equiv\"", "\"\\\\geq\"", "\"\\\\gg\"", "\"\\\\in\"", "\"\\\\intersect\"", "\"\\\\union\"", "\"\\\\land\"", "\"\\\\leq\"", "\"\\\\ll\"", "\"\\\\lor\"", "\"\\\\o\"", "\"\\\\odot\"", "\"\\\\ominus\"", "\"\\\\oplus\"", "\"\\\\oslash\"", "\"\\\\otimes\"", "\"\\\\prec\"", "\"\\\\preceq\"", "\"\\\\propto\"", "\"\\\\sim\"", "\"\\\\simeq\"", "\"\\\\sqcap\"", "\"\\\\sqcup\"", "\"\\\\sqsubset\"", "\"\\\\sqsupset\"", "\"\\\\sqsubseteq\"", "\"\\\\sqsupseteq\"", "\"\\\\star\"", "\"\\\\subset\"", "\"\\\\subseteq\"", "\"\\\\succ\"", "\"\\\\succeq\"", "\"\\\\supset\"", "\"\\\\supseteq\"", "\"\\\\uplus\"", "\"\\\\wr\"", "\"\\\\\"", "\"~>\"", "\"=>\"", "\"=<\"", "\"=|\"", "\"=\"", "\"##\"", "\"#\"", "\"^^\"", "\"^\"", "\"--\"", "\"-|\"", "\"-+->\"", "\"-\"", "\"**\"", "\"*\"", "\"++\"", "\"+\"", "\"<=>\"", "\"<:\"", "\"<=\"", "\"<\"", "\">=\"", "\">\"", "\"...\"", "\"..\"", "\"||\"", "\"|\"", "\"|-\"", "\"|=\"", "\"&&\"", "\"&\"", "\"$$\"", "\"$\"", "\"??\"", "\"%%\"", "\"%\"", "\"@@\"", "\"!!\"", "\":>\"", "\":=\"", "\"::=\"", "\"(+)\"", "\"(-)\"", "\"(.)\"", "\"(/)\"", "\"(\\\\X)\"", "\"\\\\notin\"", "\"\\\\times\"", "\"\\\\X\"", "<IDENTIFIER>", "<ProofStepLexeme>", "<ProofImplicitStepLexeme>", "<ProofStepDotLexeme>", "<BareLevelLexeme>", "<UnnumberedStepLexeme>"]

for (position, token) in enumerate(all_tokens):
    if pattern in token:
        print("Position %d: token '%s'" % (position, token))

@konnov
Copy link
Collaborator

konnov commented Sep 27, 2024

@ahelwer shall we use the version of tla2tools 1.8.0? On GitHub it appears like new commits are being added to the release that was introduced 4 years ago.

@konnov
Copy link
Collaborator

konnov commented Sep 27, 2024

If you want a quick shortcut you could also just add all the different text-based unicode variants, defined here: https://github.com/tlaplus/tlaplus-standard/blob/main/unicode/tla-unicode.csv

Although this might make it more work if any of these mappings are ever changed.

This looks like the easiest path! Thanks!

@konnov
Copy link
Collaborator

konnov commented Sep 27, 2024

Ugh, the latest package published on oss.sonatype.org goes back to 2020: https://oss.sonatype.org/content/repositories/snapshots/org/lamport/tla2tools/

There is no package on maven. So it seems there is no automatic way for us to consume a newer version of SANY via sbt or mvn. @lemmy any ideas how to proceed?

@lemmy
Copy link
Contributor

lemmy commented Sep 27, 2024

I don't have time to resurrect publishing to maven central or Github's equivalent -- PR welcome. Can't you grab the tla2tools.jar from https://github.com/tlaplus/tlaplus/releases/tag/v1.8.0?

@ahelwer
Copy link
Author

ahelwer commented Sep 29, 2024

@konnov does Apalache pull in SANY using the maven package?

@konnov
Copy link
Collaborator

konnov commented Sep 29, 2024

@konnov does Apalache pull in SANY using the maven package?

Apalache uses sbt, which by default pulls a package from maven. We are also using sonatype as a secondary repo:

ThisBuild / resolvers ++= Resolver.sonatypeOssRepos("snapshots")

I am not sure whether Github provides their own maven-compatible repo.

@ahelwer
Copy link
Author

ahelwer commented Sep 29, 2024

Tricky thing is that I see maven has the ability to use sha1 as build version (see https://maven.apache.org/maven-ci-friendly.html) which we would need to use to distinguish the continually-updated 1.8.0 pre-release, but I can't find any details on whether various public maven repositories support that feature.

@lemmy
Copy link
Contributor

lemmy commented Sep 29, 2024

@konnov
Copy link
Collaborator

konnov commented Sep 30, 2024

Tricky thing is that I see maven has the ability to use sha1 as build version (see https://maven.apache.org/maven-ci-friendly.html) which we would need to use to distinguish the continually-updated 1.8.0 pre-release, but I can't find any details on whether various public maven repositories support that feature.

I think this practice of updating a release file without increasing the patch number does not work well with open source software engineering practices we see on Github, Linux, etc. You could add the build number or the current date to distinguish between the builds. If you keep overwriting releases, it does not work with maven (since it caches the packages), nix, you name it. Some projects produce nightly builds and then garbage collect them, e.g., Foundry.

@ahelwer
Copy link
Author

ahelwer commented Sep 30, 2024

Well, we're operating in the Java packaging ecosystem here which I'm unfamiliar with - personally I'm a big fan of content-addressed versioning like with nix. I agree there are drawbacks to the SNAPSHOT way of doing things but it does map well to how we are currently just uploading tla2tools.jar builds to the 1.8.0 release and overwriting the current ones. What do you think we should do, @konnov?

I do think this is a problem we should solve because I like the idea of other projects being able to pull in the tla2tools.jar as a dependency. Currently there's Apalache and also tlaplus-formatter by @FedericoPonzi.

@lemmy
Copy link
Contributor

lemmy commented Sep 30, 2024

I understand it's not cool, but the tla2tools.jar file is only 4MB in size (could be trimmed). Since it's a stable dependency for Apalache (with SANY being rarely updated), the simplest solution might be to commit the jar directly to the Apalache Git repository.

@konnov
Copy link
Collaborator

konnov commented Sep 30, 2024

Well, we're operating in the Java packaging ecosystem here which I'm unfamiliar with - personally I'm a big fan of content-addressed versioning like with nix. I agree there are drawbacks to the SNAPSHOT way of doing things but it does map well to how we are currently just uploading tla2tools.jar builds to the 1.8.0 release and overwriting the current ones. What do you think we should do, @konnov?

I do think this is a problem we should solve because I like the idea of other projects being able to pull in the tla2tools.jar as a dependency. Currently there's Apalache and also tlaplus-formatter by @FedericoPonzi.

I would also be happy, if there was a simple way to pull tla2tools.jar as a Java dependency. I have not been working with pure Java for a long time, but afaik maven dependencies are still something that all build systems in Java understand. In this case, it would be great, if the TLA+ project could publish these jars. Actually, you don't have to do it every night. For instance, we cut releases on request with GitHub actions, see prepare-release.yml. In this case, it would be something like publishing a maven package on demand (the good thing is that you can keep sensitive data in the GitHub secrets). My main issue with contributing there is that the tlaplus project is too big. The last time I tried, I could not even compile it in the command line.

Alternatively, we could do something like z3-turnkey.

lemmy added a commit to tlaplus/tlaplus that referenced this issue Sep 30, 2024
@ahelwer
Copy link
Author

ahelwer commented Sep 30, 2024

This would be even more work but in the future I think it would be nice to split out the SANY and PlusCal parsers into separate packages. They are self-contained with minimal dependencies, and projects consuming the tools as a dependency will probably be using them alone instead of pulling in the rest of the interpreter and model-checking code.

lemmy added a commit to tlaplus/tlaplus that referenced this issue Sep 30, 2024
lemmy added a commit to tlaplus/tlaplus that referenced this issue Sep 30, 2024
…ADME) instead of authentication-only Github maven packages.

Related to apalache-mc/apalache#2995

[Build]

Signed-off-by: Markus Alexander Kuppe <[email protected]>
@lemmy
Copy link
Contributor

lemmy commented Sep 30, 2024

Painful, but tla2tools.jar will from now on be published at https://oss.sonatype.org/content/repositories/snapshots/org/lamport/tla2tools/1.8.0-SNAPSHOT/

@ahelwer
Copy link
Author

ahelwer commented Sep 30, 2024

Thanks Markus! Wrangling the github CI is never a fun day.

@lemmy
Copy link
Contributor

lemmy commented Oct 1, 2024

I recommend using the new snapshot dependency from Sonatype until the version of SANY with Unicode support is fully integrated into Apalache. Once this integration is complete, we can manually tag the TLA+ Git repository and create a permanent release on Sonatype.

@konnov
Copy link
Collaborator

konnov commented Oct 1, 2024

Thanks a lot @lemmy!

@ahelwer
Copy link
Author

ahelwer commented Oct 1, 2024

tag the TLA+ Git repository and create a permanent release on Sonatype

Does this mean cutting a final release for 1.8.0 then doing a version bump for snapshot versions? If Unicode is going to be part of that release perhaps we should consider adding ℕ, ℤ, and ℝ to the standard modules. I will create an issue to consider the best way to do that.

@lemmy
Copy link
Contributor

lemmy commented Oct 1, 2024

No, I didn't mean to suggest a full-scale tla2tools.jar and Toolbox release. What I had in mind was more like a monthly or quarterly code snapshot (not -SNAPSHOT) that would be permanently stored in the Maven repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature A new feature or functionality
Projects
None yet
Development

No branches or pull requests

3 participants