Allow semantics to be defined in Python #52
Comments
@fosler @jashelio (this is on a GitHub issue; replying to it will add your response to the issue)

Question 1: Should plccp be a separate new system based on the existing plccj? Or should plccp and plccj share components (e.g., scanner)?

Part of me likes the idea of shared reusable components, but that would require a redesign of the existing system. For example, if we want a reusable scanner, it would probably become a separate command-line application that takes a stream of characters on standard input and produces a stream of tokens on standard out. If we want to be reuse purists, we could even make a separate parser tool that takes that stream of tokens and produces a parse tree on standard out in a format like JSON or YAML (basically a language-agnostic internal representation of the parse tree). Last, each "semantics analyzer" (Java and Python) reads this parse tree, constructs the corresponding object structure, and then kicks off the semantic analysis. Although this appeals to the software designer in me, it would be a lot of work.

Building a separate system (plccp) based on the current one (plcc/plccj) is probably the fastest way to get what we want. Basically, find all the Java and replace it with equivalent Python. I'm sure that's an oversimplification and there will be devils in the details. But I think this would lead to two systems that are reasonably maintainable because they are isolated from each other. So I think I'm leaning toward separate systems, and I think that's what @fosler was describing. What do others think?

Question 2: Assuming "separate" is the answer to the previous question, should we create a new repository for plccp?

I'm thinking yes. This would allow separate version numbers for the two, allowing development to move at separate paces. It is possible to manage two projects with different version numbers in the same repository. But it's more complicated, and not something I've personally done.
|
To Do
|
The goal is to let students write semantics in Python rather than in Java. The code fragments in the semantics section do not interact with the scanner or parser directly. They only interact with parse tree nodes and their contents. So theoretically, we might be able to reuse PLCC's scanner and parser, and the only parts we MUST write are those that generate classes based on the BNF rules and the semantics section. So here is my crazy idea... When After running The advantages of this approach are:
The disadvantages of this approach are:
|
DECIDED: We should augment plcc rather than build a separate system. |
Consider a reduced-scope implementation of the Python hierarchy. For example, only handle ::= with unique LHS and only tokens on the RHS. |
In the new hybrid Java-Python mode, with a 4-section grammar... I'm not clear on how and when code in the 3rd section is executed. In a 4-section grammar, the 3rd section contains Java code that expresses syntactic rules that cannot be expressed, or cannot elegantly be expressed, in PLCC's BNF. But how and when do these fragments get executed? In a 3-section grammar, one overrides But in a 4-section grammar, the parser does NOT have a hook for running additional syntactic checks. Do we need to add a new one (e.g., Thoughts? |
I have struggled to wrap my mind around what the “Spring 2024 project” version does. But my understanding is that (1) PLCC still generates Java classes, and (2) when the resulting Java code (containing scanner and parser) is run against code in the defined language, Java objects representing the AST are still created. But then, (3) you instruct that tree to build a replica of itself in JSON. (4) Python code then uses the JSON spec to rebuild the ADT in the Python environment, which was partially generated by PLCC and partially by the semantic code in the fourth section of the grammar file. Finally (5) the Python code is run and the user’s program written in the defined language “executes”, whatever that means.

So if you are writing “syntax checks” in Java, that tells me that these checks are in the constructors of the original Java code classes built in step (1). Here is an example from our modified V6 language in ourPLCC/languages. (I had to screenshot it because my iPad wouldn’t let me copy and paste text from the GitHub window.)

I personally am totally happy with this approach. Therefore, I am willing to leave things alone for now until we have a full Python-based pipeline, where the problem would solve itself.

However, if folks don’t like the “:init” way of doing these checks, then my second choice would be the $check() solution.

Jim
[edit: removed copy of previous messages]
|
@jashelio Your screenshot didn't come through. You should be able to drag and drop screenshots into a GitHub comment, although this might not be easy on an iPad. Your suggestion of using constructors sounds familiar now. Thanks for reminding me. So we could write helpers as normal and then hook them in using |
Another issue for our meeting on Friday... How will "includes" work with 4 sections? If someone puts an include in the 3rd section (Java), that's different than if they put it in the 4th section (Python). I/we need to think about this. |
In our meeting, we talked about include. Here is a quick summary:
|
Moving @jashelio's comment from #96 to #52. Perhaps I can be accused of being too much of an abstractionist, but here goes. I feel that the following idea
is short-sighted. I personally would prefer that we think in the following language-independent way.
And that we actually have options on the command line to say what languages are being used. Eventually we could change the section markers in the grammar file to say % Java to get rid of those pesky command line options. Just a thought,
|
PRs are intentionally short-sighted. It helps ensure their success by avoiding scope creep and the moving target problem. I want them to be small and focused. Hopefully each PR is a step in the right direction. Whether or not they are in the right direction, we learn from them, update our design and plans here, and then we try to take another small step.
|
From @fosler
"Re-write plcc.py and the files in the Std directory so that it builds an interpreter with Python3 as the implementation language. We could then call it plccp, with the Java version called plccj."
High-level Design
We extend PLCC to allow an optional 4th section in grammar files. When 4 sections are present, their purposes are as follows:
When four sections are detected, PLCC operates in a new hybrid Java-Python mode. In this mode, PLCC generates the following Java code:
PLCC also generates the following Python code:
Architecture Diagram
Pro-Con of Architecture
Pros
scan and parse are shared between Java and Python (because they are really Java)
Cons
scan and parse become more challenging. Mitigations: scan < source | parse | rep
and plcc.py itself. These can be slowly introduced over time, as/if needed.
Working Plan
The current idea is to evolve the existing plcc system to support Python semantics.
1. Generate JSON AST [DONE; thank you @Rarity-Belle and @WilliamBowery!!]
The idea is to have Parse.java accept a new option that will cause it to print an AST in a JSON format. This is a new feature that will not break existing code. Why JSON? Python comes with a library that can parse JSON into a native data structure (nested dicts and lists and primitive types).
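As a rough illustration of why JSON is convenient here, the sketch below parses a small AST with Python's standard `json` module and walks the resulting nested dicts. The JSON layout (`"class"` and `"fields"` keys) and the node names are assumptions made for this example; the actual format printed by Parse.java is not shown in this issue.

```python
import json

# Hypothetical JSON AST; the key names "class" and "fields" are assumptions,
# not the format actually emitted by Parse.java.
ast_json = """
{
  "class": "AddExp",
  "fields": {
    "left":  {"class": "NumExp", "fields": {"value": "1"}},
    "right": {"class": "NumExp", "fields": {"value": "2"}}
  }
}
"""

# json.loads returns plain Python data: nested dicts, lists, and primitives.
ast = json.loads(ast_json)


def count_nodes(node):
    """Count AST nodes by walking the nested dict structure."""
    if not isinstance(node, dict) or "class" not in node:
        return 0  # token lexemes and other leaves are not nodes
    return 1 + sum(count_nodes(child) for child in node["fields"].values())
```

No third-party dependency is needed for this step, which is the main point of choosing JSON.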
2. Add new semantics section to grammar files
Add an optional 4th section to grammar files. The 4th section will be populated by Python semantics. When a grammar file is written with four sections, its structure is as follows:
Existing 3-section grammars should function normally.
When a 4th section is detected (even when empty), PLCC will generate the Java code needed to generate JSON ASTs. Note that the 3rd section could also be empty, and it would behave the same.
3. Generate a parallel Python class hierarchy
plcc.py currently generates a Java class hierarchy of parse node types based on the BNF rules in the syntactic specification of a given grammar file. It also generates a recursive descent parsing algorithm that is implemented across this hierarchy in static methods. plcc.py also injects the semantics code from the semantics specification of a grammar file into these classes. When the parser is run on a program, it instantiates objects from these classes and connects them to form a parse tree.
Now, when the 4th section is present, we need plcc.py to also generate a parallel class hierarchy in Python. It does not need code to parse the original program. But it does need to support construction of a parse tree of objects given an AST in JSON.
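To make the idea of a parallel hierarchy concrete, here is a minimal sketch of what generated Python classes might look like for two expression rules. The rule names, class names, and field names are invented for illustration, and the use of dataclasses is just one possible choice; nothing here is what plcc.py actually emits.

```python
from dataclasses import dataclass


class Exp:
    """Abstract base class for a hypothetical <exp> nonterminal."""


@dataclass
class NumExp(Exp):
    # Would correspond to a rule whose RHS is a single number token.
    num: str  # the token's lexeme


@dataclass
class AddExp(Exp):
    # Would correspond to a rule whose RHS holds two <exp> children.
    left: Exp
    right: Exp


# Nodes only store their children; there is no parsing code in these classes.
tree = AddExp(NumExp("1"), NumExp("2"))
```

The classes carry structure only, which matches the requirement above: the Python side never parses source text, it only needs constructors that a JSON loader can call.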
4. Load JSON AST in Python
Write a new Python function load_json_ast(json) that takes a JSON AST as a string and returns an object tree such that the nodes are appropriate instances of the parallel class hierarchy and correctly connected. This function is generated when there is a 4th section present.
5. Inject semantics into Python
In this step, we inject semantics from the 4th section of the grammar file into the Python class hierarchy.
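Steps 4 and 5 can be sketched together as follows. This is only an illustration under stated assumptions: the JSON layout (`"class"` / `"fields"`), the node classes, and the `eval` method name are all hypothetical, and the parameter is named `text` here rather than `json` so it does not shadow the standard `json` module.

```python
import json


class NumExp:
    def __init__(self, value):
        self.value = value

    # Imagine this method was injected from the 4th section (hypothetical).
    def eval(self):
        return int(self.value)


class AddExp:
    def __init__(self, left, right):
        self.left = left
        self.right = right

    # Imagine this method was injected from the 4th section (hypothetical).
    def eval(self):
        return self.left.eval() + self.right.eval()


# Lookup table from class name in the JSON to the generated Python class.
NODE_CLASSES = {cls.__name__: cls for cls in (NumExp, AddExp)}


def load_json_ast(text):
    """Build an object tree from a JSON AST string (assumed format)."""
    return _build(json.loads(text))


def _build(node):
    if isinstance(node, dict) and "class" in node:
        cls = NODE_CLASSES[node["class"]]
        fields = {name: _build(child) for name, child in node["fields"].items()}
        return cls(**fields)
    if isinstance(node, list):  # repeated children, if the format uses lists
        return [_build(item) for item in node]
    return node  # token lexemes and other primitives pass through unchanged
```

With this shape, calling `load_json_ast(...)` on the JSON for `1 + 2` and then `.eval()` on the result would return `3`. A name-to-class table keeps the loader generic, so only the table and the classes would need to be generated per grammar.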
6. Write a Python REP loop
In this last step, we write a Python version of the REP loop that uses everything that came before to run a program in a language that was defined using Python semantics.
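One possible shape for such a loop is sketched below. It assumes that the upstream Java parser writes one JSON AST per line and that node objects expose an `eval` method; both are assumptions for illustration, as are the `rep` signature and the `Lit` and `demo_load` stand-ins, which exist only to make the example runnable without generated code.

```python
import io
import json
import sys


def rep(source, load_ast, out=sys.stdout):
    """Read one JSON AST per line, evaluate it, and print the result."""
    for line in source:
        line = line.strip()
        if not line:
            continue  # skip blank lines between programs
        try:
            tree = load_ast(line)
            print(tree.eval(), file=out)
        except Exception as exc:  # report and keep the loop alive
            print(f"error: {exc}", file=out)


# Stand-ins for the generated loader and node classes (demonstration only).
class Lit:
    def __init__(self, value):
        self.value = value

    def eval(self):
        return self.value


def demo_load(text):
    return Lit(json.loads(text)["value"])


buffer = io.StringIO()
rep(io.StringIO('{"value": 41}\n\n{"value": 42}\nnot json\n'), demo_load, buffer)
demo_output = buffer.getvalue().splitlines()
```

In a real pipeline the `source` argument would be `sys.stdin`, fed by the Java parser, and `load_ast` would be the generated `load_json_ast`.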
(based on an email between @StoneyJackson @jashelio @fosler)
When we are done with the Python stuff, the Python stuff should contain a REP loop with the following loop body:
EDIT: [Jan 22, 2024] Reorder the Working Plan, eliminating the need for a new --python option.
EDIT: [Feb 15, 2024] Add Rep loop algorithm.