-
-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tokenizer gets no-meaning infix ops from JSON #87
Conversation
Looks like UndirectedEdge (which is not a no-meaning operator) needs to be handled. Possibly DirectedEdge too. |
("Element", r" \u2208 "), | ||
("NotElement", r" \u2209 "), | ||
("Subset", r" \u2282 "), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these entries gone?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It now gets pulled in from JSON.
mathics_scanner/tokeniser.py
Outdated
# named-characters.yml | ||
|
||
operators_table_path = osp.join(ROOT_DIR, "data", "operators.json") | ||
assert osp.exists( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This produces an error when the operator.json
table is built for the first time.
mathics_scanner.generate.build_operator_tables
imports mathics_scanner.__version__
,
with makes that mathics_scanner.__init__
be loaded. But then, it tries to import this module, and finds that the file is not already created.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A way to avoid this error would be to put all the initialization code inside an initialization function, and instead of raising an exception if the file does not exist, just show a warning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was the situation before 370b8fe and this does not happen now on Ubuntu and Macos. But I don't know why Windows is still failing here.
I am exhausted from tracking down all the little inconsistencies for today. If you can move this forward, please do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A way to avoid this error would be to put all the initialization code inside an initialization function, and instead of raising an exception if the file does not exist, just show a warning.
Moving code to a function was done and delaying initialization was attempted but the code is too thorny that something in there is doing stuff earlier than when a tokenizer is first created.
I now have a workaround, so let's not add yet another.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure!
If you have work cycles to spare, here are some things that in my opinion are more important than yet another workaround:
- Put in the correct precedence for no-meaning operators
- Split out the following list of operator names by creating sections for left assoc infix, right assoc infix, flat infix, (not yet done prefix/postfix), "misc" and newly added {Und,D]irectedEdge operators.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you have work cycles to spare, here are some things that in my opinion are more important than yet another workaround:
* Put in the correct precedence for no-meaning operators
Regarding this, "correct" would be in relation with the WMA Precedence[...]
, isn´t it?
* Split out the following list of operator names by creating sections for left assoc infix, right assoc infix, flat infix, (not yet done prefix/postfix), "misc" and newly added {Und,D]irectedEdge operators.
Do you mean as submodules of no_meaning
?
Function unicode change so as not to conflict with RightTeeArrow. Function unicode is a long arrow. Remove CSV to YML stuff. CSV is beyond hope of keeping in sync. Remove tokeniser import from __init__.py. Workaround for now. We need this so we can create operator JSON without needing the JSON table to be prevously around. clanup tokeniser.py a little bit. SPlit typing changing variables like tokens and literal_tokens into separate variables for each type they can hold.
mathics_scanner/tokeniser.py
Outdated
|
||
|
||
# Initalized below in update_tokens_from_JSON | ||
tokens = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these variables a part of the module interface? Or are they just internal variables?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tokens and literal tokens start out as tuples of operator name and string value and are used to create position indexes which are used in parsing.
but after this, they are converted to some sort of compiled expression, and then I think these type-changed variables are used in parsing. (It may have been in some style of programming where variables morph after their old value is no longer needed, was possibly at one time considered clever. Nowadays, it is considered annoying and frowned upon. In fact. in strongly-typed languages you are not allowed to do this.)
I have split out those two uses. We could run del tokens
when they are no longer needed, but, right now, I don't think it is worth the bother.
3b0e5aa
to
ec5e5fa
Compare
ec5e5fa
to
7be3875
Compare
* working out the initialization of the tokenizer
@rocky, sorry. Something went wrong. It just passed all the tests... Let's revert |
* working out the initialization of the tokenizer
With this I think all infix operators with no meaning will start to work in mathics-core.