Fuzzy Parsing Example #993
Comments
The documentation for Token Categories is rather sparse. What's a practical application?
Basically you can specify that multiple tokens are of the category X.
Then token categories allow you to define multiple "inheritance" between tokens. Example:
So basically this can be extended to build a "MatchALL" token, which could be used as part of a fuzzy parsing solution.
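To make the "inheritance" idea concrete, here is a minimal sketch in plain TypeScript. It is a toy model, not Chevrotain's API: `TokType`, `defineTok`, and `matchesCategory` are hypothetical names, and treating categories as transitive is an assumption of this sketch.

```typescript
// Hypothetical model of token categories; these names are not Chevrotain's API.
interface TokType {
  name: string;
  categories: TokType[]; // the "parent" token types this type also counts as
}

function defineTok(name: string, categories: TokType[] = []): TokType {
  return { name, categories };
}

// True when `actual` is `expected`, or reaches it through any chain of categories.
function matchesCategory(actual: TokType, expected: TokType): boolean {
  if (actual === expected) return true;
  return actual.categories.some((c) => matchesCategory(c, expected));
}

// A catch-all category that every concrete token type belongs to.
const MatchAll = defineTok("MatchAll");
const Keyword = defineTok("Keyword", [MatchAll]);
const Identifier = defineTok("Identifier", [MatchAll]);
const If = defineTok("If", [Keyword]);
// Multiple "inheritance": a contextual keyword that is also a valid identifier.
const From = defineTok("From", [Keyword, Identifier]);
```

A rule that expects `MatchAll` would accept `If`, `From`, and `Identifier` alike, while a rule that expects `Keyword` would reject a plain `Identifier`.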
A good example of fuzzy parsing would be returning a list of commands and whatever text is between commands for further parsing:
machine:chevrotain user$ ls
CONTRIBUTING.md greenkeeper.json readme.md
LICENSE.txt lerna.json tslint.json
NOTICE.txt package.json yarn.lock
examples packages
machine:chevrotain user$ cat NOTICE.txt
Copyright (c) 2015-2019 SAP SE or an SAP affiliate company.
machine:chevrotain user$
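A rough sketch of how such a transcript could be split into commands and the text between them, using a plain regular expression rather than Chevrotain. The prompt shape `<host>:<dir> <user>$` is an assumption taken from the transcript above, and `CommandBlock` and `splitTranscript` are hypothetical names.

```typescript
// One command the user typed, plus the raw output lines that followed it.
interface CommandBlock {
  command: string;
  output: string[];
}

function splitTranscript(text: string): CommandBlock[] {
  // Assumed prompt shape: "<host>:<dir> <user>$ <command>"; adjust for other shells.
  const prompt = /^\S+:\S+ \S+\$ ?(.*)$/;
  const blocks: CommandBlock[] = [];
  let current: CommandBlock | undefined;
  for (const line of text.split("\n")) {
    const m = prompt.exec(line);
    if (m === null) {
      // Not a prompt line: it belongs to the output of the most recent command.
      if (current !== undefined) current.output.push(line);
      continue;
    }
    const command = (m[1] ?? "").trim();
    if (command === "") {
      // A bare prompt ends the previous block without starting a new one.
      current = undefined;
    } else {
      current = { command, output: [] };
      blocks.push(current);
    }
  }
  return blocks;
}
```

Each block's `output` stays unparsed here, so it can be handed to a second, more specific tokenizer later.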
I am not sure this can even be represented as a context-free grammar.
This task is accomplished using regexp scanners. IMHO, a good use of Chevrotain is constructing and managing domain-specific tokenizers at run time, then scanning the resulting tokens for re-tokenizing and possibly complete parsing. My goal is to learn tokens at run time and restart the process. Fuzzy parsing is a means of implementing a learning parsing system.
Interesting, you could effectively use some heuristics to identify the delimiter. In your case the "grammar" itself seems trivial, so I am not sure there would be any need for a Chevrotain Parser part.
BTW, you can dynamically create Chevrotain Parsers as well as Lexers using the custom APIs feature. Granted, this has limitations: It also requires the use of eval (not supported when there is a Content Security Policy, e.g. on many websites). But it could still be interesting...
I've started playing around with a similar scenario, which required consuming any kind of token between a "--" and a semicolon. Basically the CSS 3 custom property syntax: You can inspect the current state of the example here:
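Independently of the linked example, the "consume anything between `--` and `;`" idea can be sketched in plain TypeScript. This is a simplified illustration and not the Chevrotain implementation from the thread: `CssTok`, `tokenizeCss`, and `customPropertyBodies` are hypothetical names, and the tokenizer is deliberately crude (whitespace-separated runs, with no string or comment handling).

```typescript
interface CssTok {
  type: "DashDash" | "Semi" | "Any";
  image: string;
}

function tokenizeCss(text: string): CssTok[] {
  // "--" and ";" are the only delimiters we care about; everything else is "Any".
  const re = /--|;|[^\s;]+/g;
  const tokens: CssTok[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    const image = m[0];
    const type = image === "--" ? "DashDash" : image === ";" ? "Semi" : "Any";
    tokens.push({ type, image });
  }
  return tokens;
}

// Collects the token images found between each "--" and the next ";".
function customPropertyBodies(text: string): string[][] {
  const bodies: string[][] = [];
  let current: string[] | null = null;
  for (const tok of tokenizeCss(text)) {
    if (current === null) {
      // Outside a custom property: skip everything until a "--" opens one.
      if (tok.type === "DashDash") current = [];
    } else if (tok.type === "Semi") {
      bodies.push(current);
      current = null;
    } else {
      // Fuzzy part: any token at all is accepted, including a nested "--".
      current.push(tok.image);
    }
  }
  return bodies; // an unterminated trailing body is dropped
}
```

The fuzzy region never inspects what the tokens are; only the closing semicolon matters.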
Consider expanding the fuzzy parsing example to a scenario in which the fuzzy matching acts as a default fallback, e.g. three alternatives, with the third being the fuzzy one, which could conflict with the first two.
See:
This could probably be done using Token Categories to match against "all" kinds of tokens, combined with an alternation which includes the one option we care about.
e.g.:
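A minimal sketch of the ordered-alternation idea, written in plain TypeScript rather than as a Chevrotain grammar (this is not the example from the issue). The toy language with `let` and `print`, and the names `FuzzyTok`, `Item`, `tokenizeWords`, and `parseItems`, are all assumptions made for illustration. The two specific alternatives are tried first, and the third accepts any single token, so it only wins when the others fail.

```typescript
type FuzzyKind = "Let" | "Print" | "Ident" | "Other";

interface FuzzyTok {
  kind: FuzzyKind;
  image: string;
}

type Item =
  | { alt: "let" | "print"; target: string }
  | { alt: "fuzzy"; image: string };

// Crude whitespace tokenizer, just enough to feed the parser below.
function tokenizeWords(text: string): FuzzyTok[] {
  return text
    .split(/\s+/)
    .filter((w) => w !== "")
    .map((image): FuzzyTok => {
      if (image === "let") return { kind: "Let", image };
      if (image === "print") return { kind: "Print", image };
      if (/^[A-Za-z_]\w*$/.test(image)) return { kind: "Ident", image };
      return { kind: "Other", image };
    });
}

function parseItems(tokens: FuzzyTok[]): Item[] {
  const items: Item[] = [];
  let i = 0;
  while (i < tokens.length) {
    const tok = tokens[i];
    const next = tokens[i + 1];
    if (tok === undefined) break;
    if (tok.kind === "Let" && next?.kind === "Ident") {
      // Alternative 1: "let <ident>"
      items.push({ alt: "let", target: next.image });
      i += 2;
    } else if (tok.kind === "Print" && next?.kind === "Ident") {
      // Alternative 2: "print <ident>"
      items.push({ alt: "print", target: next.image });
      i += 2;
    } else {
      // Alternative 3 (fuzzy fallback): swallow exactly one token of any kind.
      items.push({ alt: "fuzzy", image: tok.image });
      i += 1;
    }
  }
  return items;
}
```

Because the fallback could also match `let` and `print`, the order of the alternatives is what resolves the conflict: a dangling `let` with no identifier after it falls through to the fuzzy branch.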