Lexer

A general-purpose tokenizer for parsing any syntax.
Lexical analysis is the first phase a parser or compiler performs to break code down into its smallest parts, called tokens.
First, define an enum that represents the types of tokens:
```csharp
public enum FormulaTokenType
{
    OpenParenthesis,
    CloseParenthesis,
    Operator,
    Integer
}
```
Next, create a `Lexer<YourTokenType>` with a regular expression for each grammar rule:
```csharp
Lexer<FormulaTokenType> lexer = Lexer
    .Create<FormulaTokenType>()
    .Ignore(@"[ \t\r\n]+")
    .Match(FormulaTokenType.OpenParenthesis, @"\(")
    .Match(FormulaTokenType.CloseParenthesis, @"\)")
    .Match(FormulaTokenType.Integer, @"[\+-]?[0-9]+")
    .Match(FormulaTokenType.Operator, @"\+|\-|\*|\/");
```
Finally, call the `Parse` method to tokenize a string. The result is a collection of tokens for further processing.
```csharp
string formula = "(3 + 4) * 15 / ((-10 - 5) * 3)";
TokenCollection<FormulaTokenType> tokens = lexer.Parse(formula);
```
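To make the walkthrough above concrete, here is a minimal self-contained sketch of how a regex-driven lexer of this kind can work internally: each rule is tried in order at the current position, the first match wins, and the position advances past it. This is an illustration only, not this library's actual implementation; `TokType` and `MiniLexer` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Illustrative token kinds; TokType and MiniLexer are hypothetical names,
// not part of the library's API.
public enum TokType { Open, Close, Integer, Operator }

public static class MiniLexer
{
    // Rules are tried in order at the current position; \G anchors each
    // match exactly at that position. A null type means "match and discard".
    static readonly (TokType? Type, Regex Pattern)[] Rules =
    {
        (null,             new Regex(@"\G[ \t\r\n]+")),   // skipped whitespace
        (TokType.Open,     new Regex(@"\G\(")),
        (TokType.Close,    new Regex(@"\G\)")),
        (TokType.Integer,  new Regex(@"\G[+-]?[0-9]+")),  // before Operator, so a sign binds to the number
        (TokType.Operator, new Regex(@"\G[+\-*/]")),
    };

    public static List<(TokType Type, string Value)> Tokenize(string input)
    {
        var tokens = new List<(TokType Type, string Value)>();
        int pos = 0;
        while (pos < input.Length)
        {
            bool advanced = false;
            foreach (var (type, pattern) in Rules)
            {
                Match m = pattern.Match(input, pos);
                if (!m.Success) continue;       // first matching rule wins
                if (type.HasValue)
                    tokens.Add((type.Value, m.Value));
                pos += m.Length;
                advanced = true;
                break;
            }
            if (!advanced)
                throw new FormatException($"Unexpected character at position {pos}");
        }
        return tokens;
    }
}
```

For example, `MiniLexer.Tokenize("(3 + 4) * 2")` yields seven tokens: Open, Integer `"3"`, Operator `"+"`, Integer `"4"`, Close, Operator `"*"`, Integer `"2"`. Listing the `Integer` rule before `Operator` is what lets `-10` come through as a single signed number while a `-` between two numbers still lexes as an operator.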
- new: `Lexer` method overloads with `RegexOptions` parameter
- change: Replaced `Func<string, string>?` parameters with `Func<Match, string>?`
- Initial release