Skip to content

Bagged few features support RegEx engine written just by Typescript generics

Notifications You must be signed in to change notification settings

MelkorNemesis/ts-generics-RegEx-engine

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

What the heck is that?

  • RegEx engine written with static types?!
  • Code which evaluates RegEx "templates" in compile time so you know the result before you run your app?!
  • RegEx engine which works with O(0) run-time complexity?!
  • Minified 0-bite (GZip) length output?!
  • Fully bugged and not ready for production?!

I'm not kidding!!! This is not just a dream!

This is the first world RegEx engine written in pure Typescript types.

Check the working examples!

Alt Text

Alt Text

Alt Text

Alt Text

Github Repo - ts-generics-RegEx-engine

you can play with the source code here

Disclaimer

  • The code is not ready to use in production environment.
  • Because of the stack limits of Typescript, some regExs stop working because they are too long and trigger recursion stack overflow known as Type instantiation is excessively deep and possibly infinite.
  • RegEx backtracking is not implemented yet.
  • The parser supports only a small subset of PCRE standard. Specifically .?*+()\\ symbols.

Motivation + usage

Thanks to new features of Typescript 4.1.x we are able to parse a string into a Tuple of tokens and much more! So I decided to write my own custom RegEx engine just by using Typescript static types to demonstrate how powerful the type system of Typescripts is.

How does the RegEx engine work under the hood?

As you may know, programming languages compilers + interpreters. You may know that they are pretty complex and includes Lexers, Parsers, Interpreters, and so on.

On the other side, this small engine is quite simple, so there are just 3 small modules:

    1. Tokenizer
    1. Parser
    1. Interpreter

1. Tokenizer

A small generic type TokenizeString<T> just parses RegEx template to tokens which are used as the input for 2. Parser to build RegEx Abstract-Syntax-Tree (AST).

Examples:

type T0 = TokenizeString<'\\(+(ab)+'>

Alt Text

type T1 = TokenizeString<'\\(+(a(xy)+(xx)b)+'>

Alt Text

2. Parser

type ParseRegExTokens<T> = ... takes the tokenized template and does the syntax analysis which produces an Abstract-Syntax-Tree (AST) Model of the RegEx template.

Examples:

type T3 = ParsedRegEx<TokenizeString<'\\(+(a(xy)+(xx)b)+'>>

Alt Text

As you can see, the parser supports nesting of structures (like brackets in brackets in brackets etc...)

AST for '\\(+(a(xy)+(xx)b)+' template will look like this:

[{
    type: 'element';
    value: "(";
    quantifier: 'exactlyOne';
}, {
    type: 'element';
    value: "(";
    quantifier: "zeroOrMore";
}, {
    type: 'groupElement';
    states: [{
        type: 'element';
        value: "a";
        quantifier: 'exactlyOne';
    }, {
        type: 'groupElement';
        states: [{
            type: 'element';
            value: "x";
            quantifier: 'exactlyOne';
        }, {
            type: 'element';
            value: "y";
            quantifier: 'exactlyOne';
        }];
        quantifier: 'exactlyOne';
    }, {
        ...; // and so on
    }, {
        ...; // and so on
    }, {
        ...; // and so on
    }];
    quantifier: 'exactlyOne';
}]

3. RegEx Interpreter

The last step is to create a proper "interpreter" type Test<RegExp, TestString> = ... which takes a template and a test string by applying rules from the RegEx AST.

Examples:

Alt Text

Alt Text

Alt Text

Alt Text

Alt Text

Alt Text

Alt Text

Alt Text

Alt Text

And that's it! 🎉 🎉

If you don't believe, you can check the full source code in this GitHub repo: https://github.com/Svehla/ts-generics-RegEx-engine

Wait... And what about the real Javascript output? Let's check it out!

Alt Text

Haha! A few hundreds line of static types and runtime output is empty with O(0) time complexity! That's the magic of Typescript 🦄

And what's next?

If you're interested in another advanced usage of the Typescript type system, you can check these step-by-step articles/tutorials on how to create some advanced Typescript generics.

About

Bagged few features support RegEx engine written just by Typescript generics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • TypeScript 99.5%
  • JavaScript 0.5%