DEV Community

Cover image for World-first Static time RegEx engine with O(0) time complexity
Jakub Švehla
Jakub Švehla

Posted on • Edited on

World-first Static time RegEx engine with O(0) time complexity

What the heck is that?

  • RegEx engine written with static types?!
  • Code which evaluates RegEx "templates" in compile time so you know the result before you run your app?!
  • RegEx engine which works with O(0) run-time complexity?!
  • Minified 0-bite (GZip) length output?!
  • Fully bugged and not ready for production?!

I'm not kidding!!! This is not just a dream!

This is the first world RegEx engine written in pure Typescript types.

Check the working examples!

preview 1

preview 2

preview 3

preview 4

Github Repo - ts-generics-RegEx-engine

you can play with the source code here

Disclaimer

  • The code is not ready to use in production environment.
  • Because of the stack limits of Typescript, some regExs stop working because they are too long and trigger recursion stack overflow known as Type instantiation is excessively deep and possibly infinite.
  • RegEx backtracking is not implemented yet.
  • The parser supports only a small subset of PCRE standard. Specifically .?*+()\\ symbols.

Motivation + usage

Thanks to new features of Typescript 4.1.x we are able to parse a string into a Tuple of tokens and much more! So I decided to write my own custom RegEx engine just by using Typescript static types to demonstrate how powerful the type system of Typescripts is.

How does the RegEx engine work under the hood?

As you may know, programming languages compilers + interpreters. You may know that they are pretty complex and includes Lexers, Parsers, Interpreters, and so on.

On the other side, this small engine is quite simple, so there are just 3 small modules:

  • 1. Tokenizer
  • 2. Parser
  • 3. Interpreter

1. Tokenizer

A small generic type TokenizeString<T> just parses RegEx template to tokens which are used as the input for 2. Parser to build RegEx Abstract-Syntax-Tree (AST).

Examples:

type T0 = TokenizeString<'\\(+(ab)+'>
Enter fullscreen mode Exit fullscreen mode

Tokenize 1 preview

type T1 = TokenizeString<'\\(+(a(xy)+(xx)b)+'>
Enter fullscreen mode Exit fullscreen mode

Tokenize 2 preview

2. Parser

type ParseRegExTokens<T> = ... takes the tokenized template and does the syntax analysis which produces an Abstract-Syntax-Tree (AST) Model of the RegEx template.

Examples:

type T3 = ParsedRegEx<TokenizeString<'\\(+(a(xy)+(xx)b)+'>>
Enter fullscreen mode Exit fullscreen mode

Parser preview

As you can see, the parser supports nesting of structures (like brackets in brackets in brackets etc...)

AST for '\\(+(a(xy)+(xx)b)+' template will look like this:

[{
    type: 'element';
    value: "(";
    quantifier: 'exactlyOne';
}, {
    type: 'element';
    value: "(";
    quantifier: "zeroOrMore";
}, {
    type: 'groupElement';
    states: [{
        type: 'element';
        value: "a";
        quantifier: 'exactlyOne';
    }, {
        type: 'groupElement';
        states: [{
            type: 'element';
            value: "x";
            quantifier: 'exactlyOne';
        }, {
            type: 'element';
            value: "y";
            quantifier: 'exactlyOne';
        }];
        quantifier: 'exactlyOne';
    }, {
        ...; // and so on
    }, {
        ...; // and so on
    }, {
        ...; // and so on
    }];
    quantifier: 'exactlyOne';
}]
Enter fullscreen mode Exit fullscreen mode

3. RegEx Interpreter

The last step is to create a proper "interpreter" type Test<RegExp, TestString> = ... which takes a template and a test string by applying rules from the RegEx AST.

Examples:

Alt Text

Alt Text

Alt Text

Alt Text

Alt Text

Alt Text

Alt Text

Alt Text

Alt Text

And that's it! 🎉 🎉

If you don't believe, you can check the full source code in this GitHub repo: https://raw.githubusercontent.com/Svehla/ts-generics-RegEx-engine

Wait... And what about the real Javascript output? Let's check it out!

Zero runtime preview

Haha! A few hundreds line of static types and runtime output is empty with O(0) time complexity! That's the magic of Typescript 🦄

And what's next?

If you're interested in another advanced usage of the Typescript type system, you can check these step-by-step articles/tutorials on how to create some advanced Typescript generics.

Top comments (8)

Collapse
 
oguimbal profile image
Olivier Guimbal • Edited

Haha, Typescript type system has been Turing-Complete for a while now, leading to all kind of weird experiments, but template literal types possibilities are quite mindblowing and fun 😁

I plugged it in ts-sql just for fun, it looks like it kind of works, but the .+ regex seems to be too much !

Collapse
 
svehla profile image
Jakub Švehla • Edited

Haha! That's soooooo Good!!!

:D It really made my day! Thanks for sharing!

Yeee, I tried so hard and get so far but in the end, I'm still not able to do some effective bypass of this recursive stack Error. :/ I'll figure it out someday!

Collapse
 
michelemauro profile image
michelemauro

Really interesting take. It makes perfect sense to compile a static regex at compile time, and to use the compiler's own type system to do it. Of course, the type checker doesn't expect to be abused that way 😁

Collapse
 
faiwer profile image
Stepan Zubashev

It's crazy. In a good meaning of the word "crazy". Good job, guy! :-) Abnormal programming.

Collapse
 
svehla profile image
Jakub Švehla

Haha! Thank you man 😇

Appreciate it!

Collapse
 
ionellupu profile image
Ionel Cristian Lupu

This is really really interesting. Love the work. Nice.

Nice to know what Typescript can do even though you can't use it in production

Collapse
 
raphaelmansuy profile image
Raphael MANSUY

🚀 Wonderful 💪 Thanks

Collapse
 
avantiny profile image
Dominik Kuře

I had a really good time reading this. Keep it up!