hlysine/readable-regexp

readable-regexp

Regular Expressions - quick and concise, readable and composable.

Features • Installation • Quick Start / Documentation

Features

📖 Readable

Be explicit and extract common pieces

Compare a readable-regexp expression:

const num = capture.oneOf(
  oneOrMore.digit, // integer
  zeroOrMore.digit.exactly`.`.oneOrMore.digit // decimal
);
const regExp = match(num).exactly`,`.maybe` `.match(num).toRegExp(Flag.Global); // num is used twice here

With normal JS RegExp:

const regExp = /(\d+|\d*\.\d+), ?(\d+|\d*\.\d+)/g; // we have to copy-paste the capture group
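To see what either form matches, the plain RegExp can be exercised directly. This is a minimal sketch using only built-in JavaScript; the sample string and the names `pairPattern`, `sample`, and `pairs` are assumptions made for illustration.

```javascript
// Native pattern copied from the comparison above.
const pairPattern = /(\d+|\d*\.\d+), ?(\d+|\d*\.\d+)/g;

// Hypothetical input, chosen only for illustration.
const sample = '3, 4 and 10,.5';

// matchAll yields one match per pair; groups 1 and 2 hold the two numbers.
const pairs = [...sample.matchAll(pairPattern)].map((m) => [m[1], m[2]]);

console.log(pairs); // [ [ '3', '4' ], [ '10', '.5' ] ]
```

Both capture groups carry the same sub-pattern, which is the part the readable-regexp version declares once as `num`.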

In a more complex use case, we can break the expression down into small, manageable parts:

const allowedChar = notCharIn`<>()[]\\\\` `.,;:@"` (whitespace);

const username =
  oneOrMore.match(allowedChar)
  .zeroOrMore(
    exactly`.`
    .oneOrMore.match(allowedChar)
  );

const quotedString =
  exactly`"`
  .oneOrMore.char
  .exactly`"`;

const ipv4Address =
  exactly`[`
  .repeat(1, 3).digit
  .exactly`.`
  .repeat(1, 3).digit
  .exactly`.`
  .repeat(1, 3).digit
  .exactly`.`
  .repeat(1, 3).digit
  .exactly`]`;

const domainName =
  oneOrMore(
    oneOrMore.charIn`a-z` `A-Z` `0-9` `-`
    .exactly`.`
  )
  .atLeast(2).charIn`a-z` `A-Z`;

const email =
  lineStart
  .capture.oneOf(username, quotedString)
  .exactly`@`
  .capture.oneOf(ipv4Address, domainName)
  .lineEnd
  .toRegExp();

This is far more readable and debuggable than the equivalent RegExp:

const email =
  /^([^<>()[\]\\.,;:@"\s]+(?:\.[^<>()[\]\\.,;:@"\s]+)*|".+")@(\[\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\]|(?:[a-zA-Z0-9\-]+\.)+[a-zA-Z]{2,})$/;
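As a quick sanity check, the plain RegExp can be run against a few addresses. The sketch below uses only built-in JavaScript; the sample addresses and variable names are assumptions made for illustration.

```javascript
// Native pattern copied from the comparison above.
const emailPattern =
  /^([^<>()[\]\\.,;:@"\s]+(?:\.[^<>()[\]\\.,;:@"\s]+)*|".+")@(\[\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\]|(?:[a-zA-Z0-9\-]+\.)+[a-zA-Z]{2,})$/;

// Hypothetical addresses, chosen only for illustration.
const domainMatch = emailPattern.exec('john.doe@example.com');
const ipMatch = emailPattern.exec('"john doe"@[192.168.0.1]');
const noMatch = emailPattern.exec('john..doe@example.com');

console.log(domainMatch[1], domainMatch[2]); // john.doe example.com
console.log(ipMatch[1], ipMatch[2]);         // "john doe" [192.168.0.1]
console.log(noMatch);                        // null (consecutive dots are rejected)
```

Group 1 corresponds to the `username` / `quotedString` alternatives and group 2 to `ipv4Address` / `domainName`.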

📝 Flexible and Concise

Multiple shorthands and syntax options

Without all the shorthands, an expression looks like this:

const regExp = exactly('[')
  .captureAs('timestamp')(oneOrMore(not(charIn(']'))))
  .exactly('] ')
  .captureAs('category')(oneOrMore(word).exactly('-').oneOrMore(word))
  .exactly(': ')
  .captureAs('message')(oneOrMore(char))
  .toRegExp('gm');

Whenever a function takes a single string literal, you can use a tagged template literal to remove the brackets:

const regExp = exactly`[`
  .captureAs`timestamp`(oneOrMore(not(charIn`]`)))
  .exactly`] `
  .captureAs`category`(oneOrMore(word).exactly`-`.oneOrMore(word))
  .exactly`: `
  .captureAs`message`(oneOrMore(char))
  .toRegExp`gm`;

When a quantifier or group contains only one token, you can chain it with a dot (.) instead of wrapping it in brackets:

const regExp = exactly`[`
  .captureAs`timestamp`.oneOrMore.not.charIn`]`
  .exactly`] `
  .captureAs`category`(oneOrMore.word.exactly`-`.oneOrMore.word)
  .exactly`: `
  .captureAs`message`.oneOrMore.char
  .toRegExp`gm`;

There are shorthands for negating a character class or a lookaround:

const regExp = exactly`[`
  .captureAs`timestamp`.oneOrMore.notCharIn`]`
  .exactly`] `
  .captureAs`category`(oneOrMore.word.exactly`-`.oneOrMore.word)
  .exactly`: `
  .captureAs`message`.oneOrMore.char
  .toRegExp`gm`;

As you can see, most of the distracting brackets are gone, and you are left with a clean and concise expression.
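For reference, here is roughly what the expression corresponds to in plain RegExp syntax. The pattern is a hand-written approximation, not output captured from the library, and the log text and variable names are made up for illustration.

```javascript
// Assumed hand-written equivalent of the expression above (not library output).
const logPattern = /\[(?<timestamp>[^\]]+)\] (?<category>\w+-\w+): (?<message>.+)/gm;

// Hypothetical log text.
const log =
  '[2024-01-01 10:00] app-core: started\n' +
  '[2024-01-01 10:05] db-pool: connection lost';

// Each match exposes the three named groups declared with captureAs.
const entries = [...log.matchAll(logPattern)].map((m) => ({ ...m.groups }));

console.log(entries[0].category); // app-core
console.log(entries[1].message);  // connection lost
```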


🛟 Safe

Type check, auto-complete, and runtime safeguards

Some errors can be avoided just by writing in readable-regexp:

const o = 'Ȯ'; // 0x022e
const result1 = /\u22e/.test(o);
// false, because without four hex digits \u22e matches the literal text "u22e"

const result2 = unicode`22e`.toRegExp().test(o);
// true
// '22e' is automatically fixed to be '\u022e'
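The native behaviour behind this pitfall can be reproduced without the library. A minimal sketch in built-in JavaScript follows; the variable names are illustrative.

```javascript
const target = '\u022e'; // the character Ȯ

// Without the u flag, \u followed by fewer than four hex digits is not a
// Unicode escape; the pattern silently matches the literal text "u22e".
const shortEscape = /\u22e/;
const shortHitsTarget = shortEscape.test(target);  // false
const shortHitsLiteral = shortEscape.test('u22e'); // true

// Padding to four hex digits gives the intended escape.
const paddedEscape = /\u022e/;
const paddedHitsTarget = paddedEscape.test(target); // true

console.log(shortHitsTarget, shortHitsLiteral, paddedHitsTarget); // false true true
```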

Some errors can be caught by TypeScript at compile time:

(Not working at the moment. These errors will either be thrown at runtime or be handled by readable-regexp to produce reasonable RegExp.)

// @ts-expect-error - You cannot use two quantifiers on one token
const regExp1 = oneOrMore.zeroOrMore`foo`;
// @ts-expect-error - char is not negatable, because its negation would match nothing
const regExp2 = oneOrMore.not.char;
// @ts-expect-error - k is not a valid flag
const regExp3 = char.toRegExp('gki');

Some can be caught at run time:

const result1 = /(foo)\2/.test('foofoo');
// false, with no error: group 2 does not exist, so \2 is read as an octal escape

const result2 = capture`foo`.ref(2).toRegExp().test('foofoo');
// Error: The following backreferences are not defined: 2
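The silent failure of the native pattern can be demonstrated with built-in JavaScript alone. In this sketch the variable names are illustrative; the `u` flag is shown only as the engine's own strict mode, not as something the library is known to use.

```javascript
// Without the u flag, a backreference to a missing group is parsed as a
// legacy octal escape: \2 means the character U+0002.
const lenient = /(foo)\2/;
const matchesRepeat = lenient.test('foofoo');   // false
const matchesOctal = lenient.test('foo\x02');   // true

// With the u flag, the same pattern is rejected when it is compiled.
let strictError = null;
try {
  new RegExp('(foo)\\2', 'u');
} catch (err) {
  strictError = err;
}

console.log(matchesRepeat, matchesOctal, strictError instanceof SyntaxError); // false true true
```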

Installation

With a package manager

npm install readable-regexp

yarn add readable-regexp

// ES modules
import { oneOrMore, exactly } from 'readable-regexp';

// CommonJS
const { oneOrMore, exactly } = require('readable-regexp');

With a CDN

<script src="https://cdn.jsdelivr.net/npm/readable-regexp/dist/readable-regexp.umd.js"></script>

const { oneOrMore, exactly } = readableRegExp;

Quick Start / Documentation

Quick Start
Documentation
TypeDoc