Skip to content

Pattern Matching

Bill Hails edited this page Oct 19, 2024 · 21 revisions

Very much in the style of ML, you can write factorial as

fn factorial {
    (0) { 1 }
    (n) { n * factorial(n - 1) }
}

and map as

fn map {
    (f, []) { [] }
    (f, h @ t) { f(h) @ map(f, t) }
}

(see Lists for details of [] and @)

Points to Note

  • The function is defined with a name, but the alternative arguments and function bodies are in separate clauses within the composite function, like a case statement.
  • The alternative arguments must agree in number and type.
  • Formal arguments can be symbols, constants, tuples, or structures such as lists.
    • When the formal argument is a constant like 0, the actual argument must equal the formal argument.
    • When the formal argument is a symbol like f, it is bound to its actual argument in the normal way.
    • When the formal argument is a structure containing components, for example h @ t, the actual argument must match the structure of the formal argument and the same matching rules are applied recursively to each component.
    • When the formal argument is a structure without components, for example [], it behaves like a constant.
    • When the formal argument is a tuple, it behaves much like a structure.

This can be very powerful. Here's an implementation of quicksort from the tests:

let
    fn qsort {
        ([]) { [] }
        (pivot @ rest) {
            lesser = filter(fn (v) { v < pivot }, rest)
            greater = filter(fn (v) { v >= pivot }, rest)
            qsort(lesser) @@ [pivot] @@ qsort(greater)
        }
    }

    fn filter {
        (f, []) { [] }
        (f, h @ t) {
            if (f(h)) {
                h @ filter(f, t)
            } else {
                filter(f, t)
            }
        }
    }
in
    unsorted = ["hello", "goodbye", "here", "there", "everywhere"];    
    qsort(unsorted) // ["everywhere", "goodbye", "hello", "here", "there"]

The original functional algorithm for quicksort was pilfered from here. I believe that is written in swift, and it's nice that our implementation is more succinct, readable and importantly more generic than that, though it's not very efficient because of all those append operations, look in listutils.fn for a much more efficient but less concise implementation.

Wildcard

Function arguments also admit the "wildcard" symbol _. This behaves as a constant which will match anything and, because it is a constant it doesn't get bound in the body of the function and so can be used multiple times in an argument list to signify values we aren't interested in, without risking a conflict.

For example here's an implementation of len over a list:

fn len {
    ([]) { 0 }
    (_ @ t) { 1 + len(t) }
}

This len ignores the value in the head of the list.

Named Structures

The structure in a formal argument to a composite function can be given an alias with =. This is only necessary if you want to both access the components and treat the structure as a whole (to avoid copying it.) For example:

fn foo {
    ([]) { [] }
    (x = h @ t) {
        // do stuff with h, t and x
    }
}

In the second function clause, h and t are bound to the head and tail of the list as usual, but additionally x is bound to the whole argument h @ t.

Pseudo-unification

Given the pattern matching done by composite functions, we can get a little extra functionality by allowing duplicate variables in the formal arguments. If a duplicate variable is seen, then the value must be the same for each equivalent actual argument or the match will fail. This becomes powerful in combination with the decomposition of structures, for example here's an implementation of member, making use of this feature.

fn member { // is x a member of a list
    (_, [])    { false }
    (x, x @ _) { true }
    (x, _ @ t) { member(x, t) }
}

In the second clause, if the value being searched for, x, matches the value at the head of the list, then the result is true.

Shared Bodies

In some cases different matches may require exactly the same function body. In this circumstance you can supply the matches separated by | characters and a single function body. Variables mentioned in the body must be bound by each of the matches individually. This is implemented purely within the parser, which simply duplicates the shared body and attaches it to each of the matches.

Example:

fn {
  (0, x) | (1, x) | (2, x) { ... }
  (_, x)                   { ... }
}

Note that the overall order of the matches is still important and is preserved.

Switch Statements

A trivial addition to the language, the construct

switch(x, y) {
  (10, 11) { foo() }
  (11, 12) { bar() }
  (_,  _)  { baz() }
}

is rewritten internally to

fn {
  (10, 11) { foo() }
  (11, 12) { bar() }
  (_,  _)  { baz() }
}(x, y);

by the parser before evaluation. Note that this allows the full expressiveness of composite functions in switch statements.

Unsafe Functions

It's quite possible to create a function that could fail to match its arguments, resulting in a run-time error, and in fact sometimes that is desirable. For example a function like head that returns the first element on a list is necessarily undefined for the empty list. However you'll still want to know if you've accidentally created such a function. For that reason such unsafe functions and switch statements which have a non-exhaustive pattern match must declare themselves as such using the keyword unsafe. Examples:

unsafe fn head (h @ _) { h }

and

unsafe switch(foo()) { (h @ _) { h } }

It is an error to declare a safe function unsafe.

Implementation

Under the hood the pattern matching is done by converting the matrix of all arguments to a nest of case statements so it is fairly efficient. The reference paper I used was Pettersson 92.

Next: Typedef