Matching Patterns with Scheme

A while back, I wrote a post about macros in Scheme. Today I want to take a look at how one might begin to implement a macro system. In Scheme, whether you use syntax-rules or syntax-case do write your macros, at some point you'll write patterns and templates. Macros match their input against a pattern and then use this to instantiate a template. Let's consider a two-way or macro:

(define-syntax or2
  (syntax-rules ()
    ((_ e1 e2)
     (let ((t e1))
       (if t t e2)))))

We can see how this works at the REPL by using expand:

> (expand '(or2 1 2))
(let ([t.8 1]) (if t.8 t.8 2))

This matches the expression (or2 1 2) against the pattern (_ e1 e2). The underscore means we don't care what's in that position (it's always or2, since this is part of the or2 macro definition). The names e1 and e2 allow us to refer to what occurs in this pattern in the template. When instantiating the template, the macro system replaces all references to e1 with the chunk of syntax from that position (1 in this case), and likewise with e2. The result is the expression, (let ([t.8 1]) (if t.8 t.8 2)). Here we let-bind the first expression because we do not want to accidentally evaluate any side effects it might contain twice. Don't worry too much about why the variable name is t.8 instead of just t. This has to do with something called hygiene, which I will hopefully say more on in a later post.

Now let's take a look at how we would write a function to match patterns like this. We'll start with a procedure called match that takes four arguments: the pattern to match against, the expression to match, a success continuation, and a failure continuation. The success continuation is called if the expression matches the pattern, and the failure continuation is called otherwise. When I first started writing this, it seemed like it would be cleaner to use success and failure continuations, although I'm not entirely sure that's true anymore. At the moment, we will support three things for patterns:

Pairs - A pair pattern matches an expression if and only if the expression is a pair, and the car of the pattern matches the car of the expression and the cdr of the pattern matches the cdr of the expression.
Symbols - A symbol pattern binds a pattern variable. This matches any expression, and binds that expression to the variable.
Null - A null pattern matches an expression if and only if the expression is null.

Scheme's pattern matching supports several other features, like auxiliary keywords, which we're ignoring for the now for the sake of simplicity. Using our three kinds of patterns so far, we can translate this pretty directly into a Scheme program:

(define (match p e sk fk)
  (cond
    ((and (pair? p) (pair? e))
     (match (car p) (car e)
            (lambda (b)
              (match (cdr p) (cdr e)
                     (lambda (b^) (sk (append b b^)))
                     fk))
            fk))
    ((and (symbol? p))
     (sk (list (cons p e))))
    ((and (null? p) (null? e))
     (sk '()))
    (else (fk))))

Notice that we pass a list of bindings to the success continuation. These will be used by the template instantiation code. Let's try an example:

> (match '(_ e1 e2) '(or2 1 2) (lambda (b) b) (lambda () #f))
((_ . or2) (e1 . 1) (e2 . 2))

For the most part, this does what we want. The matcher says the expression matches the pattern, and tells us that e1 was bound to 1 and e2 was bound to 2. However, we also bound _ to or2. In reality, we want _ to mean ignore, which we can do by adding one line before the symbol? line in our cond clause:

((eq? p '_)
 (sk '()))

Now we get only the bindings we care about.

The next part is to use the resulting bindings to instantiate the template. We'll do this with a function called instantiate, which takes a template (called p), and a list of bindings. This is a straightforward tree walk that tries to replace symbols with expressions when they are bound:

(define (instantiate p bindings)
  (cond
    ((pair? p)
     (cons (instantiate (car p) bindings)
           (instantiate (cdr p) bindings)))
    ((assq p bindings) => cdr)
    (else p)))

If we did this right, we should be able to instantiate the template from the or2 macro above.

> (instantiate
      '(let ((t e1))
         (if t t e2))
      '((e1 . 1) (e2 . 2)))

(let ([t 1]) (if t t 2))

It looks like we got what we wanted, although without hygiene.

At this point we have a pattern matcher and template instantiation function that is suitable for making a very simple macro system. One very nice feature that we are missing at the moment is ellipsis patterns, which allow us to match repeated patterns. For example, we could use this to write a custom let macro like this:

(define-syntax my-let
  (syntax-rules ()
    ((_ ((x e) ...) b)
     ((lambda (x ...) b) e ...))))

In a post at some point in the hopefully near future, I will show how to handle patterns with ... in them.