Patterns with Ellipses

Last time, we talked about matching patterns in Scheme. Now we will look at how to extend the pattern matcher and template instantiation code to handle patterns with ellipses.

Do start with, let's look at an example macro. Imagine we wanted to take the or2 macro from last time, but we want to extend it to support any number of arguments. Certainly, writing (or2 a (or2 b c)) is a bit of a pain. It'd be much better to simply write (or a b c). We can do this by using an ellipses in our patterns, as seen in this macro, called my-or to avoid clashing with the built in or:

(define-syntax my-or
  (syntax-rules ()
    ((_ e) e)
    ((_ e e* ...)
     (let ((t e))
       (if t t (my-or e* ...))))))

This macro uses two patterns. The first matches my-or with exactly one argument. The second matches my-or with one argument and any number of remaining arguments. The second pattern expands into a recursive call to my-or, but it decreases the number of arguments each time until we finally hit the base case. We might trace the expansion like this:

(my-or a b c)

=> 

(let ((t.1 a))
  (if t.1 t.1 (my-or b c)))
  
=>

(let ((t.1 a))
  (if t.1 t.1
      (let ((t.2 b))
        (if t.2 t.2 (my-or c)))))

=>

(let ((t.1 a))
  (if t.1 t.1
      (let ((t.2 b))
        (if t.2 t.2 c))))

Unfortunately, pattern matching and template instantiation become much trickier in the presence of ellipses. As we will see, however, when all is said and done we'll have added just one more cond clause to both the matcher and the instantiation function.

The basic idea is that we want to keep everything more or less the same, but when we encounter a ..., we want to recursively match or instantiate a pattern as many times as we can and then pick up where we left off. In my mind, the most complicated part of this becomes representing the environment between the matcher and instantiation code. Last time, we represented the environment as an association list. It's attractive to try to simply pair the pattern variable name with a list of things it matched instead of just a single value, but this doesn't seem to preserve quite enough information. Suppose we matched (my-or (+ 1 2) a b c) against (_ e e* ...), yielding the bindings ((e . (+ 1 2)) (e* . (a b c))). Then let's see what happens if for some reason we used this to instantiate the template '((e e*) ...). We might use a rule like "when instantiating a ... template, look up every variable in the environment and zip them together." This rule would lead to the following instantiation:

'((+ a) (1 b) (2 c))

This rule wouldn't be too hard to implement, but it's also probably not what we want. In this case, we've broken apart the (+ 1 2) expression into its individual components, when instead we'd probably like to keep this together. The semantics we'd like instead lead to the following result:

'(((+ 1 2) a) ((+ 1 2) b) ((+ 1 2) c))

Admittedly, this is a somewhat contrived example, but as we write more macros it will become apparent that this is usually the behavior we want.

In order to preserve the information we need, we'll represent the environment more like a tree, which includes a ... node that describes pattern variables bound under a .... Thus, if we matched (my-or (+ 1 2) a b c) against (_ e e* ...), we'd end up with the environment ((e . (+ 1 2)) (... ((e* . a)) ((e* . b)) ((e* .c)))). Each ... node in the environment includes a list of bindings matched under a ... pattern. The list of association lists approach allows us to keep track of which variables were matched together.

We will see that building this environment in the matcher is fairly straightforward, but more care is needed with template instantiation. We will define a flatten operation on environments, which takes in one environment, strips off a level of ellipses, and returns a list of environments. We then recursively apply the template instantiation function to each of these new environments. To look at our running example, ((e . (+ 1 2)) (... ((e* . a)) ((e* . b)) ((e* .c)))) would convert to this:

(((e . (+ 1 2)) (e* . a))
 ((e . (+ 1 2)) (e* . b))
 ((e . (+ 1 2)) (e* . c)))

It's easy to see now how instantiating '((e e*) ...) just requires mapping the instantiate function with (e e*) over this list of environments.

We're not quite done yet though. What happens if we had the pattern (_ (a ...) (b ...)), and matched this against ((1 2 3) (x y))? We'd end up with this environment:

((... ((a . 1)) ((a . 2)) ((a . 3)))
 (... ((b . x)) ((b . y))))

If we try to flatten this, we'll see that we won't have enough bs to correspond with each a. To avoid this problem, we will make the flatten procedure take a template as input, and only consider ... nodes that contain pattern variables referenced by the template. This way, we only expand the environment for either a or b, but not both.

So with all of this out of the way, we can look at an exension of match that handles ellipsis patterns:

(define (match* p e sk fk)
  (cond
    ((and (pair? p) (pair? (cdr p)) (eq? '... (cadr p)))
     (let loop ((e e)
                (b '()))
       (if (null? e)
           (match* (cddr p) e
                   (lambda (b^) (sk `((... . ,b) . ,b^)))
                   fk)
           (match* (car p) (car e)
                   (lambda (b^)
                     (loop (cdr e) (snoc b b^)))
                   (lambda ()
                     (match* (cddr p) e
                             (lambda (b^) (sk `((... . ,b) . ,b^)))
                             fk))))))
    ((and (pair? p) (pair? e))
     (match* (car p) (car e)
             (lambda (b)
               (match* (cdr p) (cdr e)
                       (lambda (b^) (sk (append b b^)))
                       fk))
             fk))
    ((eq? p '_)
     (sk '()))
    ((symbol? p)
     (sk (list (cons p e))))
    ((and (null? p) (null? e))
     (sk '()))
    (else (fk))))

This is almost identical to match from before, but we've added lines 3--16. These lines first check if the pattern is a ... pattern, and if so, we go into a loop where we try to match the pattern as many times as possible. As we do this, we build up a list of environments from each successful match. Once we get a failure, instead of calling the failure continuation we were given, we just continue on matching the rest of the pattern. We can try matching this against a let pattern as follows:

> (match* '(_ ((x e) ...) b) '(let ((x 5) (y 6)) (+ x y)) (lambda (b) b) (lambda () #f))
((... ((x . x) (e . 5)) ((x . y) (e . 6))) (b + x y))

Based on our previous discussion of environments, this is what we wanted.

Now let's look at how we combine a pattern with a template. We'll start with a helper, called extract-..., which pulls out the relevant ellipsis bindings from the environment:

(define (extract-... p bindings)
  (if (null? bindings)
      '()
      (let ((rest (extract-... p (cdr bindings)))
            (b (car bindings)))
        (if (and (eq? (car b) '...) (not (null? (cdr b))))
            (let ((names (map car (cadr b))))
              (if (ormap (lambda (x) (mem* x p)) names)
                  (cons (cdr b) rest)
                  rest))
            rest))))

This function works by walking over the set of bindings. When we encounter a ... node, we check if any of the variables bound in that sub-environment are referenced in the pattern (this is what the (ormap (lambda (x) (mem* x p)) names) clause does). If any variables are relevant, we include this sub-environment on the list that is returned.

Doing this relies on another helper, mem*, which determines whether a variable is relevant to a pattern. As its name suggest, this was originally a blind tree walk that checked if the symbol occurs anywhere in the pattern. Sadly, it's not that simple. Suppose we had the environment ((... ((a . 1)) ((a . 2)) ((a . 3))) (... ((b . x)) ((b . y)))) and we were instantiating the pattern ((a b ...) ...). Since there are two levels of ..., we only want to consider a on the first level, and then when we get to the second ..., we need to consider only b. This means we need to cut off search when we see a sub-pattern with an ellipsis. That's what lines 3 and 4 do in this code:

(define (mem* x ls)
  (cond
    ((and (pair? ls) (pair? (cdr ls)) (eq? (cadr ls) '...))
     #f)
    ((pair? ls)
     (or (mem* x (car ls)) (mem* x (cdr ls))))
    (else (eq? x ls))))

With our helper functions out of the way, we can look at the extended version of instantiate:

(define (instantiate* p bindings)
  (cond
    ((and (pair? p) (pair? (cdr p)) (eq? '... (cadr p)))
     (let ((bindings... (extract-... (car p) bindings)))
       (if (null? bindings...)
           (instantiate* (cddr p) bindings)
           (append
            (apply map (cons (lambda b*
                               (instantiate* (car p)
                                             (append (apply append b*)
                                                     bindings)))
                             bindings...))
            (instantiate* (cddr p) bindings)))))
    ((pair? p)
     (cons (instantiate* (car p) bindings)
           (instantiate* (cdr p) bindings)))
    ((assq p bindings) => cdr)
    (else p)))

One again, this includes the same code as before, extended with lines 3--13 to handle ellipses. This clause works by finding the relevant sub-environments. If it finds any, we append each of the sub-environments to the full environment and use each of these new environments to instantiate the sub-template.

We can pass this the results of the our match* call above, and see that it is able to reconstruct a let expression:

> (instantiate* '(let ((x e) ...) b)
                '((... ((x . x) (e . 5)) ((x . y) (e . 6))) (b + x y)))
(let ([x 5] [y 6]) (+ x y))

Of course, we can also use this to convert let expressions to function applications:

> (instantiate* '((lambda (x ...) b) e ...)
                '((... ((x . x) (e . 5)) ((x . y) (e . 6))) (b + x y)))
((lambda (x y) (+ x y)) 5 6)

And, we can handle some harder cases too:

> (instantiate* '((a b ...) ...)
                 '((... ((a . 1)) ((a . 2)) ((a . 3)))
                   (... ((b . x)) ((b . y)))))
((1 x y) (2 x y) (3 x y))

And here's my personal favorite:

> (instantiate* '((a a ...) ...) '((... ((a . 1)) ((a . 2)) ((a . 3)))))
((1 1 2 3) (2 1 2 3) (3 1 2 3))