One Racketeer

the blog of Ryan Culpepper

(blog-of ryanc@racket-lang.org)

oblivious normalization by evaluation

At RacketCon 2018’s office hours, David Thrane Christiansen talked about “normalization by evaluation” as a prelude to his “Implementing Dependent Types” session.

In the design he sketched, the normalizer needed help from the evaluator. I wondered if it was possible to use a standard (oblivious) evaluator, such as Racket’s normal eval. The answer is no — but kind of.

Normal forms

To be precise, we are working with the following language:

  EXP = X
  | (lambda (X) EXP)
  | (EXP EXP)
     
  X = variable name

A normal form (specifically, a β-normal form) is a term that does not contain any β-redexes. That is, it does not contain any subterm that the β reduction rule can be applied to:

((lambda (X) EXP₁) EXP₂)  EXP₁[EXP₂ / X]  ; β

where the notation _ [_ / _] means capture-avoiding substitution.

The following terms are examples of normal forms:
(lambda (x) x)
(lambda (f) (lambda (x) (f (f (f x)))))
The following is not a normal form, even though it is a value (we’ll get back to values in a moment):

(lambda (x) ((lambda (y) y) (lambda (z) (z z))))

But the term above has a normal form (and our normalizer should be able to find it):

(lambda (x) (lambda (z) (z z)))

We consider a term to have a normal form if we can convert it into a normal form by applying the β rule any number of times, anywhere in the term (even within lambda-abstractions).

The following is not a normal form and does not have a normal form:

((lambda (x) (x x)) (lambda (x) (x x))) ; usually called Ω

Normal forms have the following shape, described by the NF nonterminal:
  NF = NEU
  | (lambda (X) NF)
     
  NEU = X
  | (NEU NF)
A normal form is either a neutral term (NEU) or a lambda-abstraction containing a normal form. A neutral is either a variable or a neutral applied to a normal form. In other words, a neutral term is a normal form that is not a lambda abstraction.

Note: In this post, whenever I say “normal form”, I always mean β-normal form; and whenever I say “has a normal form”, I mean it has a β-normal form reachable through β steps. That is, even once I introduce other kinds of steps, the idea of having a normal form is tied to β; when I talk about whether the normal form is reachable through other kinds of steps, I’ll say so explicitly.

CBN Evaluation

If we consistently apply the β rule at the leftmost, outermost place that it can be applied, except within a lambda body, then we get call-by-name (CBN) evaluation.

Here’s a simple CBN evaluator:

; evaluate : EXP → EXP
(define (evaluate e)
  (match e
    [`(,e1 ,e2)
     (match (evaluate e1)
       [`(lambda (,x) ,e2)
        (evaluate (subst e2 x e1))]
       [e1* ; open term!
        `(,e1* ,e2)])]
    [e e]))

Note that the evaluator allows open terms. For example, (evaluate 'x) produces 'x. Also note that the evaluator diverges when given a term like Ω. If the original term is closed, then the second inner match case is impossible, and the evaluator either returns a lambda-abstraction or it diverges.

Given the CBN evaluator above, it is straightforward to write a normalizer that takes a term and produces a normal form (if the term has one). We start by evaluating the term, then we recur into its immediate subexpressions and normalize them.

; normalize : EXP → NF
(define (normalize e)
  (match (evaluate e)
    [`(lambda (,x) ,e)
     `(lambda (,x) ,(normalize e))]
    [`(,e1 ,e2)
     `(,(normalize e1) ,(normalize e2))]
    [x x]))

CBV Evaluation

Evaluation in Racket, however, is not based on the β rule, but on a restricted form called βv (“β-value”):

((lambda (X) EXP₁) VAL₂)  EXP₁[VAL₂ / X]  ; βv

where a value (VAL) is either a variable or a lambda-abstraction:
  VAL = X
  | (lambda (X) EXP)
If we apply βv at the leftmost, outermost place that it can be applied, except within a lambda body, then we get call-by-value (CBV) evaluation. (The terms “call-by-name” and “call-by-value”, as evaluation strategies, are unrelated to the terms “call (or pass) by value” and “call (or pass) by reference” used in the context of languages like C, C++, Java, etc.)

Given these two differences, is it possible to build a β-normalizer out of a CBV-evaluator?

No. Not one that uses the evaluator in any meaningful way, anyway. Consider the following term:

((lambda (z) (lambda (y) y)) Ω)

where Ω is that example from earlier of a term without a normal form. This term has a normal form: (lambda (y) y).

If we apply the evaluator to this term, it diverges (that is, it fails to terminate). If we do case analysis first and apply the evaluator to the subterms, it diverges on the second. We could do deeper case analysis, but at that point, we’re just building a normalizer from scratch.

The problem is that while this example has a normal form (reachable in one β step), that normal form is not reachable through any number of βv steps — it gets trapped by Ω. That shouldn’t be surprising: βv is weaker than β.

Un-evaluation

Let’s refine the question. If there is a normal form reachable through βv steps, can we find it using a CBV evaluator? Racket’s eval procedure, for example?

Now we must reckon with another issue: Racket’s eval takes an encoding of a term (as an S-expression made of lists and symbols) and produces a Racket value, and Racket provides no way to convert a function value back to a term encoding. But perhaps we could build such an “un-evaluator”, at least if we’re allowed to assume that the value we’re un-evaluating corresponds to a normal form.

So let’s try to build a normalizing un-evaluator, with the following signature:

; un-eval : eval(NF) -> NF

That is, un-eval must be given something that is (or is “equivalent” to) the result of calling eval on some (closed) normal form, and it will return an equivalent normal form. Of course, un-eval might choose different variable names. For example, we might have the following:
> (un-eval (eval '(lambda (z) z)))

'(lambda (x1) x1)

> (un-eval (lambda (z) z))

'(lambda (x1) x1)

The only possible value we can get from evaluating a closed normal form is an ordinary Racket function, and the only thing we can do with that is to apply it to an argument. So we should apply the function to some sort of value that will allow us to map out its behavior.

For example, suppose we apply the function to the symbol 'a and we get back 'a. Then we know that the function corresponds to the normal form (lambda (x) x) remember, we’re assuming that the function is the result of evaling some term in the language above, so we don’t consider possibilities like (lambda (x) 'a) or (lambda (x) (if (symbol? x) x 12)). But what if the function is like (lambda (x) (lambda (y) x)) and doesn’t return the symbol immediately? Or worse, what if the function is like (lambda (x) (x x)) and tries to apply its argument? Sending a symbol is not a good strategy.

We need to send a probe that acts like a function, but that we can recognize and turn into a term encoding. Racket gives us many ways to create such functions; here’s one approach using applicable structures:
(struct probe (term)
  #:property prop:procedure
  (lambda (self arg)
    (probe `(,(probe-term self) ,(un-eval arg)))))

That is, a probe instance supports the operation probe-term that returns an S-expression representation of the term, but it also acts as a procedure of one argument. When applied, a probe just creates a new probe whose term is an application term where the operator is the original probe’s term and the operand is a term representing the argument. We convert the argument to a term using un-eval, which we are about to define.

The un-eval function must handle two cases: ordinary Racket functions that we want to un-eval, and probes that we have introduced to explore the first kind of function. Since we can recognize probes with the probe? predicate, we can implement un-eval by case analysis:

; un-eval : eval(NF) -> NF
(define (un-eval p)
  (if (probe? p)
      (probe-term p)
      (let ([var (gensym)])
        `(lambda (,var) ,(un-eval (p (probe var)))))))

That is, to un-eval a probe, we just extract its term using probe-term. To un-eval another kind of function, we first generate a fresh variable name, then we return a lambda expression with the fresh variable and a body constructed by probing the given function.

We can now see how un-eval corresponds to the NF grammar. A probe always carries a neutral term (NEU), and un-eval returns either a neutral term (case 1) or a lambda-abstraction whose body is a normal form (case 2).

> (un-eval (lambda (z) z))

'(lambda (x1) x1)

> (un-eval (lambda (x) (lambda (y) (x (lambda (z) ((z y) x))))))

'(lambda (x1) (lambda (x2) (x1 (lambda (x3) ((x3 x2) x1)))))

See this code for a slightly nicer implementation and more examples, including Church numerals and a factorial function defined via the Z, the call-by-value fixed-point combinator. For example, we have the following:
> (un-eval THREE)

'(lambda (x1) (lambda (x2) (x1 (x1 (x1 x2)))))

> (un-eval (fact THREE))

'(lambda (x1) (lambda (x2) (x1 (x1 (x1 (x1 (x1 (x1 x2))))))))

Note that Z, fact, and recursive functions in general do not have normal forms, but (fact THREE) is just a Church numeral, and it has the normal form above (the Church encoding of 6).

Sadly, un-eval cannot always find a normal form, even when one exists. For example, it diverges on the following term:
((lambda (z) (lambda (y) y))
 ((lambda (x) (x x)) (lambda (x) (x x))))
This term has the normal form (lambda (y) y), but the normal form is not reachable via βv steps.

So perhaps we can characterize un-eval with the following statement: if EXP has a β-normal form NF reachable by a sequence of βv steps, then (un-eval (eval (quote EXP))) produces (quote NF) (modulo α-renaming).

Beyond βv

But wait! Here’s an interesting term that takes a function f, applies it to itself, and then discards the result and just returns f:
(define interesting
  (lambda (f)
    ((lambda (y) f) (f f))))
or equivalently, with a little clarifying syntactic sugar:
(define interesting
  (lambda (f)
    (let ([y (f f)])
      f)))
This term has the β-normal form (lambda (f) f), but the normal form is not reachable through βv steps.

It’s worth emphasizing here that while interesting and (lambda (f) f) are equivalent in the λ-calculus (generated by β), they are not equivalent in the λᵥ-calculus (generated by βv). They must not be, because under CBV evaluation the two terms act differently! For example, (interesting (lambda (x) x)) evaluates to (lambda (x) x) and (interesting (lambda (x) (x x))) diverges.

But here’s a surprise: un-eval calculates the β-normal form for this term:
> (un-eval interesting)

'(lambda (x1) x1)

But how?

Our probing strategy has turned the inner (f f) into a value at the Racket level, even though it is not a value at the term level. In general, any (closed) neutral term is represented by a Racket value (a probe instance). So effectively we’ve added another notion of reduction to our evaluator:

((lambda (X) EXP₁) NEU₂)  EXP₁[NEU₂ / X] ; βneu

If our purpose were to faithfully follow the restrictions of βv reduction, this should be alarming. But this whole adventure started with the question of whether CBV-evaluation can simulate (or at least approximate) β-normalization, so let’s see how far we can push it.

Recall the definition of neutral term (NEU):
  NEU = X
  | (NEU NF)
Probes correspond to neutral terms: a base probe is created to represent a variable when we create a lambda abstraction; and when we extend a probe through application, we normalize the argument.

What do we lose by eagerly normalizing the argument?

Here’s a weird term that takes a function f, applies it to a value that doesn’t have a normal form, throws away that result, and then just returns f.
(define weird
  (lambda (f)
    ((lambda (z) f)
     (f (lambda (y) ((lambda (x) (x x)) (lambda (x) (x x))))))))
This term has the normal form (lambda (f) f), but it is not reachable through βv steps. And indeed, un-eval diverges — but only because it tries calculating the normal form of the value passed to f, even though that result is discarded.

This example suggests a modification to our probe representation: delay the normalization of arguments until the probe’s term is needed for the body of a lambda-abstraction. Here are the updated definitions:

; A Probe is (probe (promise-of Neutral))
(struct probe (term-p)
  #:property prop:procedure
  (lambda (self arg)
    (probe (delay `(,(probe-term self) ,(un-eval arg))))))
 
; probe-term : Probe -> Neutral
(define (probe-term p) (force (probe-term-p p)))
 
; un-eval+ : eval(NF) -> NF
(define (un-eval+ p)
  (if (probe? p)
      (probe-term p)
      (let ([var (gensym*)])
        `(lambda (,var) ,(un-eval (p (probe (delay var))))))))

The new un-eval+ agrees with the old version on the examples where the old one worked:
> (un-eval+ (lambda (z) z))

'(lambda (x1) x1)

> (un-eval+ (lambda (x) (lambda (y) (x (lambda (z) ((z y) x))))))

'(lambda (x1) (lambda (x2) (x1 (lambda (x3) ((x3 x2) x1)))))

> (un-eval+ THREE)

'(lambda (x1) (lambda (x2) (x1 (x1 (x1 x2)))))

> (un-eval+ (fact THREE))

'(lambda (x1) (lambda (x2) (x1 (x1 (x1 (x1 (x1 (x1 x2))))))))

And the new version of un-eval reports the “expected” normal form for weird:
> (un-eval+ weird)

'(lambda (x1) x1)

Now a probe instance corresponds to a term like a neutral term but with value arguments (NVAL):
  NVAL = X
  | (NVAL VAL)
and un-eval+ effectively implements a new notion of reduction:

((lambda (X) EXP₁) NVAL₂)  EXP₁[NVAL₂ / X] ; βnval

I conjecture (but haven’t proven) that un-eval+ satisfies the following property: If EXP has a β-normal form NF reachable by a sequence of βv and βnval steps, then (un-eval (eval (quote EXP))) produces (quote NF) (modulo α-renaming).