10 Unhygienic Macros

Recall the definition of a hygienic macro: definition-site binders do not capture use-site references, and use-site binders do not capture definition-site references. Hygienic macros can still implement binding forms (recall my-and-let, for example, from The Id (Identifier) Shape), but the bound names must be given as arguments.

Sometimes, though, it is useful for a macro to bind names that are visible to the macro use site without receiving the names as explicit arguments. Such macros are unhygienic; we also say that they “break hygiene”. Unhygienic macros are mainly divided into two groups; I’m going to call them clean unhygienic macros and unclean unhygienic macros, and you can’t stop me.

10.1 Clean Unhygienic Macros

A clean unhygienic macro defines names that are not given as Id arguments, but are based on one or more Id arguments.

A good example of a clean unhygienic macro is struct: it defines the predicate and accessor functions (as well as a few other names) based on the identifier given as the struct name and the identifiers naming the fields. A greatly simplified version of struct could be given the following shape:

;; (struct s:Id (f:Id ...)) : Body[{s,s?,s-f...}]
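To make the shape concrete, here is a minimal sketch of the names that a real struct definition introduces. The struct name point and its fields are chosen only for illustration; the point is that point? and the accessors never appear in the macro's arguments.

```racket
#lang racket/base

;; One struct form defines the constructor, the predicate, and one
;; accessor per field, all derived from the given identifiers.
(struct point (x y))

(define p (point 1 2))
(point? p)       ; #t
(point-x p)      ; 1
(point-y p)      ; 2
(point? 'other)  ; #f
```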

As an example, let’s design a macro my-hash-view, which puts a struct-like interface on symbol-keyed hashes. It has the following shape:

;; (my-hash-view v:Id (f:Id ...)) : Body[{v,v?,v-f...}]

It should have the following behavior:

(my-hash-view point (x y))
; defines point, point?, point-x, point-y
(point 1 2)
; expect (hash 'x 1 'y 2)
(point? (hash 'x 3 'y 4))
; expect #t
(point? (hash 'x 3 'y 4 'z 5))
; expect #t
(point? (hash 'x 6))
; expect #f
(point-x (hash 'x 7 'y 8))
; expect 7

Let’s consider what code we could use to implement the intended behavior.

(begin
  (define (point x y)
    (hash 'x x 'y y))
  (define (point? v)
    (and (hash? v) (hash-has-key? v 'x) (hash-has-key? v 'y)))
  (define (point-x v)
    (unless (point? v)
      (raise-argument-error 'point-x "point?" v))
    (hash-ref v 'x))
  (define (point-y v)
    (unless (point? v)
      (raise-argument-error 'point-y "point?" v))
    (hash-ref v 'y)))

We need to produce the identifiers point?, point-x, and point-y. This code also has the string literal "point?"; we could compute it at run time (as we did in Designing Your First Macro), but in this example let’s go ahead and compute it at compile time. The other part of the code that is a bit tricky to produce is the body of the constructor function: (hash 'x x 'y y). The hash arguments do not consist of a single repeated term, but rather each repetition consists of two terms. Fortunately, Racket’s syntax templates support multi-term repetition using the ~@ template form.
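Here is a small sketch of ~@ on its own, outside of any macro. It uses with-syntax to bind the pattern variable field by hand (standing in for what the macro's pattern would bind) and then inspects the resulting term as a datum.

```racket
#lang racket/base

;; Each repetition of the ellipsis contributes two terms, (quote field)
;; and field, which ~@ splices into the enclosing list.
(define constructor-body
  (with-syntax ([(field ...) #'(x y)])
    #'(hash (~@ (quote field) field) ...)))

(syntax->datum constructor-body)
; '(hash 'x x 'y y)
```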

Before we continue to the implementation of the macro, we can also use this hand-expansion to run our tests, to check that the expansion works before we automate its generation with the macro.

> (check-equal? (point 1 2) (hash 'x 1 'y 2))
> (check-pred point? (hash 'x 3 'y 4))
> (check-pred point? (hash 'x 3 'y 4 'z 5))
> (check-equal? (point? (hash 'x 6)) #f)
> (check-equal? (point-x (hash 'x 7 'y 8)) 7)
> (check-exn #rx"point-x: contract violation"
             (lambda () (point-x (hash 'z 9))))

The tests pass, so let’s move on to the macro.

Given the identifier representing the use-site name point, how do we compute an identifier point? that acts like it also came from the macro use site? Using ordinary Racket functions we can compute the symbol 'point? given the symbol 'point. The extra step the macro must perform is to transfer the lexical context from the original point identifier to the new identifier. The primitive mechanism for doing that is datum->syntax: its first argument is an existing syntax object to take the lexical context from, and the second argument is a datum to wrap as the new syntax object. So the following is the process for computing the point? identifier from the point identifier:

(define point-id #'point)
(define point-symbol (syntax->datum point-id))
(define point?-symbol (string->symbol (format "~a?" point-symbol)))
(define point?-id (datum->syntax point-id point?-symbol))

The format-id function automates this process. It takes the lexical context source object first, then a restricted format string (allowing only ~a placeholders), and then the format string’s arguments. Unlike format, format-id automatically unwraps identifiers in the format string arguments to their symbol contents.

(define point?-id (format-id point-id "~a?" point-id))
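As a quick sanity check, the following sketch builds the identifier both ways and compares the results. The names manual-id and formatted-id are made up for this illustration; format-id comes from the racket/syntax library.

```racket
#lang racket/base
(require racket/syntax) ; provides format-id

(define point-id #'point)

;; The manual process: compute the symbol, then transfer the
;; lexical context from point-id with datum->syntax.
(define manual-id
  (datum->syntax point-id
                 (string->symbol (format "~a?" (syntax->datum point-id)))))

;; The same computation using format-id.
(define formatted-id (format-id point-id "~a?" point-id))

(syntax-e formatted-id)                      ; 'point?
(bound-identifier=? manual-id formatted-id)  ; #t
```

The bound-identifier=? check confirms that the two identifiers would act as the same binder: they have the same symbol and the same lexical context.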

Additionally, format-id with the #:subs? #t option builds the identifier with a syntax property (a way of attaching extra information to a syntax object) indicating the positions of the original identifier components. This information lets, for example, DrRacket draw binding arrows to parts of identifiers.

(define point?-id (format-id point-id "~a?" point-id #:subs? #t))

Finally, instead of using quasisyntax and unsyntax (#` and #,) to insert the results of compile-time computation into syntax templates, we can use #:with or with-syntax to bind secondary syntax pattern variables to the computed terms.
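The two styles can be compared side by side in a small sketch. It uses with-syntax at run time only so that the results can be inspected; inside a syntax-parser clause, #:with plays the same role. The names with-unsyntax and with-binding are made up for this illustration.

```racket
#lang racket/base
(require racket/syntax) ; provides format-id

(define name?-id (format-id #'point "~a?" #'point))

;; Style 1: splice the computed identifier in with quasisyntax and unsyntax.
(define with-unsyntax
  #`(define (#,name?-id v) (hash? v)))

;; Style 2: bind a secondary pattern variable, then use a plain template.
(define with-binding
  (with-syntax ([name? name?-id])
    #'(define (name? v) (hash? v))))

(equal? (syntax->datum with-unsyntax)
        (syntax->datum with-binding))
; #t
```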

Here is the macro definition:

(define-syntax my-hash-view
  (syntax-parser
    [(_ name:id (field:id ...))
     #:with name? (format-id #'name "~a?" #'name #:subs? #t)
     #:with name?-string (format "~a?" (syntax->datum #'name)) ; implicit datum->syntax
     #:with (name-field ...) (for/list ([fieldname (in-list (datum (field ...)))])
                               (format-id #'name "~a-~a" #'name fieldname #:subs? #t))
     ; name? : Id, name?-string : Datum, (name-field ...) : (Id ...)
     #'(begin
         (define (name field ...)
           (hash (~@ (quote field) field) ...))
         (define (name? v)
           (and (hash? v) (hash-has-key? v (quote field)) ...))
         (define (name-field v)
           (unless (name? v)
             (raise-argument-error (quote name-field) (quote name?-string) v))
           (hash-ref v (quote field)))
         ...)]))

Let’s run the tests against the macro implementation:

; (my-hash-view point (x y))
> (check-equal? (point 1 2) (hash 'x 1 'y 2))
> (check-pred point? (hash 'x 3 'y 4))
> (check-pred point? (hash 'x 3 'y 4 'z 5))
> (check-equal? (point? (hash 'x 6)) #f)
> (check-equal? (point-x (hash 'x 7 'y 8)) 7)
> (check-exn #rx"point-x: contract violation"
             (lambda () (point-x (hash 'z 9))))

Exercise 24: The #:with name?-string binding in the definition above implicitly converts the string result of format into a syntax object. That’s okay, as long as we treat name?-string as a Datum. What happens if we treat it like an Expr instead? Find out by replacing (quote name?-string) with name?-string in the macro’s syntax template.

Exercise 25: Update the implementation of my-hash-view to allow field names to have different hash keys. That is, generalize the shape to the following:
;; (my-hash-view v:Id [fs:FieldSpec ...]) : Body[{v,v?,v-fs.fn...}]
;; where FieldSpec ::= fn:Id | [fn:Id #:key Datum]
Here is an example to illustrate the intended behavior:
(my-hash-view post (author [link #:key resource_href]))
(define post1 (hash 'author "Ryan" 'resource_href "/malr/unhygienic.html"))
(post-link post1) ; expect "/malr/unhygienic.html"
Hint: use the common meaning interface strategy.

Exercise 26: Update the implementation of my-hash-view so that the hash view name acts both as a constructor and as a match pattern name. That is, the hash view name should be statically bound to a compile-time struct implementing both the procedure interface and the match expander interface. You should define the actual constructor function with a different name and expand to it using make-variable-like-transformer. For the match expander, use the ? and app match pattern forms. That is, as a match pattern, point behaves as follows:
(point x-pat y-pat)
⇒
(? point? (app point-x x-pat) (app point-y y-pat))

Exercise 27 (★): Update your solution to Exercise 26 to also support hash view extension (or “subtyping”). That is, the value statically bound to hash-view name must support three interfaces: the procedure interface, the match expander interface, and a private interface that carries enough information to support view extension.
Here are some examples to illustrate the expected behavior:
(my-hash-view point (x y))
(my-hash-view point3 #:super point (z))
(define p3 (point3 1 2 3))
(point? p3) ; expect #t
(point3? p3) ; expect #t
(point-x p3) ; expect 1
(point3-z p3) ; expect 3
(match p3 [(point x y) (+ x y)]) ; expect 3
(match p3 [(point3 x y z) (+ x y z)]) ; expect 6

10.2 Unclean Unhygienic Macros

An unclean unhygienic macro defines names that are not based on any Id arguments.

The canonical example of an unclean unhygienic macro is a while loop that binds the name break to an escape continuation to exit the loop.

What lexical context should the macro use to create the break binder? The best candidate here is the lexical context of the whole macro use. In a syntax-parser form, this is available through the name this-syntax. (You might wonder whether this-syntax is bound unhygienically. It isn’t. In fact, we’ll talk about the mechanism it uses in Syntax Parameters.)

Here is the macro definition:

; (while Expr Expr{break} ...+) : Expr
(define-syntax while
  (syntax-parser
    [(_ condition:expr loop-body:expr ...+)
     #:with break (datum->syntax this-syntax 'break)
     #'(let/ec break
         (let loop ()
           (when condition
             loop-body ...
             (loop))))]))

With this macro, we can finally write FORTRAN in Racket:

> (define ns '(2 3 4 5 6))
> (define sum 0)
> (while (pair? ns)
    (when (integer? (sqrt (car ns))) (break))
    (set! sum (+ sum (car ns)))
    (set! ns (cdr ns)))
> sum
5

Now let’s write the macro forever, which uses while as a helper macro. That is:

(forever loop-body)
⇒
(while #t loop-body)

It should be trivial, right? Here’s a definition:

; (forever Expr{break} ...+) : Expr
(define-syntax forever
  (syntax-parser
    [(_ loop-body:expr ...+)
     #'(while #t loop-body ...)]))

But if we try to use break in the loop body, this happens:

> (define counter 0)
> (forever
    (set! counter (add1 counter))
    (unless (< counter 5) (break))
    (printf "counter = ~s\n" counter))
counter = 1
counter = 2
counter = 3
counter = 4
break: undefined;
cannot reference an identifier before its definition
  in module: top-level

In a module, this wouldn’t even compile, because break is unbound.

What went wrong? Here is one explanation: The forever example expands into a use of while, which expands into code that binds break with the lexical context of the while expression. But the lexical context of the while expression is from the definition site of forever, not the use site of forever in the example! Given that those are not necessarily the same, there’s no reason to expect the example to work.

On the other hand, it’s not clear what makes the two sites different, either. What is a “site”, anyway? The definition of forever and the example use of forever are both top level interactions (of this Scribble document’s evaluator, specifically); what makes them distinct?

We need to refine our definition of hygiene slightly. Each time a macro is invoked, it is considered to have a different “site”. More precisely, the meaning of references in the macro’s syntax template is determined by its definition site, but an extra marker is added that distinguishes binders introduced by different macro invocations. In the terminology of Racket’s hygiene model, this extra marker is called a macro-introduction scope.
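The effect of the macro-introduction scope can be seen in a small sketch. The macro with-hidden-tmp is hypothetical, made up for this illustration: its template binds tmp, but that binder carries the scope of the macro invocation, so it does not capture a tmp written at the use site.

```racket
#lang racket/base
(require (for-syntax racket/base syntax/parse))

;; (with-hidden-tmp Expr) : Expr
;; Hypothetical macro: binds tmp around the body expression.
(define-syntax with-hidden-tmp
  (syntax-parser
    [(_ body:expr)
     #'(let ([tmp 'from-macro])
         body)]))

(define tmp 'from-user)

;; The tmp below comes from the use site, so it lacks the scope that
;; marks the macro-introduced binder and refers to the definition above.
(define result (with-hidden-tmp tmp))
result ; 'from-user
```

This is the same mechanism that keeps the break binder from reaching the loop body in the forever example: the binder is marked as introduced by the forever invocation, and the loop body is not.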

We can “fix” the implementation of forever by adjusting the lexical context on the syntax object representing the use of the while macro (but not on any of its subterms) to be the same as the use of the forever macro. We do that by using syntax-e to unwrap just the outer layer of syntax, and then we use datum->syntax to rebuild it with the lexical context of this-syntax. Here is the implementation:

; (forever Expr{break} ...+) : Expr
(define-syntax forever
  (syntax-parser
    [(_ loop-body:expr ...+)
     (define code
       #'(while #t loop-body ...))
     (datum->syntax this-syntax (syntax-e code))]))

Now the example works:

> (define counter 0)
> (forever
    (set! counter (add1 counter))
    (unless (< counter 5) (break))
    (printf "counter = ~s\n" counter))
counter = 1
counter = 2
counter = 3
counter = 4

With this approach, break is visible to the loop body (well, assuming that the loop body terms have the same lexical context as the term representing the whole call to forever, which is not necessarily true), but it is not visible to the code introduced by the forever macro.

Here’s another approach that works if we want to use break in the macro as well as making it visible to the loop body:

; (do-while Expr Body{break} ...+) : Expr
(define-syntax do-while
  (syntax-parser
    [(_ condition:expr loop-body:expr ...+)
     #:with break/user (datum->syntax this-syntax 'break)
     #'(while #t
         (let ([break/user break])
           loop-body ...)
         (unless condition (break)))]))
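Here is a sketch of do-while in use, repeating the definitions of while and do-while from above so that it runs on its own. The variables i and j are made up for this illustration. The first loop exits through the macro's own use of break; the second exits through the break that is visible to the loop body.

```racket
#lang racket/base
(require (for-syntax racket/base syntax/parse))

; (while Expr Expr{break} ...+) : Expr
(define-syntax while
  (syntax-parser
    [(_ condition:expr loop-body:expr ...+)
     #:with break (datum->syntax this-syntax 'break)
     #'(let/ec break
         (let loop ()
           (when condition
             loop-body ...
             (loop))))]))

; (do-while Expr Body{break} ...+) : Expr
(define-syntax do-while
  (syntax-parser
    [(_ condition:expr loop-body:expr ...+)
     #:with break/user (datum->syntax this-syntax 'break)
     #'(while #t
         (let ([break/user break])
           loop-body ...)
         (unless condition (break)))]))

;; Exit via the condition check, which calls the macro's own break.
(define i 0)
(do-while (< i 3)
  (set! i (add1 i)))
i ; 3

;; Exit via break as seen from the loop body.
(define j 0)
(do-while #t
  (set! j (add1 j))
  (when (= j 5) (break)))
j ; 5
```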

Lesson: Unhygienic macros are difficult to use as helper macros — that is, as the targets of expansion.

10.3 Optionally-Hygienic Macros

Consider Racket’s require form. For example,

(require racket/list)

locally binds the names first, second, and so on, even though those names are not given as binder Id arguments to require. In fact, require is acting as an unclean unhygienic binding form here: its argument, racket/list, is an identifier, but require’s argument shape is RequireSpec, which has a ModulePath variant, which has an identifier variant. We could also consider (require (lib "racket/list.rkt")), which means the same thing.

On the other hand, in the following,

(require (only-in racket/list first [last final]))

the first identifier is used for the binding of the first import, and the final identifier is used for the binding of the import that racket/list exports as last. So this particular usage of require is hygienic!
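A short sketch shows the hygienic usage in action. The list literal and the names one and three are made up for this illustration; only the two names written in the require form become bound.

```racket
#lang racket/base
(require (only-in racket/list first [last final]))

;; first is bound to racket/list's first; final is bound to what
;; racket/list exports as last. No other racket/list names are imported.
(define one (first '(1 2 3)))
(define three (final '(1 2 3)))
one   ; 1
three ; 3
```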

One way to mitigate the difficulty that unhygienic macros cause is to give them hygienic options. For example, we could extend while with an optional clause for specifying the name to bind to the escape continuation. If the clause is present, the macro binds the given name, and it is hygienic; if the clause is absent, it generates the name unhygienically. Here is the optional clause shape:

;; MaybeBreakClause ::= ε | #:break Id

Instead of defining a (splicing) syntax class for it, though, let’s just handle it inline within the macro’s syntax pattern using the ~optional pattern form. If an ~optional pattern is absent, then all of its pattern variables are bound to the value #f (note: not the syntax object representing the term #f). Normally, only syntax-valued attributes can be used within syntax templates, but the template form ~? can dynamically “catch” false-valued attributes in its first sub-template and fall back to its second sub-template. We can define the macro as follows:

; (while Expr MaybeBreakClause Expr{break} ...+) : Expr
(define-syntax while
  (syntax-parser
    [(_ condition:expr (~optional (~seq #:break break-name:id))
        loop-body:expr ...+)
     #:with default-break (datum->syntax this-syntax 'break)
     #'(let/ec (~? break-name default-break)
         (let loop ()
           (when condition
             loop-body ...
             (loop))))]))

Here is an example:

> (define n 2022)
> (while #t #:break stop
    (cond [(= n 1) (printf "\n") (stop)]
          [(even? n) (printf "⌄") (set! n (quotient n 2))]
          [(odd? n) (printf "⌃") (set! n (add1 (* n 3)))]))
⌄⌃⌄⌃⌄⌄⌄⌃⌄⌄⌃⌄⌃⌄⌄⌃⌄⌄⌃⌄⌄⌃⌄⌃⌄⌃⌄⌃⌄⌄⌄⌄⌄⌄⌃⌄⌃⌄⌄⌃⌄⌄⌃⌄⌄⌄⌄⌃⌄⌃⌄⌃⌄⌄⌃⌄⌄⌄⌃⌄⌄⌄⌄
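For comparison, here is a sketch of the same macro used without the #:break clause, in which case it falls back to binding break unhygienically. The macro definition is repeated from above so that the sketch runs on its own; the variables k and total are made up for this illustration.

```racket
#lang racket/base
(require (for-syntax racket/base syntax/parse))

; (while Expr MaybeBreakClause Expr{break} ...+) : Expr
(define-syntax while
  (syntax-parser
    [(_ condition:expr (~optional (~seq #:break break-name:id))
        loop-body:expr ...+)
     #:with default-break (datum->syntax this-syntax 'break)
     #'(let/ec (~? break-name default-break)
         (let loop ()
           (when condition
             loop-body ...
             (loop))))]))

;; No #:break clause: break-name is #f, so ~? uses default-break.
(define k 0)
(define total 0)
(while #t
  (set! k (add1 k))
  (when (> k 4) (break))
  (set! total (+ total k)))
total ; 10, that is, 1 + 2 + 3 + 4
```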

Here is the equivalent definition with a separate syntax class:

(begin-for-syntax
  (define-splicing-syntax-class maybe-break-clause
    #:attributes (break-name) ; (U #f Syntax[Id])
    (pattern (~seq #:break break-name:id))
    (pattern (~seq) #:attr break-name #f)))

; (while Expr MaybeBreakClause Expr{break} ...+) : Expr
(define-syntax while
  (syntax-parser
    [(_ condition:expr bc:maybe-break-clause
        loop-body:expr ...+)
     #:with default-break (datum->syntax this-syntax 'break)
     #'(let/ec (~? bc.break-name default-break)
         (let loop ()
           (when condition
             loop-body ...
             (loop))))]))

10.4 Syntax Parameters

Another alternative to unclean unhygienic macros is to define a single name that takes on different meanings in different contexts. This is analogous to run-time parameter values, so the feature is called a syntax parameter.
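As a rough sketch of the idea, assuming the racket/stxparam library, break can be defined once as a syntax parameter, and while can give it a meaning within the loop body. The name escape and the error message are made up for this illustration, and this is only one possible way to arrange it.

```racket
#lang racket/base
(require (for-syntax racket/base syntax/parse)
         racket/stxparam)

;; Outside of any loop, using break is a syntax error.
(define-syntax-parameter break
  (lambda (stx)
    (raise-syntax-error #f "used outside of a while loop" stx)))

; (while Expr Expr ...+) : Expr
(define-syntax while
  (syntax-parser
    [(_ condition:expr loop-body:expr ...+)
     ;; escape is an ordinary hygienic binder; break is redirected to it
     ;; for the extent of the loop.
     #'(let/ec escape
         (syntax-parameterize ([break (make-rename-transformer #'escape)])
           (let loop ()
             (when condition
               loop-body ...
               (loop)))))]))

; (forever Expr ...+) : Expr
(define-syntax forever
  (syntax-parser
    [(_ loop-body:expr ...+)
     #'(while #t loop-body ...)]))

(define counter 0)
(forever
  (set! counter (add1 counter))
  (unless (< counter 5) (break)))
counter ; 5
```

With this arrangement, the trivial definition of forever works without any adjustment of lexical context, because break is a single name whose meaning is determined by where it is used, not a name created afresh by each macro.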
