On this page:
8.1 The Datum Shape
8.2 Datum as a Recursive Shape
8.3 Quasiquotation
2-0523.15

8 Recursive Shapes

8.1 The Datum Shape

The Datum shape contains all number terms, identifier terms, and other atomic terms, as well as all list, vector, hash, box, and prefab struct terms containing Datum elements. That is, Datum contains any term the corresponds to a readable value.

The Datum shape represents the intention to use the term as a literal within a quote expression, or to convert it to a compile-time value using syntax->datum.

There is no syntax class corresponding to Datum.

Let’s design the macro my-case1, which is like my-evcase1 from Exercise 9 and Same Structure, Different Interpretation except that each clause’s comparison value is given as a datum rather than an expression. That is, the macro’s shape is:
;; (my-case1 Expr [Datum Expr] ...) : Expr
Here is an example:
(my-case1 (begin (printf "got a coin!\n") (* 5 5))
  [5 "nickel"] [10 "dime"] [25 "quarter"])
; expect print once, "quarter"
Here is an implementation:
; (my-case1 Expr [Datum Expr] ...) : Expr
(define-syntax my-case1
  (syntax-parser
    [(_ to-match:expr [datum result:expr] ...)
     #'(let ([tmp to-match])
         (cond [(equal? tmp (quote datum)) result] ...))]))
I often spell out quote in a syntax template when it is applied to a term containing pattern variables, to remind myself that the quoted “constant” can vary based on the macro’s arguments.

Here is another implementation:
; (my-case1 Expr [Datum Expr] ...) : Expr
(define-syntax my-case1
  (syntax-parser
    [(_ to-match:expr [datum result:expr] ...)
     #'(let ([tmp to-match])
         (cond [(equal? tmp datum) result] ...))]))
This implementation is wrong, because the Datum arguments are not used within a quote expression. Never implicitly treat a Datum as an Expr! One obvious problem is that not every datum is self-quoting. The following example should return "matched", but it raises an error instead:
> (my-case1 (list 123)
    [(123) "matched"])

application: not a procedure;

 expected a procedure that can be applied to arguments

  given: 123

Even a datum that is normally self-quoting can carry a lexical context with an alternative #%datum binding that gives it some other behavior. For example:
> (let-syntax ([#%datum (lambda (stx) (raise-syntax-error #f "no self-quoting!" stx))])
    (my-case1 '2 [1 'one] [2 'two] [3 'lots]))

eval:7:0: #%datum: no self-quoting!

  in: (#%datum . 1)

This particular example is admittedly uncommon. A more common problem is that the datum is computed by a macro, and depending on how it is coerced to a syntax object it may or may not get a lexical context with Racket’s #%datum binding. Avoid all of these problems by treating Datum and Expr as distinct shapes. If you have a datum and you want an expression that evaluates to the same value at run time, put the datum in a quote expression.

Exercise 17: Generalize my-case1 to my-case, which has a list of datums in each clause. That is, the macro has the following shape:
;; (my-case Expr [(Datum ...) Expr] ...) : Expr

8.2 Datum as a Recursive Shape

Could we write a definition of Datum rather than treating it as a basic (that is, primitive) shape? The full definition would be quite complicated, since Racket has many kinds of readable values, and it occasionally adds new ones. Let’s simplify the question to datum terms built from identifiers, numbers, booleans, strings, and proper lists; let’s call this shape SimpleDatum. We can define it as a recursive shape as follows:
;; SimpleDatum ::= (SimpleDatum ...) | SimpleAtom
;; SimpleAtom  ::= Identifier | Number | Boolean | String
I have collected the base cases into a separate shape, SimpleAtom, for convenience.

Like the corresponding shapes, the simple-datum syntax class is recursive; the simple-atom syntax class is not. Let’s discuss simple-atom first.

The simple-atom syntax class presents a challenge: There is a built-in syntax class for identifiers (id), but how do we check whether a term contains a number, a boolean, or a string? Given a syntax object, we can extract the corresponding plain Racket value by calling syntax->datum. Then we can use the ordinary number?, boolean?, and string? predicates. An identifier is just a syntax object containing a symbol, so we can cover the identifier case by adding a symbol? predicate check to the others. Finally, we perform this check using a ~fail pattern; if the check fails, then the syntax class does not accept the term. Here is the definition:
(begin-for-syntax
  ; simple-atom? : Any -> Boolean
  (define (simple-atom? v)
    (or (symbol? v) (number? v) (boolean? v) (string? v)))
  (define-syntax-class simple-atom
    #:attributes ()
    ; (pattern a #:when (simple-atom? (syntax->datum #'a)))
    (pattern a #:and (~fail #:unless (simple-atom? (syntax->datum #'a))))))
In most cases, it is better to use #:when to perform checks like this, but this is one of the few cases where we don’t want a “post-traversal” check that dominates other matching failures. The difference between the two only affects the way errors are reported.

The simple-datum syntax class is straightforward. The recursive case in the shape simply translates to a syntax pattern with recursive syntax class annotations:
(begin-for-syntax
  (define-syntax-class simple-datum
    #:attributes ()
    (pattern (elem:simple-datum ...))
    (pattern atom:simple-atom)))

There are no attributes, because the only interpretation that SimpleDatum supports can be achieved with quote or syntax->datum on the term itself.

8.3 Quasiquotation

Let’s define my-quasiquote, a simple version of Racket’s quasiquote macro. Its argument has a shape like SimpleDatum, except that it can have “escapes” to Racket expressions so we can compute values to insert into the result. The shape of the macro is the following:
;; (my-quasiquote QuasiDatum) : Expr
where QuasiDatum is defined as follows:
;; QuasiDatum ::= (escape Expr)
;;              | (QuasiDatum ...)
;;              | SimpleAtom
What does escape mean in this shape definition? That is, how do we recognize the escape form of QuasiDatum. There are two possibilities: we could recognize escape either as a symbolic literal or as a reference literal. For this example, we’ll recognize escape by reference; we’ll show an example of symbolic literals later in FIXME-REF.

Recognizing escape as a reference means that there must be a binding of escape for the reference to refer to. Since we don’t intend escape to have any meaning as a Racket expression, we should define it as a macro that always raises a syntax error:
(define-syntax escape
  (lambda (stx) (raise-syntax-error #f "illegal use of escape" stx)))

The error only occurs if the macro is expanded by the Racket macro expander; our macros and syntax classes can still recognize references to it without triggering the error. Note that the module that provides my-quasiquote must also provide escape so that users of my-quasiquote can refer to this escape binding.

Now we can implement the quasi-datum syntax class. We declare escape as a (reference) literal using #:literals; then occurrences of escape in the syntax patterns are treated as literals instead of as pattern variables. Here is the definition, without attributes:
(begin-for-syntax
  (define-syntax-class quasi-datum
    #:literals (escape)
    (pattern (escape code:expr))
    (pattern (elem:quasi-datum ...))
    (pattern a:simple-atom)))

What interface should we give to the quasi-datum syntax class? Recall the interface strategies from Defining Enumerated Shapes. Most of them are applicable here; with the possible exception of the common meaning approach. Let’s use the macro behavior strategy. The my-quasiquote interprets QuasiDatum as instructions to construct a value from a mixture of constants and values computed by escaped Racket expressions. We can represent that with a syntax attribute, code, containing an expression that produces the QuasiDatum’s value. Here is the definition:
(begin-for-syntax
  (define-syntax-class quasi-datum
    #:attributes (code) ; Expr
    #:literals (escape)
    (pattern (escape code:expr))
    (pattern (elem:quasi-datum ...)
             #:with code #'(list elem.code ...))
    (pattern a:simple-atom
             #:with code #'(quote a))))
That is, the escape form contains the necessary expression directly; the list form constructs a list from the values constructed by its components; and an atom is interpreted as a value by quoting it.

The macro simply expands into its argument’s code expression:
(define-syntax my-quasiquote
  (syntax-parser
    [(_ qd:quasi-datum)
     #'qd.code]))

Here are some examples:
> (my-quasiquote (1 2 () abc xyz))

'(1 2 () abc xyz)

> (my-quasiquote (1 2 (escape (+ 1 2))))

'(1 2 3)

> (my-quasiquote ((expression (+ 1 2)) (value (escape (+ 1 2)))))

'((expression (+ 1 2)) (value 3))

> (my-quasiquote (a (b (c (d (e (f (escape (string->symbol "g")))))))))

'(a (b (c (d (e (f g))))))

Because escape is recognized by reference, it can be made unavailable by shadowing, or it can be aliased to another name:
> (let ([escape 'piňa-colada])
    (my-quasiquote (1 2 (escape (+ 1 2)))))

'(1 2 (escape (+ 1 2)))

> (let-syntax ([houdini (make-rename-transformer #'escape)])
    (my-quasiquote ((houdini 'jacket) (houdini 'water-tank))))

'(jacket water-tank)

The behavior of this example is questionable, though:
> (my-quasiquote (1 2 (escape 3 4)))

'(1 2 (escape 3 4))

Do we want (escape 3 4) to be interpreted as a three-element list, or do we want to consider it an ill-formed escape form and report an error? I prefer to consider it an error. That is, whenever the quasi-datum parser gets a term that starts (escape ...), it should commit to parsing it according to the escape pattern. This problem is similar to the problem with => clauses in Designing Enumerated Syntax. The solution is the same too: We must put the escape pattern before any other pattern it might overlap with, and we must add a cut (~!) immediately after the escape literal. Here is the updated code:
(begin-for-syntax
  (define-syntax-class quasi-datum
    #:attributes (code) ; Expr
    #:literals (escape)
    (pattern (escape ~! code:expr))
    (pattern (elem:quasi-datum ...)
             #:with code #'(list elem.code ...))
    (pattern a:simple-atom
             #:with code #'(quote a))))

Now the example signals an error instead:
> (my-quasiquote (1 2 (escape 3 4)))

eval:25:0: my-quasiquote: unexpected term

  at: 4

  in: (my-quasiquote (1 2 (escape 3 4)))

  parsing context:

   while parsing quasi-datum

    term: (escape 3 4)

    location: eval:25:0

   while parsing quasi-datum

    term: (1 2 (escape 3 4))

    location: eval:25:0

There’s one remaining issue with this implementation. Consider the following example:
> (my-quasiquote (1 2 3 4 5))

'(1 2 3 4 5)

This example has no escapes, so its result could be implemented with a simple quote expression. But this is how the macro expands:
(my-quasiquote (1 2 3 4 5))
(list (quote 1) (quote 2) (quote 3) (quote 4) (quote 5))

Let’s optimize my-quasiquote so that it uses quote at the highest levels possible. Here’s one strategy: we add an boolean-valued const? attribute that is true when a term has no escapes. A list QuasiDatum is constant if all of its elements are constant; in that case, its code can be simply the quotation of the elements. Here is the updated syntax class:
(begin-for-syntax
  (define-syntax-class quasi-datum
    #:attributes (const? ; Boolean
                  code)  ; Expr
    #:literals (escape)
    (pattern (escape code:expr)
             #:attr const? #f)
    (pattern (elem:quasi-datum ...)
             #:attr const? (andmap values (datum (elem.const? ...)))
             #:with code (if (datum const?)
                             #'(quote (elem ...))
                             #'(list elem.code ...)))
    (pattern a:simple-atom
             #:attr const? #t
             #:with code #'(quote a))))

Exercise 18: Implement quasi-datum and my-quasiquote (the unoptimized version) according to the empty interface strategy and a recursive my-quasiquote macro. Is it possible to implement the quote optimization using this approach?

Exercise 19: Extend the definition of QuasiDatum as follows:
;; QuasiDatum ::= ... | (cellophane QuasiDatum)
A cellophane wrapper is simply discarded from the constructed value; that is, the QuasiDatum (cellophane qd) is equivalent to qd. For example:
(my-quasiquote (1 2 (cellophane 3) (escape (* 2 2))))
; expect '(1 2 3 4)
(my-quasiquote (1 (cellophane 2) (cellophane (cellophane 3))))
; expect '(1 2 3)

Start with the unoptimized version of the quasi-datum syntax class using the macro behavior strategy. After you have updated (and tested) that version, implement a similar optimization for the updated QuasiDatum shape. For example, the second example above should expand directly to (quote (1 2 3)). (Hint: What assumption made by the original optimization does the updated shape violate?)