2 Specifying and Validating Syntax

6.0.1

2.1 Basic Syntax Validation

Let’s revisit the andlet1 macro from Binding Forms. The definition we gave then was

(define-syntax-rule (andlet1 var e1 e2)
  (let ([var e1])
    (if var e2 #f)))

The problem with this definition is that the macro does not validate its syntax: it does not check that var is an identifier. A misuse is therefore reported by let, in terms of the expanded code rather than the macro use.

> (andlet1 "not an id" #t #t)
eval:2:0: let: bad syntax (not an identifier)
at: "not an id"
in: (let (("not an id" #t)) (if "not an id" #t #f))

Writing macros that validate their syntax requires something more powerful than define-syntax-rule. There are a couple of different options, but for now we’ll skip straight to the most advanced one: syntax-parse.

The definition of andlet1 using syntax-parse looks like this:

(require (for-syntax racket/base syntax/parse))

(define-syntax andlet1
  (lambda (stx)
    (syntax-parse stx
      [(_ var e1 e2)
       #:declare var identifier
       #'(let ([var e1])
           (if var e2 #f))])))

This definition reveals a few new issues that I’ll mention briefly now; we’ll discuss them in more detail later in this guide.

We must import racket/base and syntax/parse using for-syntax. This allows us to use those libraries in compile-time code.
A macro is defined via define-syntax with a right-hand side that is a compile-time function—note the lambda in the definition above. The function implements the translation of terms representing uses of the macro to their expanded forms.
Unlike define-syntax-rule and syntax-rules macros, in syntax-parse a syntax template is explicitly marked with #'. The term #'term is just reader notation for (syntax term).
The pattern starts with _ instead of the macro name andlet1.

Aside from these differences, the essence of the macro (the pattern and template) is the same as in the define-syntax-rule version. But we’ve added an annotation to the var pattern variable via the #:declare keyword, which constrains var to match only terms accepted by the identifier syntax class. If the macro is used with a var argument that is not an identifier, the macro raises a syntax error, and syntax-parse uses the syntax class annotation to construct a good error message.

> (andlet1 "not an id" #t #t)
eval:5.0: andlet1: expected identifier
at: "not an id"
in: (andlet1 "not an id" #t #t)

The definition above can be written more compactly by taking advantage of the function-like form of define-syntax and the “colon” notation for syntax class annotations, which replaces the #:declare clause:

(define-syntax (andlet1 stx)
  (syntax-parse stx
    [(_ var:id e1 e2)
     #'(let ([var e1])
         (if var e2 #f))]))

Note that though the pattern contains var:id, the name of the pattern variable is just var, and that’s what we must use in the template.
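Validation does not change how well-formed uses behave. The following is a minimal sketch that repeats the compact definition and exercises it on valid input; the association list named table and the variable name entry are made up for illustration.

```racket
#lang racket/base
(require (for-syntax racket/base syntax/parse))

;; Same compact definition as above: var:id in the pattern, plain var in the template.
(define-syntax (andlet1 stx)
  (syntax-parse stx
    [(_ var:id e1 e2)
     #'(let ([var e1])
         (if var e2 #f))]))

;; Hypothetical data, used only for this example.
(define table '((a . 1) (b . 2)))

(andlet1 entry (assq 'b table) (cdr entry)) ; => 2
(andlet1 entry (assq 'z table) (cdr entry)) ; => #f, because assq found nothing
```

In both uses the first argument is an identifier, so the id annotation is satisfied and the macro expands exactly as the define-syntax-rule version would.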

Why don’t we annotate e1 and e2 to check that they are expressions? There is in fact a syntax class named expr, but it doesn’t actually check that a term is an expression. In general, it is impossible to check whether a term is a valid expression without doing macro expansion. We can’t invoke the macro expander until the static context of the term is known, and that context is determined by how the term is used in the macro template. We can’t tell a definition from an expression, for example. Rather, the expr syntax class merely checks that the term is not a keyword, such as #:declare. Keywords are not self-quoting in Racket, so they are not valid expressions, and we frequently want to distinguish keywords from expressions when parsing syntax.

In short, we can annotate e1 and e2 with the expr syntax class, but we should keep in mind the syntactic checking is very shallow; we do it primarily to signal our intent to use e1 and e2 as expressions.

(define-syntax (andlet1 stx)
  (syntax-parse stx
    [(_ var:id e1:expr e2:expr)
     #'(let ([var e1])
         (if var e2 #f))]))
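To see how shallow the expr check is, we can call syntax-parse as an ordinary run-time function on a few syntax objects. This is only a sketch; the helper name expr-ok? is invented for this example and is not part of syntax/parse.

```racket
#lang racket/base
(require syntax/parse)

;; expr-ok? : syntax -> boolean
;; Hypothetical helper: reports whether the expr syntax class accepts a term.
(define (expr-ok? stx)
  (syntax-parse stx
    [e:expr #t]
    [_ #f]))

(expr-ok? #'(+ 1 2))                      ; => #t
(expr-ok? #'(define x 1))                 ; => #t, although this is a definition
(expr-ok? (datum->syntax #f '#:declare))  ; => #f, keywords are rejected
```

The second result is the important one: expr accepts a term that would be a definition in most contexts, because it never runs the macro expander.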

Lesson: Use syntax-parse and syntax-class annotations on pattern variables to validate a macro’s arguments.

Exercise 19: Add syntax validation to iflet from Exercise 3 by rewriting it to use syntax-parse.

Exercise 20: Add syntax validation to my-let from Exercise 11 by rewriting it to use syntax-parse. Revisit each example misuse you discovered. Which of the misuses are now rejected due to syntax validation? Which are not?
The solution to this exercise is discussed in the next section, Context-Sensitive Validation.

2.2 Context-Sensitive Validation

In Exercise 20 you rewrote my-let using syntax-parse and added syntax class annotations to validate that the variable arguments are identifiers. Here’s the code:

(define-syntax (my-let stx)
  (syntax-parse stx
    [(_ ([var:id rhs:expr] ...) body:expr)
     #'((lambda (var ...) body) rhs ...)]))

You should also have tested your solution against the four kinds of misuses that the define-syntax-rule version didn’t catch. Let’s try them again, and see which ones are caught by the new version of my-let.

> (my-let ([1 2]) 'body) ; was `lambda: not an identifier, ...'
eval:9.0: my-let: expected identifier
  at: 1
  in: (my-let ((1 2)) (quote body))
> (my-let ([a 1] [a 2]) 'body) ; was `lambda: duplicate argument name'
eval:10:0: lambda: duplicate argument name
  at: a
  in: (lambda (a a) (quote body))
> (my-let ([#:a 1] [b 2]) 'body) ; was `arity mismatch'
eval:11.0: my-let: expected identifier
  at: #:a
  in: (my-let ((#:a 1) (b 2)) (quote body))
> (my-let ([[a 1] 2]) 'body) ; previously ran without error
eval:12.0: my-let: expected identifier
  at: (a 1)
  in: (my-let (((a 1) 2)) (quote body))

Three of the four misuses now signal an error in terms of my-let. Let’s look more closely at the one that doesn’t.

> (my-let ([a 1] [a 2]) 'body) ; was `lambda: duplicate argument name'
eval:13:0: lambda: duplicate argument name
at: a
in: (lambda (a a) (quote body))

Neither occurrence of a is wrong by itself; only the use of both of them in the same sequence of bindings is problematic. In other words, each binding variable is subject to a context-sensitive constraint—it must be distinct from any previous binding variable in the sequence. Syntax class annotations represent context-free constraints: here, that the term must be an identifier.

We can check context-sensitive constraints explicitly by inserting code between the pattern and the template.

(define-syntax (my-let stx)
  (syntax-parse stx
    [(_ ([var:id rhs:expr] ...) body:expr)
     (let loop ([vars (syntax->list #'(var ...))]
                [seens null])
       ; vars is list of variables to check
       ; seens is prefix of variables already seen
       (when (pair? vars)
         (when (for/or ([seen (in-list seens)])
                 (bound-identifier=? (car vars) seen))
           (raise-syntax-error #f
             "duplicate identifier"
             stx (car vars)))
         (loop (cdr vars) (cons (car vars) seens))))
     #'((lambda (var ...) body) rhs ...)]))

This code contains some new features. Again, I’ll mention them briefly here, and they will be explained in more detail later.

We use (syntax->list #'(var ...)) to get a list of the var identifiers.
We use bound-identifier=? to check whether the current var is equal to a previous var in the list.
When a duplicate is discovered, we call raise-syntax-error, which takes #f (nearly always; see the documentation for details), an error message, a “big term”, and a “little term”. The big term is the whole expression; its leading identifier will be used as the complaining party—thus, “my-let: duplicate identifier”. The little term is the precise location of the error—here, it is the duplicate variable.

With the new error-checking code, my-let catches the duplicate instead of passing it along to lambda to discover:

> (my-let ([a 1] [a 2]) 'body) ; was `lambda: duplicate argument name'
eval:15:0: my-let: duplicate identifier
at: a
in: (my-let ((a 1) (a 2)) (quote body))

I’ve written out the error-checking code above to give you some insight into how it is done, but there is a shorter way to write it. Racket has a function called check-duplicate-identifier that finds duplicate identifiers using bound-identifier=?. And syntax-parse offers a #:fail-when clause that replaces the call to raise-syntax-error. Here is the shorter version of the macro:

(define-syntax (my-let stx)
  (syntax-parse stx
    [(_ ([var:id rhs:expr] ...) body:expr)
     #:fail-when (check-duplicate-identifier
                  (syntax->list #'(var ...)))
                 "duplicate identifier"
     #'((lambda (var ...) body) rhs ...)]))

Recall that a syntax error contains a big term and a little term; syntax-parse knows the big term, and check-duplicate-identifier returns either #f or a duplicate identifier—the identifier is the little term.
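Both functions can also be tried outside of a macro, which makes their results easy to inspect. The sketch below applies them to identifiers written directly with #'; the names ids-with-dup and ids-distinct are made up for this example.

```racket
#lang racket/base

;; Identifiers with the same name and the same lexical context are bound-identifier=?.
(bound-identifier=? #'a #'a) ; => #t
(bound-identifier=? #'a #'b) ; => #f

;; Hypothetical test data: one list with a repeated name, one without.
(define ids-with-dup (list #'a #'b #'a))
(define ids-distinct (list #'a #'b #'c))

;; The result is a duplicate identifier (a syntax object) or #f.
(define dup (check-duplicate-identifier ids-with-dup))
(and dup (syntax-e dup))                   ; => 'a
(check-duplicate-identifier ids-distinct)  ; => #f
```

Because the result is either #f or the offending identifier, it fits the #:fail-when clause directly: a false value means the check passes, and a syntax object is reported as the location of the error.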

Lesson: When writing a binding form, use bound-identifier=? or check-duplicate-identifier to check for collisions between binders.

Exercise 21: Write a macro my-let*-distinct that behaves like let* except that it requires its variables to be distinct, like let. Hint: rename the previous definition of my-let* from Recursive Macros and reuse it as a helper macro.

2.3 Validating Syntax using syntax-case

Occasionally you will read macros that do not use syntax-parse. They may have been written before syntax-parse was added to Racket, or their authors may have wanted or needed to avoid a dependency on syntax-parse.

The other main macro-definition form in Racket—and still the most common, as of 2014—is syntax-case. It uses the same simple pattern language as syntax-rules while allowing arbitrary code to compute the expansion result.
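As a rough sketch of what validation can look like in this style, here are two syntax-case versions of andlet1. They are illustrations written for this section, not definitions taken from elsewhere in the guide, and the name andlet1/checked is made up. The first uses a fender, an extra expression between the pattern and the result that must produce a true value for the clause to match. The second checks the argument explicitly and calls raise-syntax-error, as we did for duplicate identifiers in my-let.

```racket
#lang racket/base
(require (for-syntax racket/base))

;; Version 1: a fender. If (identifier? #'var) is false, the clause does not
;; match; with no other clauses, syntax-case reports only a generic syntax error.
(define-syntax (andlet1 stx)
  (syntax-case stx ()
    [(_ var e1 e2)
     (identifier? #'var)
     #'(let ([var e1])
         (if var e2 #f))]))

;; Version 2 (hypothetical name): an explicit check with a specific message.
(define-syntax (andlet1/checked stx)
  (syntax-case stx ()
    [(_ var e1 e2)
     (begin
       (unless (identifier? #'var)
         (raise-syntax-error #f "expected identifier" stx #'var))
       #'(let ([var e1])
           (if var e2 #f)))]))

(andlet1 x 5 (+ x 1))          ; => 6
(andlet1/checked x #f (+ x 1)) ; => #f
```

The fender version is shorter, but a failed fender only says that the syntax is bad, without naming the offending term. The explicit check costs a few more lines and reports the term that was expected to be an identifier.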

FIXME: fenders (bad error message)

FIXME: error checking like above
