### 2 Probability Distributions

This section describes the distribution types and operations supported by gamble.

#### 2.1 Distribution Operations

 procedure(dist? v) → boolean? v : any/c
Returns #t if v is a distribution object, #f otherwise.

 procedure(integer-dist? v) → boolean? v : any/c
Returns #t if v is a distribution object whose support is intrinsically integer-valued, #f otherwise.

The distribution types for which integer-dist? returns true consist of exactly the ones listed in Integer Distribution Types. A discrete distribution whose values happen to be integers is not considered an integer distribution.

 procedure(finite-dist? v) → boolean? v : any/c
Returns #t if v is a distribution object whose support is finite, #f otherwise.

The distribution types for which finite-dist? returns true consist of exactly the ones listed in Integer Distribution Types and Discrete Distribution Type.

 procedure(real-dist? v) → boolean? v : any/c
Returns #t if v is a distribution object whose support is a real interval, #f otherwise.

The distribution types for which real-dist? returns true consist of exactly the ones listed in Real Distribution Types. A discrete distribution whose values happen to be real numbers is not considered a real distribution. A distribution whose values consist of real vectors, such as a Dirichlet distribution, is not considered real-valued.

 procedure(dist-pdf d v [log?]) → real? d : dist? v : any/c log? : any/c = #f
Returns the probability density (or mass, as appropriate) of the value v in distribution d. If log? is true, the log density (or log mass) is returned instead.

 procedure(dist-cdf d v [log? 1-p?]) → real? d : dist? v : any/c log? : any/c = #f 1-p? : any/c = #f
Returns the cumulative probability density (or mass, as appropriate) of the value v in distribution d; that is, the probability that a random variable X distributed as d satisfies (<= X v). If 1-p? is true, then the probability of (> X v) is returned instead.

If log? is true, then the log probability is returned instead of the probability.

This function applies only to integer-valued (integer-dist?) and real-valued (real-dist?) distributions.

 procedure(dist-inv-cdf d p [log? 1-p?]) → any/c d : dist? p : (real-in 0 1) log? : any/c = #f 1-p? : any/c = #f
Returns the inverse of the CDF of d at p. If log? is true, then the inverse at (exp p) is used instead. If 1-p? is true, then the inverse at (- 1 p) is used instead.
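
The way the log? and 1-p? flags interact can be sketched in plain Racket for a uniform distribution on [lo, hi]. This is only an illustration: uniform-cdf and uniform-inv-cdf are hypothetical helpers rather than gamble functions, and the order in which the inverse undoes the two flags (log? first, then 1-p?) is an assumption.

```racket
#lang racket/base
;; Sketch only: hypothetical stand-ins for the flag handling of
;; dist-cdf and dist-inv-cdf, written for a uniform distribution on [lo, hi].

;; CDF: P(X <= x), or P(X > x) when 1-p? is true; log of either when log? is true.
(define (uniform-cdf lo hi x [log? #f] [1-p? #f])
  (define p (max 0.0 (min 1.0 (/ (- x lo) (- hi lo)))))
  (define q (if 1-p? (- 1.0 p) p))
  (if log? (log q) q))

;; Inverse CDF: decode p according to the flags, then invert.
;; Assumption: log? is undone first, then 1-p?.
(define (uniform-inv-cdf lo hi p [log? #f] [1-p? #f])
  (define p1 (if log? (exp p) p))
  (define p2 (if 1-p? (- 1.0 p1) p1))
  (+ lo (* p2 (- hi lo))))
```

Under these assumptions, applying uniform-inv-cdf to the result of uniform-cdf with the same flags recovers the original point whenever it lies inside [lo, hi].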

 procedure(dist-sample d) → any/c d : dist?
Produces a sample distributed according to d. This is an unmanaged stochastic effect, like calling Racket’s random function.

Do not use dist-sample within a sampler/solver; use sample instead.

 procedure(in-dist d) → sequence? d : finite-dist?
Returns a sequence where each element consists of two values: a value from the support of the distribution and its probability mass. The weights are normalized to 1, even if d is unnormalized.

Example:

 > (for ([(v p) (in-dist (bernoulli-dist 1/3))]) (printf "Result ~s has probability ~s.\n" v p))
 Result 0 has probability 2/3.
 Result 1 has probability 1/3.

 procedure(in-measure d) → sequence? d : finite-dist?
Like in-dist, but without normalizing the weights.
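
The difference between the two sequences comes down to dividing each weight by the total. A minimal sketch in plain Racket, assuming a measure is represented as an association list of (value . weight) pairs; normalize-weights is a hypothetical helper, not part of gamble:

```racket
#lang racket/base
;; Sketch only: a finite measure as an association list of (value . weight).
;; normalize-weights is a hypothetical helper, not part of gamble.
(define (normalize-weights measure)
  (define total (for/sum ([entry (in-list measure)]) (cdr entry)))
  (for/list ([entry (in-list measure)])
    (cons (car entry) (/ (cdr entry) total))))

;; Weights 1 and 3 become probabilities 1/4 and 3/4.
(define unnormalized '((apple . 1) (orange . 3)))
(define normalized (normalize-weights unnormalized))
```

The sketch does not handle a measure whose total weight is zero, which has no normalized form.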

 procedure(dist-mean d) → (or/c any/c +nan.0 #f) d : dist?
 procedure(dist-median d) → (or/c any/c +nan.0 #f) d : dist?
 procedure(dist-variance d) → (or/c any/c +nan.0 #f) d : dist?
Returns the mean, median, or variance of the distribution d, respectively.

If the distribution is integer-valued or real-valued, the statistic is a real number. Other kinds of distribution may have other types for these statistics.

A return value of +nan.0 indicates that the statistic is known to be undefined.

A return value of #f indicates that the statistic is unknown; it may not be defined, it may be infinite, or the calculation might not be implemented.

 procedure(dist-modes d) → (or/c list? #f) d : dist?
Returns the modes of the distribution d.

A return value of '() indicates that the distribution has no mode. A return value of #f indicates that the statistic is unknown; it may not be defined, it may be infinite, or the calculation might not be implemented.

 procedure(dist-energy d x) → real? d : dist? x : any/c
Returns the value of the “energy” function of d evaluated at x. Minimizing energy is equivalent to maximizing likelihood.

Equivalent to (- (log (dist-pdf d x))).

Examples:

> (define N (normal-dist 0 1))
> (dist-energy N 5)

13.418938533204672

> (dist-energy N 1)

1.4189385332046727

> (dist-energy N 0)

0.9189385332046728

> (dist-energy N -1)

1.4189385332046727
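
For a normal distribution with mean μ and standard deviation σ, applying the definition to the standard normal density gives a closed form for the energy:

```latex
E(x) = -\log p(x) = \log\sigma + \tfrac{1}{2}\log(2\pi) + \frac{(x-\mu)^2}{2\sigma^2}
```

With μ = 0 and σ = 1 this reduces to ½ log(2π) + x²/2, roughly 0.9189 + x²/2, which agrees with the values shown above at x = 0, ±1, and 5.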

 procedure(dist-Denergy d x [dx/dt dparam/dt ...]) → real? d : dist? x : real? dx/dt : real? = 1 dparam/dt : real? = 0
Returns the value at x of the derivative of d’s energy function.

Examples:

> (define N (normal-dist 0 1))
> (dist-Denergy N 5)

5.0

> (dist-Denergy N 1)

1.0

> (dist-Denergy N 0)

-0.0

> (dist-Denergy N -1)

-1.0

The parameters of d and the position x are considered functions of a hypothetical variable t, and the derivative is taken with respect to t. Thus by varying dx/dt and the dparam/dts, mixtures of the partial derivatives of energy with respect to the distribution’s parameters can be recovered.

Examples:

> (define N (normal-dist 0 1))
> (dist-Denergy N 5 1 0 0) ; ∂energy/∂x

5.0

> (dist-Denergy N 5 0 1 0) ; ∂energy/∂μ; μ is 1st param of normal-dist

-5.0

> (dist-Denergy N 5 0 0 1) ; ∂energy/∂σ; σ is 2nd param of normal-dist

-24.0
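
The three results above follow from the chain rule applied to the normal energy, log σ + (x − μ)²/(2σ²) plus a constant. The following plain-Racket sketch writes that derivative out by hand; normal-Denergy is a hypothetical helper used only for illustration and is not how gamble computes it.

```racket
#lang racket/base
;; Sketch only: the total derivative of a normal distribution's energy with
;; respect to a hypothetical variable t. Not gamble's implementation.
(define (normal-Denergy mu sigma x [dx/dt 1] [dmu/dt 0] [dsigma/dt 0])
  (define z (- x mu))
  (define dE/dx (/ z (* sigma sigma)))
  (define dE/dmu (- dE/dx))
  (define dE/dsigma (- (/ 1 sigma) (/ (* z z) (* sigma sigma sigma))))
  ;; Chain rule: weight each partial derivative by the rate of its argument.
  (+ (* dE/dx dx/dt) (* dE/dmu dmu/dt) (* dE/dsigma dsigma/dt)))
```

With μ = 0, σ = 1, and x = 5, the partial derivatives are 5, −5, and 1 − 25 = −24, so selecting one rate at a time reproduces the outputs listed above.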

If the derivative is not defined (such as for non-continuous distributions) or not implemented for distribution d, an exception is raised.

 procedure(dist-update-prior prior dist-pattern data) → (or/c dist? #f) prior : dist? dist-pattern : any/c data : vector?
Returns a distribution representing a closed-form solution to Bayes’ Law applied to a distribution matching dist-pattern whose parameter is distributed according to prior and incorporating data as evidence. The result is a distribution of the same type as prior. If prior is not a (known, implemented) conjugate prior for dist-pattern, #f is returned.

The dist-pattern is an S-expression consisting of a distribution type name (a symbol) and the distribution parameters, where the parameter distributed according to prior is indicated by the symbol '_.

For example, if the prior of a Bernoulli distribution’s success probability is (beta-dist 1 1), and three successes are observed, the posterior can be calculated with:

 > (dist-update-prior (beta-dist 1 1) '(bernoulli-dist _) (vector 1 1 1))

 (beta-dist 4.0 1.0)

If one success and two failures were observed, the posterior is

 > (dist-update-prior (beta-dist 1 1) '(bernoulli-dist _) (vector 1 0 0))

 (beta-dist 2.0 1.0)

Here is another example where the mean of a normal distribution is also normally distributed, but the standard deviation is fixed.

 > (dist-update-prior (normal-dist 10 1) '(normal-dist _ 1) (vector 9))

 (normal-dist 9.5 0.7071067811865476)

 > (dist-update-prior (normal-dist 10 1) '(normal-dist _ 0.5) (vector 9))

 (normal-dist 9.2 0.4472135954999579)
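
The normal examples can be checked against the textbook conjugate update, in which precisions (reciprocal variances) add and the posterior mean is the precision-weighted average of the prior mean and the data. The sketch below is plain Racket; normal-mean-posterior is a hypothetical helper and is not claimed to be how gamble computes the result.

```racket
#lang racket/base
;; Sketch only: conjugate update for a normal likelihood with known standard
;; deviation and a normal prior on its mean. Not gamble's implementation.
(define (normal-mean-posterior prior-mean prior-stddev like-stddev data)
  (define prior-precision (/ 1.0 (* prior-stddev prior-stddev)))
  (define like-precision (/ 1.0 (* like-stddev like-stddev)))
  (define n (vector-length data))
  (define total (for/sum ([x (in-vector data)]) x))
  (define post-precision (+ prior-precision (* n like-precision)))
  (define post-mean
    (/ (+ (* prior-precision prior-mean) (* like-precision total))
       post-precision))
  ;; Return the posterior mean and standard deviation as a two-element list.
  (list post-mean (sqrt (/ 1.0 post-precision))))
```

Calling it with prior (10, 1), likelihood standard deviation 1, and data (vector 9) gives a mean of 9.5 and a standard deviation of about 0.7071; with likelihood standard deviation 0.5 it gives about 9.2 and 0.4472.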

#### 2.2 Integer Distribution Types

 struct(struct bernoulli-dist (p)) p : (real-in 0 1)
Represents a Bernoulli distribution with success probability p.

 struct(struct binomial-dist (n p)) n : exact-positive-integer? p : (real-in 0 1)
Represents a binomial distribution of n trials each with success probability p.

 struct(struct categorical-dist (weights)) weights : (vectorof (>=/c 0))
Represents a categorical distribution (sometimes called a discrete distribution, multinomial distribution, or multinoulli distribution).

 struct(struct geometric-dist (p)) p : (real-in 0 1)
Represents a geometric distribution with success probability p.

 struct(struct poisson-dist (mean)) mean : (>/c 0)
Represents a Poisson distribution with mean mean.

#### 2.3 Real Distribution Types

 struct(struct beta-dist (a b)) a : (>=/c 0) b : (>=/c 0)
Represents a beta distribution with shape parameters a and b.

 struct(struct cauchy-dist (mode scale)) mode : real? scale : (>/c 0)
Represents a Cauchy distribution with mode mode and scale scale.

 struct(struct exponential-dist (mean)) mean : (>/c 0)
Represents an exponential distribution with mean mean.

Note: A common alternative parameterization uses the rate λ = (/ mean).

 struct(struct gamma-dist (shape scale)) shape : (>/c 0) scale : (>/c 0)
Represents a gamma distribution with shape (k) shape and scale (θ) scale.

Note: A common alternative parameterization uses α = shape and rate β = (/ scale).

 struct(struct inverse-gamma-dist (shape scale)) shape : (>/c 0) scale : (>/c 0)
Represents an inverse-gamma distribution, the reciprocals of whose values are distributed according to (gamma-dist shape scale).

 struct(struct logistic-dist (mean scale)) mean : real? scale : (>/c 0)
Represents a logistic distribution with mean mean and scale scale.

 struct(struct normal-dist (mean stddev)) mean : real? stddev : (>/c 0)
Represents a normal (Gaussian) distribution with mean (μ) mean and standard deviation (σ) stddev.

Note: A common alternative parameterization uses the variance σ².

 struct(struct pareto-dist (scale shape)) scale : (>/c 0) shape : (>/c 0)
Represents a Pareto distribution.

 struct(struct t-dist (degrees mode scale)) degrees : (>/c 0) mode : real? scale : (>/c 0)
Represents a Student’s t distribution with degrees degrees of freedom, mode mode, and scale scale.

 struct(struct uniform-dist (min max)) min : real? max : real?
Represents a uniform distribution with lower bound min and upper bound max.

#### 2.4 Real Distribution Transformers

The following constructors take and produce real-valued distributions.

 struct(struct affine-distx (dist a b)) dist : real-dist? a : real? b : real?
Represents the distribution of (+ (* a t) b) where t is distributed according to dist.

 struct(struct clip-distx (dist a b)) dist : real-dist? a : real? b : real?
Clips dist to the closed interval [a, b].

If the interval is small, the clipped dist is sampled using the dist-inv-cdf method of dist; otherwise, rejection sampling is used.
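
The two sampling strategies can be sketched in plain Racket for an exponential distribution with mean 1. All helper names here are hypothetical, and the sketch does not reproduce gamble's actual threshold for choosing between the methods.

```racket
#lang racket/base
;; Sketch only: two ways to sample an exponential distribution (mean 1)
;; clipped to [a, b]. Helper names are hypothetical, not gamble's.
(define (cdf x) (- 1.0 (exp (- x))))
(define (inv-cdf p) (- (log (- 1.0 p))))
(define (sample) (inv-cdf (random)))

;; Inverse-CDF method: draw u uniformly between cdf(a) and cdf(b), then invert.
(define (clip-sample/inv-cdf a b)
  (define lo (cdf a))
  (define hi (cdf b))
  (inv-cdf (+ lo (* (random) (- hi lo)))))

;; Rejection method: redraw until the sample lands inside [a, b].
(define (clip-sample/reject a b)
  (let loop ()
    (define x (sample))
    (if (<= a x b) x (loop))))
```

The inverse-CDF method uses exactly one random draw regardless of the interval, whereas the expected number of draws for rejection grows as the probability of landing in [a, b] shrinks, which is consistent with preferring the inverse-CDF method for small intervals.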

 struct(struct exp-distx (dist)) dist : real-dist?
Represents the distribution of (exp t) where t is distributed according to dist.

 struct(struct log-distx (dist)) dist : real-dist?
Represents the distribution of (log t) where t is distributed according to dist, whose support should include only the non-negative reals.
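
For reference, the densities of the affine, exp, and log transformers follow from the standard change-of-variables rule. Writing p_T for the density of dist and assuming a ≠ 0 in the affine case:

```latex
\begin{aligned}
Y = aT + b &: \quad p_Y(y) = \frac{1}{|a|}\, p_T\!\left(\frac{y - b}{a}\right) \\
Y = e^{T} &: \quad p_Y(y) = \frac{1}{y}\, p_T(\log y), \qquad y > 0 \\
Y = \log T &: \quad p_Y(y) = e^{y}\, p_T\!\left(e^{y}\right)
\end{aligned}
```

These are the textbook formulas; this section does not state how gamble evaluates them internally.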

#### 2.5 Vector Distribution Types

 struct(struct dirichlet-dist (alpha)) alpha : (vectorof (>/c 0))
Represents a Dirichlet distribution. The support consists of vectors of the same length as alpha whose elements are nonnegative reals summing to 1.

 struct(struct multinomial-dist (n weights)) n : exact-nonnegative-integer? weights : (vectorof (>=/c 0))
Represents a multinomial distribution. The support consists of vectors of the same length as weights, each representing the counts obtained from n independent samples of the categorical distribution with weights weights.

 struct n : exact-nonnegative-integer?
Returns a uniform distribution over permutations of n elements, where a permutation is represented by a vector containing each of the integers from 0 to (sub1 n) exactly once.

#### 2.6 Multivariate Distribution Types

 struct(struct multi-normal-dist (mean cov)) mean : col-matrix? cov : square-matrix?
Represents a multi-variate normal (Gaussian) distribution. The covariance matrix cov must be a square, symmetric, positive-definite matrix with as many rows as mean.

The support consists of column matrices having the same shape as mean.

 struct(struct wishart-dist (n V)) n : real? V : square-matrix?
Represents a Wishart distribution with n degrees of freedom and scale matrix V. The scale matrix V must be a square, symmetric, positive-definite matrix.

The support consists of square, symmetric, positive-definite matrices having the same shape as V.

 struct(struct inverse-wishart-dist (n Vinv)) n : real? Vinv : square-matrix?
Represents an inverse-Wishart distribution with n degrees of freedom and scale matrix Vinv. The scale matrix Vinv must be a square, symmetric, positive-definite matrix.

If X is distributed according to (wishart-dist n V), then (matrix-inverse X) is distributed according to (inverse-wishart-dist n (matrix-inverse V)).

The support consists of square, symmetric, positive-definite matrices having the same shape as Vinv.

#### 2.7 Discrete Distribution Type

A discrete distribution is a distribution whose support is a finite collection of arbitrary Racket values. Note: this library uses the name categorical-dist for a distribution whose support consists of the integers {0, 1, ..., N}.

The elements of a discrete distribution are distinguished using equal?. The constructors for discrete distributions detect and coalesce duplicates.
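
Coalescing by equal? means that two structurally equal values, such as two separately allocated strings with the same contents, share one entry whose weight is the sum. A minimal sketch in plain Racket using an equal?-based hash table; coalesce is a hypothetical helper, not gamble's constructor:

```racket
#lang racket/base
;; Sketch only: merge duplicate values of a weighted list, comparing with equal?.
;; coalesce is a hypothetical helper, not gamble's constructor.
(define (coalesce weighted-values)
  (for/fold ([acc (hash)]) ([entry (in-list weighted-values)])
    ;; (hash) builds an equal?-based table, so equal strings share a key.
    (hash-update acc (car entry) (lambda (w) (+ w (cdr entry))) 0)))

;; The first and third entries are distinct string objects with equal contents.
(define merged
  (coalesce (list (cons "heads" 1/4)
                  (cons "tails" 1/2)
                  (cons (string-append "he" "ads") 1/4))))
```

Here merged has two entries, with "heads" carrying the combined weight 1/2.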

 procedure(discrete-dist? v) → boolean? v : any/c
Returns #t if v is a discrete distribution, #f otherwise.

syntax

 (discrete-dist maybe-normalize [value-expr weight-expr] ...)

maybe-normalize =
| #:normalize? normalize?-expr

 weight-expr : (>=/c 0)
Produces a discrete distribution whose values are the value-exprs and whose probability masses are the corresponding weight-exprs.

Normalization affects printing and discrete-dist-weights, but not dist-pdf.

Example:

 > (discrete-dist ['apple 1/2] ['orange 1/3] ['pear 1/6])

 (discrete-dist ['apple 1/2] ['orange 1/3] ['pear 1/6])

procedure

 (make-discrete-dist weighted-values [ #:normalize? normalize?]) → discrete-dist?
weighted-values : dict?
normalize? : any/c = #t
Produces a discrete distribution from the dictionary weighted-values that maps values to weights.

Example:

 > (make-discrete-dist '((apple . 1/2) (orange . 1/3) (pear . 1/6)))

 (discrete-dist ['apple 1/2] ['orange 1/3] ['pear 1/6])

procedure

 (make-discrete-dist* values [ weights #:normalize? normalize?]) → discrete-dist?
values : vector?
weights : (vectorof (>=/c 0)) = (vector 1 ...)
normalize? : any/c = #t
Produces a discrete distribution with the values of values and weights of weights. The two vectors must have equal lengths.

Example:

 > (make-discrete-dist* (vector 'apple 'orange 'pear) (vector 1/2 1/3 1/6))

(discrete-dist ['apple 1/2] ['orange 1/3] ['pear 1/6])

 procedure d : discrete-dist?
Normalizes a discrete distribution. If d is already normalized, the function may return d.

 procedure d : discrete-dist?
 procedure d : discrete-dist?
Returns the values and weights of d, respectively.

 syntax(discrete-measure [value-expr weight-expr] ...)
Equivalent to (discrete-dist #:normalize? #f [value-expr weight-expr] ...).

#### 2.8 Finite Distributions as a Monad

The following operations, despite the dist- in the names, may produce unnormalized discrete distributions.

 procedure(dist-unit v) → finite-dist? v : any/c
Returns a distribution with all probability mass concentrated on v.

 procedure(dist-bind d f) → finite-dist? d : finite-dist? f : (-> any/c finite-dist?)
Given a distribution d for random variable A and a probability kernel f for B given A, forms the joint probability for (A,B), then marginalizes out A, returning the marginal distribution for B.

Examples:

 > (define (ground-wet raining)
     (case raining
       [(0) (bernoulli-dist 9/10)]
       [(1) (bernoulli-dist 1/5)]))
> (define raining-dist (bernoulli-dist 1/5))
; marginal probability of Ground Wet
> (dist-bind raining-dist ground-wet)

(discrete-dist [0 6/25] [1 19/25])
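
The marginalization can be reproduced by hand. The sketch below represents finite distributions as association lists of (value . probability) pairs in plain Racket; bind and add-weight are hypothetical helpers, not gamble's implementation, and the tables simply spell out the Bernoulli distributions from the example above.

```racket
#lang racket/base
;; Sketch only: finite distributions as association lists of
;; (value . probability). Helper names are hypothetical, not gamble's.
(define (add-weight acc v w)
  (hash-update acc v (lambda (old) (+ old w)) 0))

(define (bind d f)
  (for*/fold ([acc (hash)])
             ([a (in-list d)]
              [b (in-list (f (car a)))])
    ;; Joint weight P(A = a) * P(B = b | A = a), summed over a for each b.
    (add-weight acc (car b) (* (cdr a) (cdr b)))))

;; (bernoulli-dist 1/5): value 1 with probability 1/5, value 0 with 4/5.
(define raining '((0 . 4/5) (1 . 1/5)))

;; The kernel from the example, with each Bernoulli written out as a table.
(define (ground-wet r)
  (case r
    [(0) '((0 . 1/10) (1 . 9/10))]
    [(1) '((0 . 4/5) (1 . 1/5))]))

(define wet (bind raining ground-wet))
```

In this sketch wet maps 0 to 4/5 · 1/10 + 1/5 · 4/5 = 6/25 and 1 to 4/5 · 9/10 + 1/5 · 1/5 = 19/25, matching the result shown above.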

 procedure(dist-bindx d f) → finite-dist? d : finite-dist? f : (-> any/c finite-dist?)
Like dist-bind, but omits the marginalization step, returning the joint distribution.

Equivalent to (dist-bind d (λ (v1) (dist-fmap (f v1) (λ (v2) (list v1 v2))))).

Examples:

 ; joint distribution of (Raining, Ground Wet)
 > (dist-bindx raining-dist ground-wet)

 (discrete-dist ['(0 0) 2/25] ['(0 1) 18/25] ['(1 0) 4/25] ['(1 1) 1/25])

 procedure(dist-fmap d f) → finite-dist? d : finite-dist? f : (-> any/c any/c)
Equivalent to (dist-bind d (compose dist-unit f)).

 procedure(dist-filter d pred) → finite-dist? d : finite-dist? pred : (-> any/c boolean?)
Returns a distribution like d but whose support is narrowed to values accepted by the predicate pred.

Examples:

> (dist-filter (binomial-dist 10 1/2) even?)
 (discrete-measure [0 0.0009765625] [2 0.0439453125] [4 0.205078125] [6 0.205078125] [8 0.0439453125] [10 0.0009765625])
> (dist-filter (binomial-dist 10 1/2) negative?)

(discrete-measure)
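
The first result is printed as a measure because filtering keeps the original weights rather than rescaling them. A plain-Racket sketch using exact arithmetic illustrates this; choose and filter-measure are hypothetical helpers, not gamble functions.

```racket
#lang racket/base
;; Sketch only: filtering keeps the original weights, so the result is a
;; measure that need not sum to 1. Helper names are hypothetical.
(define (choose n k)
  (for/fold ([c 1]) ([i (in-range k)])
    (/ (* c (- n i)) (+ i 1))))

;; Probability masses of a binomial distribution with 10 trials and p = 1/2.
(define binomial-10
  (for/list ([k (in-range 11)])
    (cons k (/ (choose 10 k) 1024))))

(define (filter-measure measure pred)
  (filter (lambda (entry) (pred (car entry))) measure))

(define evens (filter-measure binomial-10 even?))
(define total (for/sum ([entry (in-list evens)]) (cdr entry)))
```

The six even outcomes keep weights such as 1/1024 for 0 and 45/1024 for 2, and their total is 1/2 rather than 1.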

#### 2.9 Defining New Probability Distributions

syntax

 (define-dist-type dist-name ([field-id field-ctc] ...) #:pdf pdf-fun #:sample sample-fun dist-option ...)

 dist-option = #:cdf cdf-fun | #:inv-cdf inv-cdf-fun | #:guard guard-fun | #:extra [struct-option ...] | #:no-provide

 field-ctc : contract?
 pdf-fun : (field-ctc ... any/c boolean? . -> . real?)
 cdf-fun : (field-ctc ... any/c boolean? boolean? . -> . real?)
 inv-cdf-fun : (field-ctc ... real? boolean? boolean? . -> . any/c)
 sample-fun : (field-ctc ... . -> . any/c)
Defines dist-name as a new distribution type. In particular, dist-name is a struct implementing the internal generic interface representing distributions.

At a minimum, the new distribution type must supply a probability density/mass function (pdf-fun) and a sampling function (sample-fun). The type may also supply a cumulative probability function (cdf-fun) and its inverse (inv-cdf-fun). The signatures of the functions are like those of dist-pdf, dist-sample, etc., but instead of the distribution itself, they accept the fields of the distribution as the initial arguments.

The guard-fun is a struct guard function; see make-struct-type for details. Other struct options may be passed via the #:extra option. Distribution types are automatically transparent.

By default, dist-name, dist-name?, and the synthesized accessors are provided with the field-ctc contracts. The #:no-provide option disables automatic providing, in which case the field-ctcs are ignored.
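
To make the calling conventions concrete, here is a sketch of the four functions one might write for an exponential distribution shifted to start at shift. The distribution, its field names, and all function names are hypothetical. The functions are plain Racket and run without gamble; the define-dist-type form appears only in a comment and is an untested illustration of how they might be supplied.

```racket
#lang racket/base
;; Sketch only: functions for a hypothetical shifted exponential distribution.
;; Each takes the distribution's fields (shift, mean) first, then the
;; arguments that dist-pdf, dist-cdf, dist-inv-cdf, or dist-sample would take.

(define (shifted-exp-pdf shift mean x log?)
  (define p (if (< x shift) 0.0 (/ (exp (- (/ (- x shift) mean))) mean)))
  (if log? (log p) p))

(define (shifted-exp-cdf shift mean x log? 1-p?)
  (define p (if (< x shift) 0.0 (- 1.0 (exp (- (/ (- x shift) mean))))))
  (define q (if 1-p? (- 1.0 p) p))
  (if log? (log q) q))

;; Assumption: log? is undone first, then 1-p?.
(define (shifted-exp-inv-cdf shift mean p log? 1-p?)
  (define p1 (if log? (exp p) p))
  (define p2 (if 1-p? (- 1.0 p1) p1))
  (- shift (* mean (log (- 1.0 p2)))))

(define (shifted-exp-sample shift mean)
  (shifted-exp-inv-cdf shift mean (random) #f #f))

;; With gamble loaded, these might be supplied to the form documented above:
;; (define-dist-type shifted-exp-dist ([shift real?] [mean (>/c 0)])
;;   #:pdf shifted-exp-pdf #:sample shifted-exp-sample
;;   #:cdf shifted-exp-cdf #:inv-cdf shifted-exp-inv-cdf)
```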