Distributions

mlpack has support for a number of different distributions, each supporting the same API. These can be used with, for instance, the HMM class.

DiscreteDistribution: multidimensional categorical distribution (generalized Bernoulli distribution)
GaussianDistribution: multidimensional Gaussian distribution
DiagonalGaussianDistribution: multidimensional Gaussian distribution with diagonal covariance
GammaDistribution: multidimensional Gamma distribution; special cases include the exponential, Chi-squared, and Erlang distributions
LaplaceDistribution: multidimensional Laplace (double exponential) distribution
RegressionDistribution: multidimensional Gaussian distribution on the errors of a linear regression model

`DiscreteDistribution`

DiscreteDistribution represents a multidimensional categorical distribution (or generalized Bernoulli distribution) where integer-valued vectors (e.g. [0, 3, 4]) are associated with specific probabilities in each dimension.

Example: a 3-dimensional DiscreteDistribution will have a specific probability value associated with each integer value in each dimension. So, for the vector [0, 3, 4], P(0) in dimension 0 could be, e.g., 0.3, P(3) in dimension 1 could be, e.g., 0.4, and P(4) in dimension 2 could be, e.g., 0.6. Then, P([0, 3, 4]) would be 0.3 * 0.4 * 0.6 = 0.072.
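The same arithmetic can be written out in a few lines of plain C++. This is a sketch of the formula only, not mlpack's implementation: it uses just the standard library, and the helper name CategoricalProbability and the nested std::vector layout (one probability table per dimension) are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: P(x) for a multidimensional categorical distribution in
// which dimensions are independent. probs[i][j] is the probability of value j
// in dimension i, and x[i] is the (integer) observed value in dimension i.
double CategoricalProbability(const std::vector<std::vector<double>>& probs,
                              const std::vector<std::size_t>& x)
{
  assert(x.size() == probs.size());
  double p = 1.0;
  for (std::size_t i = 0; i < x.size(); ++i)
  {
    assert(x[i] < probs[i].size());  // Values must lie in [0, n_i - 1].
    p *= probs[i][x[i]];             // Multiply the per-dimension probabilities.
  }
  return p;
}
```

With per-dimension tables where P(0) = 0.3 in dimension 0, P(3) = 0.4 in dimension 1, and P(4) = 0.6 in dimension 2, CategoricalProbability(probs, {0, 3, 4}) multiplies those three values to give 0.072.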

Constructors

d = DiscreteDistribution(numObservations)
- Create a one-dimensional discrete distribution with numObservations different observations in the one and only dimension. numObservations is of type size_t.
d = DiscreteDistribution(numObservationsVec)
- Create a multidimensional discrete distribution with numObservationsVec.n_elem dimensions and numObservationsVec[i] different observations in dimension i.
- numObservationsVec is of type arma::Col<size_t>.
d = DiscreteDistribution(probabilities)
- Create a multidimensional discrete distribution with the given probabilities.
- probabilities should have type std::vector<arma::vec>, and probabilities.size() should be equal to the dimensionality of the distribution.
- probabilities[i] is a vector such that probabilities[i][j] contains the probability of j in dimension i.

Access and modify properties of distribution

d.Dimensionality() returns a size_t indicating the number of dimensions in the multidimensional discrete distribution.
d.Probabilities(i) returns an arma::vec& containing the probabilities of each observation in dimension i.
- d.Probabilities(i)[j] is the probability of j in dimension i.
- This can be used to modify probabilities: d.Probabilities(0)[1] = 0.7 sets the probability of observing the value 1 in dimension 0 to 0.7.
- Note: when setting probabilities manually, be sure that the sum of probabilities in a dimension is 1!
A DiscreteDistribution can be serialized with data::Save() and data::Load().
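If probabilities are edited by hand, one way to restore the sum-to-one property is to rescale the whole dimension. The plain C++ sketch below shows the idea on a std::vector<double>; NormalizeDimension is a hypothetical helper, not an mlpack function. For an arma::vec such as d.Probabilities(0), the corresponding Armadillo expression should be d.Probabilities(0) /= arma::accu(d.Probabilities(0)).

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper: rescale one dimension's probabilities so they sum to 1.
// All entries must be non-negative and at least one must be positive.
void NormalizeDimension(std::vector<double>& p)
{
  const double total = std::accumulate(p.begin(), p.end(), 0.0);
  assert(total > 0.0);
  for (double& v : p)
    v /= total;  // Each entry keeps its relative weight.
}
```

For example, setting P(1) = 0.7 in a dimension where P(0) = 0.5 leaves the dimension summing to 1.2; rescaling gives approximately 0.417 and 0.583.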

Compute probabilities of points

d.Probability(observation) returns the probability of the given observation as a double.
- observation should be an arma::vec of size d.Dimensionality().
- observation[i] should take integer values between 0 and d.Probabilities(i).n_elem - 1.
d.Probability(observations, probabilities) computes the probabilities of many observations.
- observations should be an arma::mat with number of rows equal to d.Dimensionality(); observations.n_cols is the number of observations.
- probabilities will be set to size observations.n_cols.
- probabilities[i] will be set to d.Probability(observations.col(i)).
d.LogProbability(observation) returns the log-probability of the given observation as a double.
d.LogProbability(observations, probabilities) computes the log-probabilities of many observations.

Sample from the distribution

d.Random() returns an arma::vec with a random sample from the multidimensional discrete distribution.
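Sampling from a categorical distribution with independent dimensions amounts to drawing one value per dimension. The sketch below does this with the standard library's std::discrete_distribution; it illustrates the technique and is not a description of how mlpack's Random() is implemented. SampleCategorical is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: draw one observation from a multidimensional categorical
// distribution with independent dimensions. probs[i][j] is the probability of
// value j in dimension i; the result holds one sampled value per dimension.
std::vector<std::size_t> SampleCategorical(
    const std::vector<std::vector<double>>& probs, std::mt19937& rng)
{
  std::vector<std::size_t> sample(probs.size());
  for (std::size_t i = 0; i < probs.size(); ++i)
  {
    assert(!probs[i].empty());
    // std::discrete_distribution picks index j with probability
    // probs[i][j] / sum(probs[i]).
    std::discrete_distribution<std::size_t> dist(probs[i].begin(),
                                                 probs[i].end());
    sample[i] = dist(rng);
  }
  return sample;
}
```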

Fit the distribution to observations

d.Train(observations)
- Fit the distribution to the given observations.
- observations should be an arma::mat with number of rows equal to d.Dimensionality(); observations.n_cols is the number of observations.
- observations(i, j) (dimension i of observation j) should be an integer value between 0 and the number of observations for dimension i, minus one.
d.Train(observations, observationProbabilities)
- Fit the distribution to the given observations, as above, but also provide probabilities that each observation is from this distribution.
- observationProbabilities should be an arma::vec of length observations.n_cols.
- observationProbabilities[i] should be equal to the probability that observations.col(i) is from d.
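One natural reading of the weighted Train() overload is a weighted maximum-likelihood estimate: each observation adds its probability (weight) to the count for its value, and each dimension is then normalized. The sketch below shows this for a single dimension in plain C++; it is an assumption about the estimator, not mlpack's code, and FitCategoricalDimension is a hypothetical helper. Setting every weight to 1 gives the plain frequency estimate, which is presumably what the unweighted overload computes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of a weighted fit for one dimension of a categorical
// distribution. values[k] is the observed value of point k in this dimension,
// weights[k] is the probability that point k came from this distribution, and
// numValues is the number of possible values in this dimension.
std::vector<double> FitCategoricalDimension(
    const std::vector<std::size_t>& values,
    const std::vector<double>& weights,
    std::size_t numValues)
{
  assert(values.size() == weights.size());
  std::vector<double> probs(numValues, 0.0);
  double total = 0.0;
  for (std::size_t k = 0; k < values.size(); ++k)
  {
    assert(values[k] < numValues);
    probs[values[k]] += weights[k];  // Accumulate weighted counts.
    total += weights[k];
  }
  assert(total > 0.0);
  for (double& p : probs)
    p /= total;  // Normalize so the dimension sums to 1.
  return probs;
}
```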

Example usage

// Create a single-dimension Bernoulli distribution: P([0]) = 0.3, P([1]) = 0.7.
mlpack::DiscreteDistribution bernoulli(2);
bernoulli.Probabilities(0)[0] = 0.3;
bernoulli.Probabilities(0)[1] = 0.7;

const double p1 = bernoulli.Probability(arma::vec("0")); // p1 = 0.3.
const double p2 = bernoulli.Probability(arma::vec("1")); // p2 = 0.7.

// Create a 3-dimensional discrete distribution by specifying the probabilities
// manually.
arma::vec probDim0 = arma::vec("0.1 0.3 0.5 0.1"); // 4 possible values.
arma::vec probDim1 = arma::vec("0.7 0.3");         // 2 possible values.
arma::vec probDim2 = arma::vec("0.4 0.4 0.2");     // 3 possible values.
std::vector<arma::vec> probs { probDim0, probDim1, probDim2 };
mlpack::DiscreteDistribution d(probs);

arma::vec obs("2 0 1");
const double p3 = d.Probability(obs); // p3 = 0.5 * 0.7 * 0.4 = 0.14.

// Estimate a 10-dimensional discrete distribution.
// Each dimension takes values between 0 and 9.
arma::mat observations = arma::randi<arma::mat>(10, 1000,
    arma::distr_param(0, 9));

// Create a distribution with 10 observations in each of the 10 dimensions.
mlpack::DiscreteDistribution d2(
    arma::Col<size_t>("10 10 10 10 10 10 10 10 10 10"));
d2.Train(observations);

// Compute the probabilities of each point.
arma::vec probabilities;
d2.Probability(observations, probabilities);
std::cout << "Average probability: " << arma::mean(probabilities) << "."
    << std::endl;

Using different element types

The DiscreteDistribution class takes two template parameters:

DiscreteDistribution<MatType, ObsMatType>

MatType represents the matrix type used to represent internal parameters (e.g. probabilities of each observation).
ObsMatType represents the matrix type used to represent observations.
By default:
- MatType is arma::mat, but any dense matrix type matching the Armadillo API that holds floating-point numbers can be used (e.g. arma::fmat).
- ObsMatType is MatType, but any matrix type matching the Armadillo API can be used (e.g. arma::fmat, arma::imat, etc.).
When using custom MatType and ObsMatType parameters, several method signatures will change:
- DiscreteDistribution(probabilities) will expect probabilities to be std::vector<VecType>, where VecType is the column vector type associated with MatType (e.g. arma::fvec for arma::fmat).
- Probability(observation) and LogProbability(observation) will expect observation to be an ObsVecType, where ObsVecType is the column vector type associated with ObsMatType, and will return a probability with type equivalent to the element type of MatType.
- Probability(observations, probabilities) and LogProbability(observations, probabilities) will expect observations to be of type ObsMatType and probabilities to be of type VecType.
- Random() will return an ObsVecType.
- Train(observations) and Train(observations, probabilities) will expect observations to be of type ObsMatType and probabilities to be of type VecType.
- Probabilities(dim) will return a VecType.

The code below uses a DiscreteDistribution built on 32-bit floating point numbers.

// Create a distribution with 10 observations in each of 3 dimensions.
mlpack::DiscreteDistribution<arma::fmat> d(arma::Col<size_t>("10 10 10"));

// Train the distribution on random data.
arma::fmat observations =
    arma::randi<arma::fmat>(3, 100, arma::distr_param(0, 9));
d.Train(observations);

// Compute and print the probability of [8, 6, 7].
const float p = d.Probability(arma::fvec("8 6 7"));
std::cout << "Probability of [8, 6, 7]: " << p << "." << std::endl;

The code below uses a DiscreteDistribution that internally uses float to hold probabilities, but accepts unsigned ints as observations.

// Create a distribution with 10 observations in each of 3 dimensions.
mlpack::DiscreteDistribution<arma::fmat, arma::umat> d(
    arma::Col<size_t>("10 10 10"));

// Train the distribution on random data.  Note that the observation type is a
// matrix of unsigned ints (arma::umat).
arma::umat observations =
    arma::randi<arma::umat>(3, 100, arma::distr_param(0, 9));
d.Train(observations);

// Compute and print the probability of [8, 6, 7].  Note that the input vector
// is a vector of unsigned ints (arma::uvec), but the returned probability is a
// float because MatType is set to arma::fmat.
const float p = d.Probability(arma::uvec("8 6 7"));
std::cout << "Probability of [8, 6, 7]: " << p << "." << std::endl;

// Print the probability vector for dimension 0.
std::cout << "Probabilities for observations in dimension 0: "
    << d.Probabilities(0).t() << std::endl;

`GaussianDistribution`

GaussianDistribution is a standard multivariate Gaussian distribution with parameterized mean and covariance. (For a Gaussian distribution with a diagonal covariance, see DiagonalGaussianDistribution.)

Constructors

g = GaussianDistribution(dimensionality)
- Create the distribution with the given dimensionality.
- The distribution will have a zero mean and unit diagonal covariance matrix.
g = GaussianDistribution(mean, covariance)
- Create the distribution with the given mean and covariance.
- mean is of type arma::vec and should have length equal to the dimensionality of the distribution.
- covariance is of type arma::mat, and should be symmetric and square, with rows and columns equal to the dimensionality of the distribution.

Access and modify properties of distribution

g.Dimensionality() returns the dimensionality of the distribution as a size_t.
g.Mean() returns an arma::vec& holding the mean of the distribution. This can be modified.
g.Covariance() returns a const arma::mat& holding the covariance of the distribution. To set a new covariance, use g.Covariance(newCov) or g.Covariance(std::move(newCov)).
g.InvCov() returns a const arma::mat& holding the precomputed inverse of the covariance.
g.LogDetCov() returns a double holding the log-determinant of the covariance.
A GaussianDistribution can be serialized with data::Save() and data::Load().
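The quantities returned by InvCov() and LogDetCov() are exactly the covariance-dependent pieces of the Gaussian log-density: for a k-dimensional point x, log p(x) = -0.5 (k log(2 pi) + log|Sigma| + (x - mu)^T Sigma^-1 (x - mu)). Having them precomputed lets each evaluation skip the matrix inversion and determinant. The plain C++ sketch below evaluates that formula from a given inverse and log-determinant; GaussianLogProbability and the row-major std::vector storage are illustrative assumptions, not mlpack's API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: log N(x; mean, cov) given the inverse covariance
// (k x k, row-major) and the log-determinant of the covariance:
//   log p(x) = -0.5 * (k log(2 pi) + logDetCov + (x - mean)^T invCov (x - mean))
double GaussianLogProbability(const std::vector<double>& x,
                              const std::vector<double>& mean,
                              const std::vector<double>& invCov,
                              double logDetCov)
{
  const std::size_t k = x.size();
  assert(mean.size() == k && invCov.size() == k * k);
  const double pi = std::acos(-1.0);
  std::vector<double> diff(k);
  for (std::size_t i = 0; i < k; ++i)
    diff[i] = x[i] - mean[i];
  // Mahalanobis term (x - mean)^T invCov (x - mean).
  double maha = 0.0;
  for (std::size_t i = 0; i < k; ++i)
    for (std::size_t j = 0; j < k; ++j)
      maha += diff[i] * invCov[i * k + j] * diff[j];
  return -0.5 * (k * std::log(2.0 * pi) + logDetCov + maha);
}
```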

Compute probabilities of points

g.Probability(observation) returns the probability of the given observation as a double.
- observation should be an arma::vec of size g.Dimensionality().
g.Probability(observations, probabilities) computes the probabilities of many observations.
- observations should be an arma::mat with number of rows equal to g.Dimensionality(); observations.n_cols is the number of observations.
- probabilities will be set to size observations.n_cols.
- probabilities[i] will be set to g.Probability(observations.col(i)).
g.LogProbability(observation) returns the log-probability of the given observation as a double.
g.LogProbability(observations, probabilities) computes the log-probabilities of many observations.

Sample from the distribution

g.Random() returns an arma::vec with a random sample from the multidimensional Gaussian distribution.
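A standard way to sample from a multivariate Gaussian is to factor the covariance as Sigma = L L^T (Cholesky) and return mu + L z, where z is a vector of independent standard normal draws. The sketch below implements that technique with the C++ standard library only; whether mlpack's Random() uses this exact factorization is not stated on this page, and CholeskyLower and SampleGaussian are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Cholesky factorization cov = L * L^T for a symmetric positive definite k x k
// matrix stored row-major. Returns the lower-triangular L (row-major).
std::vector<double> CholeskyLower(const std::vector<double>& cov, std::size_t k)
{
  assert(cov.size() == k * k);
  std::vector<double> L(k * k, 0.0);
  for (std::size_t i = 0; i < k; ++i)
  {
    for (std::size_t j = 0; j <= i; ++j)
    {
      double s = cov[i * k + j];
      for (std::size_t m = 0; m < j; ++m)
        s -= L[i * k + m] * L[j * k + m];
      if (i == j)
      {
        assert(s > 0.0);  // Fails if cov is not positive definite.
        L[i * k + i] = std::sqrt(s);
      }
      else
      {
        L[i * k + j] = s / L[j * k + j];
      }
    }
  }
  return L;
}

// Draw x = mean + L * z with z ~ N(0, I); then x ~ N(mean, L * L^T).
std::vector<double> SampleGaussian(const std::vector<double>& mean,
                                   const std::vector<double>& L,
                                   std::mt19937& rng)
{
  const std::size_t k = mean.size();
  assert(L.size() == k * k);
  std::normal_distribution<double> stdNormal(0.0, 1.0);
  std::vector<double> z(k);
  for (double& zi : z)
    zi = stdNormal(rng);
  std::vector<double> x(mean);
  for (std::size_t i = 0; i < k; ++i)
    for (std::size_t j = 0; j <= i; ++j)
      x[i] += L[i * k + j] * z[j];
  return x;
}
```

Factoring once and reusing L makes each additional sample cost only O(k^2).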

Fit the distribution to observations

g.Train(observations)
- Fit the distribution to the given observations.
- observations should be an arma::mat with number of rows equal to g.Dimensionality(); observations.n_cols is the number of observations.
g.Train(observations, observationProbabilities)
- Fit the distribution to the given observations, as above, but also provide probabilities that each observation is from this distribution.
- observationProbabilities should be an arma::vec of length observations.n_cols.
- observationProbabilities[i] should be equal to the probability that observations.col(i) is from g.

Example usage

// Create a Gaussian distribution in 3 dimensions with zero mean and unit
// covariance.
mlpack::GaussianDistribution g(3);

// Compute the probability of the point [0, 0.5, 0.25].
const double p = g.Probability(arma::vec("0 0.5 0.25"));

// Modify the mean in dimension 0.
g.Mean()[0] = 0.5;

// Set a random covariance.
arma::mat newCov(3, 3, arma::fill::randu);
newCov *= newCov.t(); // Ensure covariance is positive semidefinite.
g.Covariance(std::move(newCov)); // Set new covariance.

// Compute the probability of the same point [0, 0.5, 0.25].
const double p2 = g.Probability(arma::vec("0 0.5 0.25"));

// Create a Gaussian distribution that is estimated from random samples in 50
// dimensions.
arma::mat samples(50, 10000, arma::fill::randn); // Normally distributed.

mlpack::GaussianDistribution g2(50);
g2.Train(samples);

// Compute the probability of all of the samples.
arma::vec probabilities;
g2.Probability(samples, probabilities);

std::cout << "Average probability is: " << arma::mean(probabilities) << "."
    << std::endl;

Using different element types

The GaussianDistribution class takes one template parameter:

GaussianDistribution<MatType>

MatType represents the matrix type used to represent observations.
By default, MatType is arma::mat, but any matrix type matching the Armadillo API can be used (e.g. arma::fmat).
When MatType is set to anything other than arma::mat, all arguments are adapted accordingly:
- arma::mat arguments will instead be MatType.
- arma::vec arguments will instead be the corresponding column vector type associated with MatType.
- double arguments will instead be the element type of MatType.

The code below uses a Gaussian distribution to make predictions with 32-bit floating point numbers.

// Create a 3-dimensional 32-bit floating point Gaussian distribution with
// random mean and unit covariance.
mlpack::GaussianDistribution<arma::fmat> g(3);
g.Mean().randu();

// Compute the probability of the point [0.2, 0.3, 0.4].
const float p = g.Probability(arma::fvec("0.2 0.3 0.4"));

std::cout << "Probability of (0.2, 0.3, 0.4): " << p << "." << std::endl;

`DiagonalGaussianDistribution`

DiagonalGaussianDistribution is a standard multivariate Gaussian distribution with parameterized mean and diagonal covariance. (For a full-covariance Gaussian distribution, see GaussianDistribution.)

Constructors

d = DiagonalGaussianDistribution(dimensionality)
- Create the distribution with the given dimensionality.
- The distribution will have a zero mean and unit diagonal covariance matrix.
d = DiagonalGaussianDistribution(mean, covariance)
- Create the distribution with the given mean and covariance.
- mean is of type arma::vec and should have length equal to the dimensionality of the distribution.
- covariance is of type arma::vec, and should have length equal to the dimensionality of the distribution. Its elements represent the diagonal of the covariance matrix.

Access and modify properties of distribution

d.Dimensionality() returns the dimensionality of the distribution as a size_t.
d.Mean() returns an arma::vec& holding the mean of the distribution. This can be modified.
d.Covariance() returns a const arma::vec& holding the covariance of the distribution. To set a new covariance, use d.Covariance(newCov) or d.Covariance(std::move(newCov)), where newCov is the new diagonal of the covariance matrix.
A DiagonalGaussianDistribution can be serialized with data::Save() and data::Load().

Compute probabilities of points

d.Probability(observation) returns the probability of the given observation as a double.
- observation should be an arma::vec of size d.Dimensionality().
d.Probability(observations, probabilities) computes the probabilities of many observations.
- observations should be an arma::mat with number of rows equal to d.Dimensionality(); observations.n_cols is the number of observations.
- probabilities will be set to size observations.n_cols.
- probabilities[i] will be set to d.Probability(observations.col(i)).
d.LogProbability(observation) returns the log-probability of the given observation as a double.
d.LogProbability(observations, probabilities) computes the log-probabilities of many observations.
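With a diagonal covariance the log-density splits into a sum of independent one-dimensional terms, log p(x) = sum_i -0.5 (log(2 pi sigma_i^2) + (x_i - mu_i)^2 / sigma_i^2), so evaluating it costs O(k) rather than the O(k^2) needed with a full inverse covariance. The plain C++ sketch below evaluates this formula; DiagonalGaussianLogProbability is a hypothetical name, and var holds the covariance diagonal in the same sense as d.Covariance().

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: log-density of a Gaussian with diagonal covariance.
// var[i] is the i-th diagonal entry of the covariance (must be > 0). Because
// the covariance is diagonal, the log-density is a sum of univariate terms.
double DiagonalGaussianLogProbability(const std::vector<double>& x,
                                      const std::vector<double>& mean,
                                      const std::vector<double>& var)
{
  assert(x.size() == mean.size() && x.size() == var.size());
  const double pi = std::acos(-1.0);
  double logp = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i)
  {
    assert(var[i] > 0.0);
    const double d = x[i] - mean[i];
    logp += -0.5 * (std::log(2.0 * pi * var[i]) + d * d / var[i]);
  }
  return logp;
}
```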

Sample from the distribution

d.Random() returns an arma::vec with a random sample from the multidimensional diagonal Gaussian distribution.

Fit the distribution to observations

d.Train(observations)
- Fit the distribution to the given observations.
- observations should be an arma::mat with number of rows equal to d.Dimensionality(); observations.n_cols is the number of observations.
d.Train(observations, observationProbabilities)
- Fit the distribution to the given observations, as above, but also provide probabilities that each observation is from this distribution.
- observationProbabilities should be an arma::vec of length observations.n_cols.
- observationProbabilities[i] should be equal to the probability that observations.col(i) is from d.
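A weighted Train() for a diagonal Gaussian can be sketched as a weighted mean and weighted variance computed independently in each dimension. The plain C++ version below divides by the total weight, which gives the maximum-likelihood estimate; mlpack's own estimator may normalize or regularize the variances differently, so treat this as an illustration. FitDiagonalGaussian and DiagonalGaussianFit are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct DiagonalGaussianFit
{
  std::vector<double> mean;
  std::vector<double> var;  // Diagonal of the covariance.
};

// Hypothetical sketch: weighted maximum-likelihood estimate of a diagonal
// Gaussian. points[n][i] is dimension i of point n; weights[n] >= 0 is the
// probability that point n came from this distribution.
DiagonalGaussianFit FitDiagonalGaussian(
    const std::vector<std::vector<double>>& points,
    const std::vector<double>& weights)
{
  assert(!points.empty() && points.size() == weights.size());
  const std::size_t k = points[0].size();
  DiagonalGaussianFit fit{std::vector<double>(k, 0.0),
                          std::vector<double>(k, 0.0)};
  double total = 0.0;
  for (std::size_t n = 0; n < points.size(); ++n)
  {
    assert(points[n].size() == k);
    total += weights[n];
    for (std::size_t i = 0; i < k; ++i)
      fit.mean[i] += weights[n] * points[n][i];
  }
  assert(total > 0.0);
  for (double& m : fit.mean)
    m /= total;
  // Weighted squared deviations from the weighted mean, per dimension.
  for (std::size_t n = 0; n < points.size(); ++n)
  {
    for (std::size_t i = 0; i < k; ++i)
    {
      const double d = points[n][i] - fit.mean[i];
      fit.var[i] += weights[n] * d * d;
    }
  }
  for (double& v : fit.var)
    v /= total;
  return fit;
}
```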

Example usage

// Create a diagonal Gaussian distribution in 3 dimensions with zero mean and
// unit covariance.
mlpack::DiagonalGaussianDistribution d(3);

// Compute the probability of the point [0, 0.5, 0.25].
const double p = d.Probability(arma::vec("0 0.5 0.25"));

// Modify the mean in dimension 0.
d.Mean()[0] = 0.5;

// Set the covariance to a random diagonal.
arma::vec newCovDiag(3, arma::fill::randu);
d.Covariance(std::move(newCovDiag)); // Set new covariance.

// Compute the probability of the same point [0, 0.5, 0.25].
const double p2 = d.Probability(arma::vec("0 0.5 0.25"));

// Create a diagonal Gaussian distribution that is estimated from random samples
// in 50 dimensions.
arma::mat samples(50, 10000, arma::fill::randn); // Normally distributed.

mlpack::DiagonalGaussianDistribution d2(50);
d2.Train(samples);

// Compute the probability of all of the samples.
arma::vec probabilities;
d2.Probability(samples, probabilities);

std::cout << "Average probability is: " << arma::mean(probabilities) << "."
    << std::endl;

Using different element types

The DiagonalGaussianDistribution class takes one template parameter:

DiagonalGaussianDistribution<MatType>

MatType represents the matrix type used to represent observations.
By default, MatType is arma::mat, but any matrix type matching the Armadillo API can be used (e.g. arma::fmat).
When MatType is set to anything other than arma::mat, all arguments are adapted accordingly:
- arma::mat arguments will instead be MatType.
- arma::vec arguments will instead be the corresponding column vector type associated with MatType.
- double arguments will instead be the element type of MatType.

The code below uses a diagonal Gaussian distribution to make predictions with 32-bit floating point numbers.

// Create a 3-dimensional 32-bit floating point diagonal Gaussian distribution
// with random mean and covariance.
mlpack::DiagonalGaussianDistribution<arma::fmat> g(
    arma::randu<arma::fvec>(3), arma::randu<arma::fvec>(3));

// Compute the probability of the point [0.2, 0.3, 0.4].
const float p = g.Probability(arma::fvec("0.2 0.3 0.4"));

std::cout << "Probability of (0.2, 0.3, 0.4): " << p << "." << std::endl;

`GammaDistribution`

GammaDistribution is a multivariate Gamma distribution with two parameters for shape (alpha) and inverse scale (beta). Certain settings of these parameters yield the exponential distribution, Chi-squared distribution, and Erlang distribution. This family of distributions is commonly used in Bayesian statistics. See more on Wikipedia.

Constructors

g = GammaDistribution(dimensionality)
- Create the distribution with the given dimensionality.
- The distribution will have alpha and beta parameters in each dimension set to 0.
g = GammaDistribution(alphas, betas)
- Create the distribution with the given parameters.
- alphas and betas are of type arma::vec and should have length equal to the dimensionality of the distribution.
- alphas should hold the desired shape parameters in each dimension.
- betas should hold the desired inverse scale parameters in each dimension.
g = GammaDistribution(data, tol=1e-8)
- Create the distribution by fitting to the given data.
- tol specifies the convergence tolerance for the fitting procedure.
- Using this constructor is equivalent to calling g.Train(data, tol) after initializing a GammaDistribution.

Access and modify properties of distribution

g.Dimensionality() returns the dimensionality of the distribution.
g.Alpha(i) returns a double representing the shape parameter for dimension i. g.Alpha(i) = a will set the i'th dimension's shape parameter to a.
g.Beta(i) returns a double representing the inverse scale parameter for dimension i. g.Beta(i) = b will set the i'th dimension's inverse scale parameter to b.
A GammaDistribution can be serialized with data::Save() and data::Load().

Compute probabilities of points

g.Probability(observation) returns the probability of the given observation as a double.
- observation should be an arma::vec of size g.Dimensionality().
g.Probability(observations, probabilities) computes the probabilities of many observations.
- observations should be an arma::mat with number of rows equal to g.Dimensionality(); observations.n_cols is the number of observations.
- probabilities will be set to size observations.n_cols.
- probabilities[i] will be set to g.Probability(observations.col(i)).
g.LogProbability(observation) returns the log-probability of the given observation as a double.
g.LogProbability(observations, probabilities) computes the log-probabilities of many observations.
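As a sketch of what Probability() and LogProbability() compute, the code below evaluates a product of independent one-dimensional Gamma densities using the shape (alpha) and inverse scale (beta) parameterization described above: log p(x_i) = alpha_i log beta_i + (alpha_i - 1) log x_i - beta_i x_i - log Gamma(alpha_i). Treating the dimensions as independent is an assumption of this sketch, and GammaLogProbability is a hypothetical name. In this parameterization, alpha = 1 gives the exponential distribution with rate beta, alpha = k/2 with beta = 1/2 gives the Chi-squared distribution with k degrees of freedom, and integer alpha gives the Erlang distribution.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: log-density of a product of independent univariate
// Gamma distributions with shape alpha[i] and inverse scale (rate) beta[i]:
//   log p(x_i) = alpha_i log(beta_i) + (alpha_i - 1) log(x_i)
//                - beta_i x_i - lgamma(alpha_i)
double GammaLogProbability(const std::vector<double>& x,
                           const std::vector<double>& alpha,
                           const std::vector<double>& beta)
{
  assert(x.size() == alpha.size() && x.size() == beta.size());
  double logp = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i)
  {
    assert(x[i] > 0.0 && alpha[i] > 0.0 && beta[i] > 0.0);
    logp += alpha[i] * std::log(beta[i]) + (alpha[i] - 1.0) * std::log(x[i])
        - beta[i] * x[i] - std::lgamma(alpha[i]);
  }
  return logp;
}
```

For instance, an exponential with rate 2 at x = 0.5 and a Chi-squared with 2 degrees of freedom at x = 2 have log-densities log 2 - 1 and log 0.5 - 1, so the two-dimensional log-density is exactly -2.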

Sample points from the distribution

g.Random() returns an arma::vec with a random sample from the Gamma distribution.

Fit the distribution to observations

g.Train(observations)
- Fit the distribution to the given observations.
- observations should be an arma::mat with number of rows equal to g.Dimensionality(); observations.n_cols is the number of observations.
g.Train(observations, observationProbabilities)
- Fit the distribution to the given observations, as above, but also provide probabilities that each observation is from this distribution.
- observationProbabilities should be an arma::vec of length observations.n_cols.
- observationProbabilities[i] should be equal to the probability that observations.col(i) is from g.
The algorithm used for fitting the distribution is described in the paper Estimating a Gamma Distribution.
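The sketch below shows the kind of procedure involved: Minka's generalized Newton iteration for the shape parameter, which is, to our understanding, the method of the cited paper. It starts from alpha = 0.5 / (log mean(x) - mean(log x)) and repeats 1/alpha_new = 1/alpha + (mean(log x) - log mean(x) + log alpha - psi(alpha)) / (alpha^2 (1/alpha - psi'(alpha))), where psi is the digamma function; given alpha, the inverse scale is beta = alpha / mean(x). The C++ standard library has no digamma or trigamma, so the code approximates them with the usual recurrence plus asymptotic series. FitGamma, Digamma, and Trigamma are hypothetical names, the code handles a single dimension, and mlpack's stopping rule and tolerance handling may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Digamma psi(x) for x > 0: shift x upward using psi(x) = psi(x + 1) - 1/x,
// then apply the asymptotic series.
double Digamma(double x)
{
  assert(x > 0.0);
  double result = 0.0;
  while (x < 6.0)
  {
    result -= 1.0 / x;
    x += 1.0;
  }
  const double inv = 1.0 / x, inv2 = inv * inv;
  result += std::log(x) - 0.5 * inv
      - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252));
  return result;
}

// Trigamma psi'(x) for x > 0: shift with psi'(x) = psi'(x + 1) + 1/x^2, then
// apply the asymptotic series.
double Trigamma(double x)
{
  assert(x > 0.0);
  double result = 0.0;
  while (x < 6.0)
  {
    result += 1.0 / (x * x);
    x += 1.0;
  }
  const double inv = 1.0 / x, inv2 = inv * inv;
  result += inv + 0.5 * inv2 + inv * inv2
      * (1.0 / 6 - inv2 * (1.0 / 30 - inv2 * (1.0 / 42 - inv2 / 30)));
  return result;
}

// Fit shape alpha and inverse scale beta to positive samples x. Iterates until
// the relative change in alpha drops below tol (or 100 iterations).
void FitGamma(const std::vector<double>& x, double tol,
              double& alpha, double& beta)
{
  assert(!x.empty());
  double meanX = 0.0, meanLogX = 0.0;
  for (double v : x)
  {
    assert(v > 0.0);
    meanX += v;
    meanLogX += std::log(v);
  }
  meanX /= x.size();
  meanLogX /= x.size();
  const double s = std::log(meanX) - meanLogX;  // >= 0 by Jensen's inequality.
  assert(s > 0.0);  // Identical samples have no finite maximizer.

  double a = 0.5 / s;  // Starting point.
  for (int iter = 0; iter < 100; ++iter)
  {
    const double num = std::log(a) - Digamma(a) - s;
    const double den = a * a * (1.0 / a - Trigamma(a));
    const double aNew = 1.0 / (1.0 / a + num / den);
    const bool done = std::abs(aNew - a) < tol * a;
    a = aNew;
    if (done)
      break;
  }
  alpha = a;
  beta = a / meanX;  // The fitted mean alpha / beta equals the sample mean.
}
```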

Example usage

// Create a Gamma distribution in 3 dimensions with ones for the alpha (shape)
// parameters and random beta (inverse scale) parameters.
mlpack::GammaDistribution g(arma::ones<arma::vec>(3) /* shape */,
                            arma::randu<arma::vec>(3) /* inverse scale */);

// Compute the probability and log-probability of the point [0.1, 0.5, 0.25].
// (The Gamma density is defined on positive values, so the point avoids 0.)
const double p = g.Probability(arma::vec("0.1 0.5 0.25"));
const double lp = g.LogProbability(arma::vec("0.1 0.5 0.25"));

std::cout << "Probability of [0.1 0.5 0.25]:     " << p << "." << std::endl;
std::cout << "Log-probability of [0.1 0.5 0.25]: " << lp << "." << std::endl;

// Modify the shape and inverse scale parameters in dimension 0.
g.Alpha(0) = 0.5;
g.Beta(0) = 3.0;

// Compute the probability of the same point [0.1, 0.5, 0.25].
const double p2 = g.Probability(arma::vec("0.1 0.5 0.25"));
const double lp2 = g.LogProbability(arma::vec("0.1 0.5 0.25"));

std::cout << "After parameter changes:" << std::endl;
std::cout << "Probability of [0.1 0.5 0.25]:     " << p2 << "." << std::endl;
std::cout << "Log-probability of [0.1 0.5 0.25]: " << lp2 << "." << std::endl;

// Create a Gamma distribution that is estimated from random samples in 5
// dimensions.  Note that the samples here are uniformly distributed---so a
// Gamma distribution fit will not be a good one!
arma::mat samples(5, 1000, arma::fill::randu);
samples += 2.0; // Shift samples away from zero.

mlpack::GammaDistribution g2(samples, 1e-3 /* tolerance for fitting */);

// Compute the probability of all of the samples.
arma::vec probabilities;
g2.Probability(samples, probabilities);

std::cout << "Average probability is: " << arma::mean(probabilities) << "."
    << std::endl;

Using different element types

The GammaDistribution class takes one template parameter:

GammaDistribution<MatType>

MatType represents the matrix type used to represent observations.
By default, MatType is arma::mat, but any matrix type matching the Armadillo API can be used (e.g. arma::fmat).
When MatType is set to anything other than arma::mat, all arguments are adapted accordingly:
- arma::mat arguments will instead be MatType.
- arma::vec arguments will instead be the corresponding column vector type associated with MatType.
- double arguments will instead be the element type of MatType.
- If the element type is float, the default tolerance (tol) for Train() is 1e-4.

The code below uses a Gamma distribution to make predictions with 32-bit floating point numbers.

// Create a 3-dimensional 32-bit floating point Gamma distribution with
// ones for the shape parameters and random inverse scale parameters.
mlpack::GammaDistribution<arma::fmat> g(arma::ones<arma::fvec>(3) /* shape */,
                                        arma::randu<arma::fvec>(3) /* inverse scale */);

// Compute the probability of the point [0.2, 0.3, 0.4].
const float p = g.Probability(arma::fvec("0.2 0.3 0.4"));

std::cout << "Probability of (0.2, 0.3, 0.4): " << p << "." << std::endl;

`LaplaceDistribution`

LaplaceDistribution is a multivariate Laplace distribution parameterized by a mean vector and a single scale value. The Laplace distribution is sometimes also called the double exponential distribution. See more on Wikipedia.

Constructors

l = LaplaceDistribution(dimensionality, scale=1.0)
- Create the distribution with the given dimensionality.
- The distribution will have mean zero and the given scale.
- scale must be greater than 0.
l = LaplaceDistribution(mean, scale)
- Create the distribution with the given parameters.
- mean is of type arma::vec and should have length equal to the dimensionality of the distribution.
- scale must be greater than 0.

Access and modify properties of distribution

l.Dimensionality() returns the dimensionality of the distribution.
l.Mean() returns an arma::vec& holding the mean of the distribution. This can be modified.
l.Scale() returns a double representing the distribution's scale parameter. l.Scale() = s will set the scale parameter to s.
A LaplaceDistribution can be serialized with data::Save() and data::Load().

Compute probabilities of points

l.Probability(observation) returns the probability of the given observation as a double.
- observation should be an arma::vec of size l.Dimensionality().
l.Probability(observations, probabilities) computes the probabilities of many observations.
- observations should be an arma::mat with number of rows equal to l.Dimensionality(); observations.n_cols is the number of observations.
- probabilities will be set to size observations.n_cols.
- probabilities[i] will be set to l.Probability(observations.col(i)).
l.LogProbability(observation) returns the log-probability of the given observation as a double.
l.LogProbability(observations, probabilities) computes the log-probabilities of many observations.
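For intuition about the scale parameter, the one-dimensional Laplace density is p(x) = exp(-|x - mu| / b) / (2b), so its log-density falls off linearly in |x - mu| rather than quadratically as for a Gaussian, which gives it heavier tails. The sketch below evaluates only this univariate case; how the multivariate class combines dimensions is not spelled out on this page, so the sketch does not attempt it. LaplaceLogProbability is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical sketch: log-density of the one-dimensional Laplace distribution
// with location mu and scale b > 0, i.e. p(x) = exp(-|x - mu| / b) / (2 b).
double LaplaceLogProbability(double x, double mu, double b)
{
  assert(b > 0.0);
  return -std::log(2.0 * b) - std::abs(x - mu) / b;
}
```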

Sample points from the distribution

l.Random() returns an arma::vec with a random sample from the Laplace distribution.

Fit the distribution to observations

l.Train(observations)
- Fit the distribution to the given observations.
- observations should be an arma::mat with number of rows equal to l.Dimensionality(); observations.n_cols is the number of observations.
l.Train(observations, observationProbabilities)
- Fit the distribution to the given observations, as above, but also provide probabilities that each observation is from this distribution.
- observationProbabilities should be an arma::vec of length observations.n_cols.
- observationProbabilities[i] should be equal to the probability that observations.col(i) is from l.
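For the one-dimensional case, the maximum-likelihood estimates have a closed form: the location is a sample median and the scale is the mean absolute deviation from that median. The plain C++ sketch below computes them for unweighted data; it illustrates the textbook estimator rather than mlpack's Train(), whose multivariate and weighted behavior may differ. FitLaplace1D is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: maximum-likelihood fit of a one-dimensional Laplace
// distribution. The location estimate is a sample median, and the scale
// estimate is the mean absolute deviation from that median.
void FitLaplace1D(std::vector<double> x, double& mu, double& b)
{
  assert(!x.empty());
  std::sort(x.begin(), x.end());
  const std::size_t n = x.size();
  // For even n, any value between the two middle order statistics maximizes
  // the likelihood; the midpoint is a conventional choice.
  mu = (n % 2 == 1) ? x[n / 2] : 0.5 * (x[n / 2 - 1] + x[n / 2]);
  b = 0.0;
  for (double v : x)
    b += std::abs(v - mu);
  b /= n;
}
```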

Example usage

// Create a Laplace distribution in 3 dimensions with uniform random mean and
// scale parameter 1.
mlpack::LaplaceDistribution l(arma::randu<arma::vec>(3) /* mean */,
                              1.0 /* scale */);

// Compute the probability and log-probability of the point [0, 0.5, 0.25].
const double p = l.Probability(arma::vec("0 0.5 0.25"));
const double lp = l.LogProbability(arma::vec("0 0.5 0.25"));

std::cout << "Probability of [0 0.5 0.25]:     " << p << "." << std::endl;
std::cout << "Log-probability of [0 0.5 0.25]: " << lp << "." << std::endl;

// Modify the scale, and the mean in dimension 1.
l.Scale() = 2.0;
l.Mean()[1] = 1.5;

// Compute the probability of the same point [0, 0.5, 0.25].
const double p2 = l.Probability(arma::vec("0 0.5 0.25"));
const double lp2 = l.LogProbability(arma::vec("0 0.5 0.25"));

std::cout << "After parameter changes:" << std::endl;
std::cout << "Probability of [0 0.5 0.25]:     " << p2 << "." << std::endl;
std::cout << "Log-probability of [0 0.5 0.25]: " << lp2 << "." << std::endl;

// Create a Laplace distribution that is estimated from random samples in 50
// dimensions.  Note that the samples here are normally distributed---so a
// Laplace distribution fit will not be a good one!
arma::mat samples(50, 10000, arma::fill::randn);

mlpack::LaplaceDistribution l2;
l2.Train(samples);

// Compute the probability of all of the samples.
arma::vec probabilities;
l2.Probability(samples, probabilities);

std::cout << "Average probability is: " << arma::mean(probabilities) << "."
    << std::endl;

Using different element types

The LaplaceDistribution class takes one template parameter:

LaplaceDistribution<MatType>

MatType represents the matrix type used to represent observations.
By default, MatType is arma::mat, but any matrix type matching the Armadillo API can be used (e.g. arma::fmat).
When MatType is set to anything other than arma::mat, all arguments are adapted accordingly:
- arma::mat arguments will instead be MatType.
- arma::vec arguments will instead be the corresponding column vector type associated with MatType.
- double arguments will instead be the element type of MatType.

The code below uses a Laplace distribution to make predictions with 32-bit floating point numbers.

// Create a 3-dimensional 32-bit floating point Laplace distribution with
// random mean and scale of 2.0.
mlpack::LaplaceDistribution<arma::fmat> g(arma::randu<arma::fvec>(3), 2.0);

// Compute the probability of the point [0.2, 0.3, 0.4].
const float p = g.Probability(arma::fvec("0.2 0.3 0.4"));

std::cout << "Probability of (0.2, 0.3, 0.4): " << p << "." << std::endl;

`RegressionDistribution`

The RegressionDistribution is a Gaussian distribution fitted on the errors of a linear regression model. Given a point x with response y, the probability of (y, x) is computed using a univariate Gaussian distribution on the scalar residual y - y', where y' is the linear regression model's prediction on x.

This class is meant to be used with mlpack's HMM class for the task of HMM regression.

Constructors

r = RegressionDistribution()
- Create an empty RegressionDistribution.
- The distribution will not provide useful predictions; call Train() before doing anything else with the object!
r = RegressionDistribution(predictors, responses)
- Create the RegressionDistribution by estimating the parameters with the given labeled regression data predictors and responses.
- predictors should be a column-major arma::mat representing the data the distribution should be trained on.
- responses should be an arma::rowvec representing the responses for each data point.
- The number of elements in responses (i.e. responses.n_elem) should be the same as the number of columns in predictors (i.e. predictors.n_cols).

Access and modify properties of distribution

r.Dimensionality() returns the dimensionality of the distribution.
- Note: this is not the same as the number of elements in a vector passed to Probability()!
r.Rf() returns the LinearRegression& model. This can be modified.
r.Parameters() returns a const arma::vec& with length r.Dimensionality() + 1 representing the parameters of the linear regression model. The first element is the bias; subsequent elements are the weights for each dimension.
r.Err() returns a GaussianDistribution& object representing the univariate distribution trained on the model's residuals. This can be modified.
A RegressionDistribution can be serialized with data::Save() and data::Load().

Compute probabilities of points

r.Probability(observation) returns the probability of the given labeled observation as a double.
- observation should be an arma::vec of size r.Dimensionality() + 1, containing both the data point and its scalar response.
- The first element of observation should be the response; subsequent elements should be the data point.
r.Probability(observations, probabilities) computes the probabilities of many labeled observations.
- observations should be an arma::mat with number of rows equal to r.Dimensionality() + 1; observations.n_cols is the number of observations.
- The first row of observations should correspond to the responses for each data point.
- probabilities will be set to size observations.n_cols.
- probabilities[i] will be set to r.Probability(observations.col(i)).
r.LogProbability(observation) returns the log-probability of the given labeled observation as a double.
r.LogProbability(observations, probabilities) computes the log-probabilities of many labeled observations.
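
Log-probabilities are useful when the residual is far from the mean, because the plain density can underflow to zero in double precision. The sketch below computes the residual log-density directly in log space. It is a general illustration rather than mlpack's code, and the name `ResidualLogProbability` is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not mlpack API): log of the univariate Gaussian density
// of a residual, computed directly in log space:
//   log p = -0.5 * log(2 * pi * var) - (residual - mu)^2 / (2 * var)
double ResidualLogProbability(double residual, double mu, double var)
{
  assert(var > 0.0);
  const double pi = std::acos(-1.0);
  const double diff = residual - mu;
  return -0.5 * std::log(2.0 * pi * var) - diff * diff / (2.0 * var);
}
```

Consider a residual of 40 with `mu = 0` and `var = 1`. The function returns roughly -800.92, which is a perfectly usable number. Exponentiating that value, however, underflows to 0 in double precision.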

Fit the distribution to observations

Training a RegressionDistribution on a given set of labeled observations is done in two steps. First, a LinearRegression model is trained on the dataset. Then, a univariate GaussianDistribution is trained on the residual error of each data point.
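
As a minimal sketch of this two-step procedure, the code below handles one-dimensional predictors. It makes these assumptions:
- The first step uses ordinary least squares with an intercept.
- The residual Gaussian's variance is the maximum-likelihood estimate, which divides by n.

mlpack's LinearRegression handles multidimensional data, and its GaussianDistribution may normalize the variance differently. The names `TwoStageFit` and `FitTwoStage` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type (not mlpack API).
struct TwoStageFit
{
  double bias;    // intercept of the linear model
  double weight;  // slope for the single predictor
  double errMean; // mean of the residuals
  double errVar;  // variance of the residuals (divides by n)
};

// Step 1: ordinary least squares on 1-D predictors x with responses y.
// Step 2: fit a univariate Gaussian to the residuals y[i] - (bias + weight * x[i]).
TwoStageFit FitTwoStage(const std::vector<double>& x,
                        const std::vector<double>& y)
{
  assert(x.size() == y.size() && x.size() >= 2);
  const double n = static_cast<double>(x.size());

  double xMean = 0.0, yMean = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) { xMean += x[i]; yMean += y[i]; }
  xMean /= n;
  yMean /= n;

  double sxy = 0.0, sxx = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i)
  {
    sxy += (x[i] - xMean) * (y[i] - yMean);
    sxx += (x[i] - xMean) * (x[i] - xMean);
  }
  assert(sxx > 0.0); // the predictors must not all be equal

  TwoStageFit fit;
  fit.weight = sxy / sxx;
  fit.bias = yMean - fit.weight * xMean;

  // Mean of the residuals.
  fit.errMean = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i)
    fit.errMean += y[i] - (fit.bias + fit.weight * x[i]);
  fit.errMean /= n;

  // Variance of the residuals around their mean.
  fit.errVar = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i)
  {
    const double r = y[i] - (fit.bias + fit.weight * x[i]) - fit.errMean;
    fit.errVar += r * r;
  }
  fit.errVar /= n;
  return fit;
}
```

Suppose the data lie exactly on a line, for example x = [0, 1, 2, 3] with y = [1, 3, 5, 7]. In that case the residual variance comes out as exactly 0, which is the degenerate case described in the note below.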

In the Train() overloads, the observations matrix is expected to contain both the responses and the data points (predictors).

r.Train(observations)
- Fit the distribution to the given labeled observations.
- observations should be an arma::mat with number of rows equal to the dimensionality of the data plus one; observations.n_cols is the number of observations.
- The first row of observations should correspond to the responses of the data; subsequent rows correspond to the data itself.
r.Train(observations, observationProbabilities)
- Fit the distribution to the given labeled observations, as above, but also provide probabilities that each observation is from this distribution.
- observationProbabilities should be an arma::rowvec of length observations.n_cols.
- observationProbabilities[i] should be equal to the probability that the i-th observation is from r.
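
This documentation does not spell out how observationProbabilities enter the fit. The sketch below shows one common interpretation for the Gaussian step: each probability acts as a weight, and the weighted mean and variance are normalized by the sum of the weights. This is an assumption, not a description of mlpack's code. Under the same assumption, the regression step would analogously become weighted least squares. The names `WeightedGaussian1D` and `FitWeightedGaussian` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type (not mlpack API).
struct WeightedGaussian1D { double mean; double variance; };

// Fit a univariate Gaussian to residuals, treating each observation
// probability as a weight. Weights are normalized by their sum, so an
// observation with weight 0 has no influence on the fit.
WeightedGaussian1D FitWeightedGaussian(const std::vector<double>& residuals,
                                       const std::vector<double>& weights)
{
  assert(residuals.size() == weights.size());
  double wSum = 0.0, mean = 0.0;
  for (std::size_t i = 0; i < residuals.size(); ++i)
  {
    assert(weights[i] >= 0.0);
    wSum += weights[i];
    mean += weights[i] * residuals[i];
  }
  assert(wSum > 0.0);
  mean /= wSum;

  double variance = 0.0;
  for (std::size_t i = 0; i < residuals.size(); ++i)
  {
    const double d = residuals[i] - mean;
    variance += weights[i] * d * d;
  }
  variance /= wSum;
  return {mean, variance};
}
```

For example, take residuals [1, -1, 100] with weights [1, 1, 0]. The outlier is ignored, and the fit gives mean 0 and variance 1.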

Note: if the linear regression model is able to exactly fit the observations, then the resulting Gaussian distribution will have zero-valued standard deviation, and Probability() will return 1 for points that are perfectly fit and 0 otherwise.

Example usage

// Create an example dataset that arises from a noisy random linear model:
//
//   y = bx + noise
//
// Noise is added from a Gaussian distribution with zero mean and unit variance.
// Data is 10-dimensional, and we will generate 1000 points.
arma::vec b(10, arma::fill::randu);
arma::mat x(10, 1000, arma::fill::randu);

arma::rowvec y = b.t() * x + arma::randn<arma::rowvec>(1000);

// Now fit a RegressionDistribution to the data.
mlpack::RegressionDistribution r(x, y);

// Print information about the distribution.
std::cout << "RegressionDistribution model parameters:" << std::endl;
std::cout << " - " << r.Parameters().subvec(1, r.Parameters().n_elem - 1).t();
std::cout << " - Bias: " << r.Parameters()[0] << "." << std::endl;
std::cout << "True model parameters:" << std::endl;
std::cout << " - " << b.t();
std::cout << "Error Gaussian mean is " << r.Err().Mean()[0] << ", with "
    << "variance " << r.Err().Covariance()[0] << "." << std::endl << std::endl;

// Compute the probability of a point in the training set.  We must assemble the
// response and the point into a single vector, with the response first.
arma::vec p1(11); // p1 will be point 5 from (x, y).
p1[0] = y[5];
p1.subvec(1, p1.n_elem - 1) = x.col(5);
std::cout << "Probability of point 5:      " << r.Probability(p1) << "."
    << std::endl;

arma::vec p2(11, arma::fill::randu);
std::cout << "Probability of random point: " << r.Probability(p2) << "."
    << std::endl;

// Print log-probabilities too.
std::cout << "Log-probability of point 5:      " << r.LogProbability(p1) << "."
    << std::endl;
std::cout << "Log-probability of random point: " << r.LogProbability(p2) << "."
    << std::endl << std::endl;

// Change the error distribution: the noise now has standard deviation 1.5.
y = b.t() * x + (1.5 * arma::randn<arma::rowvec>(1000));

// Combine x and y to build the observations matrix for Train().
arma::mat observations(x.n_rows + 1, x.n_cols);
observations.row(0) = y;
observations.rows(1, observations.n_rows - 1) = x;

// Assign a random probability for each point.
arma::rowvec observationProbabilities(observations.n_cols, arma::fill::randu);

// Refit the distribution to the new data.
r.Train(observations, observationProbabilities);

// Print new error distribution information.
std::cout << "Updated error Gaussian mean is " << r.Err().Mean()[0] << ", with "
    << "variance " << r.Err().Covariance()[0] << "." << std::endl << std::endl;

// Compute the average probability of points in the dataset.
arma::vec probabilities;
r.Probability(observations, probabilities);
std::cout << "Average probability of points in `observations`: "
    << arma::mean(probabilities) << "." << std::endl;

Using different element types

The RegressionDistribution class takes one template parameter:

RegressionDistribution<MatType>

MatType represents the matrix type used to represent observations.
By default, MatType is arma::mat, but any matrix type matching the Armadillo API can be used (e.g. arma::fmat).
When MatType is set to anything other than arma::mat, all arguments are adapted accordingly:
- arma::mat arguments will instead be MatType.
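- arma::vec arguments will instead be the corresponding column vector type associated with MatType.
- arma::rowvec arguments will instead be the corresponding row vector type associated with MatType.
- double arguments will instead be the element type of MatType.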

The code below uses a regression distribution trained on 32-bit floating point data.

// Create an example dataset that arises from a noisy random linear model:
//
//   y = bx + noise
//
// Noise is added from a Gaussian distribution with zero mean and unit variance.
// Data is 3-dimensional, and we will generate 1000 points.
arma::fvec b(3, arma::fill::randu);
arma::fmat x(3, 1000, arma::fill::randu);

arma::frowvec y = b.t() * x + arma::randn<arma::frowvec>(1000);

// Now fit a RegressionDistribution to the data.
mlpack::RegressionDistribution<arma::fmat> r(x, y);

// Compute the probability of the point [0.5, 0.2, 0.3, 0.4].
// (Here 0.5 is the response, and [0.2, 0.3, 0.4] is the point.)
const float p = r.Probability(arma::fvec("0.5 0.2 0.3 0.4"));
std::cout << "Probability of (0.5, 0.2, 0.3, 0.4): " << p << "." << std::endl;
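
The idea behind the MatType parameter can be illustrated with a function template. The sketch below writes the residual density once as a template; instantiating it with float instead of double changes every scalar argument and the return type together. This mirrors the rule above that double arguments become the element type of MatType. It is only an analogy using standard C++, not mlpack's implementation, and the name `ResidualDensity` is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>

// Hypothetical sketch (not mlpack API): one residual-density computation,
// instantiated for whichever floating-point element type the caller uses.
template<typename ElemType>
ElemType ResidualDensity(ElemType residual, ElemType mean, ElemType variance)
{
  static_assert(std::is_floating_point<ElemType>::value,
                "ElemType must be a floating-point type");
  assert(variance > ElemType(0));
  const ElemType pi = std::acos(ElemType(-1));
  const ElemType diff = residual - mean;
  return std::exp(-diff * diff / (ElemType(2) * variance)) /
      std::sqrt(ElemType(2) * pi * variance);
}
```

Calling `ResidualDensity(0.0f, 0.0f, 1.0f)` performs the whole computation in 32-bit floats and returns a float. Calling it with double arguments uses 64-bit precision instead.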
