We use the following notation:

| Notation | Explanation |
|---|---|
| \(u\) | The random effects vector |
| \(\theta\) | Parameter vector (first part) |
| \(\beta\) | Parameter vector (second part) |
| \(f(u,\beta,\theta)\) | Joint negative log likelihood |
| \(x\) | Data |
| \(E(u \mid x)\) | Conditional expectation of random effects given data |
| \(\hat u\) | The posterior mode \(\arg \min_{u} f(u,\beta,\theta)\) |
This section describes the underlying theory of the `profile` argument to `MakeADFun`, which is intended to speed up and robustify linear mixed effects models with a large number of fixed effects. Given a few common model properties (Assumptions 1 and 2 below), which must be checked by the user, the `profile` argument can be used to move outer parameters to the inner problem without affecting the model result.
Theorem 1 (Profiling inner problem). Assume that for any \(\beta\) and \(\theta\):

1. \(\partial_{\beta} f(u,\beta,\theta)\) is a linear function of \(u\) (Assumption 1).
2. The posterior mean equals the posterior mode, \(E(u \mid x) = \hat u\) (Assumption 2).

Then the MLE

\[\hat \beta := \arg \max_{\beta} \left( \int \exp(-f(u,\beta,\theta)) \: du \right)\]

is a solution to the augmented system

\[
\begin{split}
\partial_{u} f(u,\beta,\theta) &= 0 \\
\partial_{\beta} f(u,\beta,\theta) &= 0
\end{split}
\]

The augmented system defines \(\hat \beta\) implicitly as a function of the posterior mode \(\hat u\).
Proof. Differentiation of the negative log marginal likelihood gives

\[
\begin{split}
\partial_{\beta} \left( -\log \int \exp(-f(u,\beta,\theta)) \: du \right) &= E\left(\partial_{\beta}f(u,\beta,\theta) \mid x\right) \\
&= \partial_{\beta} f(u,\beta,\theta)\big|_{u=\hat u(\beta,\theta)}
\end{split}
\]

where the first equality holds in general and the second equality follows from Assumptions 1 and 2.
\(\square\)
The standard situation in which Assumption 1 holds is when the \(\beta\)s are the linear fixed effects of a mixed model. In this case the joint negative log density takes the form

\[ f(u,\beta,\theta) = \frac{1}{2}(u-A\beta)'\Sigma_{\theta}^{-1}(u-A\beta) + ... \]

for some design matrix \(A\), where '\(...\)' does not depend on \(\beta\). The derivative

\[ \partial_{\beta} f(u,\beta,\theta) = -A'\Sigma_{\theta}^{-1}(u-A\beta) \]

is thus a linear function of the random effect \(u\).
In general, Assumption 2 holds exactly for models with a symmetric (e.g. Gaussian) posterior distribution.
This section supplements the documentation of `?sdreport` by adding some missing details.
As previously, we consider a general latent variable model with parameter vector \(\theta\), random effect vector \(u\) and observation vector \(x\). The TMB estimation procedure works as follows: first, \(\hat\theta(x)\) is found by maximizing the (Laplace approximated) marginal likelihood; second, the random effects are estimated by the posterior mode \(\hat u(\hat\theta(x),x)\), i.e. the minimizer of the joint negative log-likelihood for fixed \(\theta=\hat\theta(x)\).
In general, we assume that \(\hat\theta\) is a consistent estimator of \(\theta\). However, we do not in general require \(\hat u\) to be consistent for \(u\). The purpose of `sdreport` is, for a given realization of the pair \((u,x)\), to quantify the joint uncertainty of \((\hat u,\hat\theta)\) as an estimator of \((u,\theta)\). That is, we are interested in the variance matrix of the difference

\[D:=\begin{pmatrix}\hat u\left(\hat\theta(x),x\right) - u\\ \hat\theta(x) - \theta\end{pmatrix}\]
An important point of the uncertainty quantification is to account for plugging in \(\hat\theta\) rather than using the true \(\theta\).

We calculate the variance using the standard formula

\[V[D]=E(V(D|x))+V(E(D|x))\]

Consider \(D\) conditionally on \(x\). Given \(x\), both \(\hat u\) and \(\hat\theta\) are constant, and the second component of \(D\) does not depend on \(u\), so

\[V[D|x]=\begin{pmatrix}V[u|x] & 0 \\ 0 & 0 \end{pmatrix}\]

It follows that

\[E(V[D|x])=\begin{pmatrix}E(V[u|x]) & 0 \\ 0 & 0 \end{pmatrix}\]
As a central estimator of \(E(V[u|x])\) we use \(V[u|x]\), which is approximated by the inverse random effect Hessian \(H_{uu}^{-1}\), based on the assumption that \(u|x\) is well approximated by a Gaussian distribution (a reasonable assumption given that we are using the Laplace approximation). This explains the first term of the variance formula in `?sdreport`:

\[E(V[D|x]) \approx \begin{pmatrix} H_{uu}^{-1} & 0 \\ 0 & 0 \end{pmatrix}\]
Likewise,

\[E[D|x]=\begin{pmatrix}\hat u\left(\hat\theta(x),x\right) - E(u|x)\\ \hat\theta(x) - \theta\end{pmatrix}\]

Again assuming a Gaussian approximation of \(u|x\), it follows that \(E(u|x) \approx \hat u(\theta,x)\):

\[E[D|x]=\begin{pmatrix}\hat u\left(\hat\theta(x),x\right) - \hat u(\theta,x)\\ \hat\theta(x) - \theta\end{pmatrix}\]

We approximate this expectation by linearizing \(\theta \rightarrow \hat u(\theta,x)\) around \(\hat\theta(x)\):

\[E[D|x] \approx J_x \cdot (\hat\theta(x) - \theta)\]

This gives the second term of the variance formula in `?sdreport`:

\[V(E[D|x]) \approx J_x V(\hat\theta(x)) J_x'\]
This term becomes negligible when the amount of data is large, due to the assumed asymptotic consistency of \(\hat\theta\).
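Combining the two terms gives the overall variance approximation reported by `sdreport`; it is simply the sum of the two contributions displayed above:

\[V[D] \approx \begin{pmatrix} H_{uu}^{-1} & 0 \\ 0 & 0 \end{pmatrix} + J_x V\left(\hat\theta(x)\right) J_x'\]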
Custom functions and derivatives can be added to the TMB library. This may be necessary, for example, when a function cannot be expressed in terms of existing TMB primitives, or when its evaluation involves an adaptive algorithm (such as an iterative solver) whose control flow cannot be taped directly.
TMB uses CppAD as its engine for reverse mode derivatives. In order to add a new primitive function

\[f: R^n \rightarrow R^m\]

we must inform CppAD how to calculate derivatives of this function in reverse mode. That is, for any range space vector \(w \in R^m\) we must calculate the gradient of the function \(R^n \rightarrow R\) given by

\[ x \rightarrow \text{sum}( f(x) \odot w ) \]

where '\(\odot\)' denotes pointwise multiplication.
As an example, consider the Lambert W function, defined implicitly by

\[y = W(y e^y)\]

Here we only consider \(W\) as defined on the positive reals. It follows, by differentiating the above identity, that

\[ W'(x) = \frac{1}{ \exp\left(W(x)\right) \left(1 + W(x)\right) } \]

When coding reverse-mode derivatives we can assume that the function value \(W(x)\) has already been computed during a forward pass. For efficiency we should reuse this intermediate result rather than re-calculating \(W(x)\) in the reverse pass.
We'll assume that a plain C++ function (taking double types as input/output) is available to calculate \(W(x)\). It doesn't matter whether you have the source code of an implementation or just a header with linkage to an external library:

```cpp
double LambertW(double x);
```
The macro `TMB_ATOMIC_VECTOR_FUNCTION()` is used to declare our new primitive Lambert \(W\) function:

```cpp
TMB_ATOMIC_VECTOR_FUNCTION(
    // ATOMIC_NAME
    LambertW
    ,
    // OUTPUT_DIM
    1,
    // ATOMIC_DOUBLE
    ty[0] = LambertW(tx[0]); // Call the 'double' version
    ,
    // ATOMIC_REVERSE
    Type W  = ty[0];                    // Function value from forward pass
    Type DW = 1. / (exp(W) * (1. + W)); // Derivative
    px[0] = DW * py[0];                 // Reverse mode chain rule
)
```
Let's explain in detail what is going on. The macro takes four arguments:

- `ATOMIC_NAME`: Name of the new primitive function, taking a `CppAD::vector` as input and output.
- `OUTPUT_DIM`: Dimension of the `CppAD::vector` which is the function output.
- `ATOMIC_DOUBLE`: Specifies how to evaluate the primitive function for the ordinary double type. `tx` denotes the input vector and `ty` the output vector of the function \(f: R^n \rightarrow R^m\). In this case both have dimension one.
- `ATOMIC_REVERSE`: Specifies how to calculate the reverse mode derivatives for a general `Type`. Again `tx` and `ty` denote the function input and output, but now `ty` has already been computed and is available as an intermediate value. The vectors `px` and `py` denote partial derivatives of the end result with respect to \(x\) and \(y\) respectively. `py` is given, and we must calculate `px` using the chain rule. This first order derivative rule is automatically expanded up to the higher orders required by TMB's random effects calculations.

To make the function work like other TMB functions, it is convenient to define a scalar version and a vectorized version that call the atomic function:
```cpp
// Scalar version
template<class Type>
Type LambertW(Type x){
  CppAD::vector<Type> tx(1);
  tx[0] = x;
  return LambertW(tx)[0];
}

// Vectorized version
VECTORIZE1_t(LambertW)
```
Here is a complete example using Newton's method to calculate the Lambert \(W\) function (there are more sophisticated algorithms, such as the one by Fukushima (2013), but that doesn't matter for this example):
```cpp
#include <TMB.hpp>

// Double version of Lambert W function
double LambertW(double x) {
  double logx = log(x);
  double y = (logx > 0 ? logx : 0);
  int niter = 100, i = 0;
  for (; i < niter; i++) {
    if ( fabs( logx - log(y) - y) < 1e-9) break;
    y -= (y - exp(logx - y)) / (1 + y);  // Newton update
  }
  if (i == niter) Rf_warning("W: failed convergence");
  return y;
}

TMB_ATOMIC_VECTOR_FUNCTION(
    // ATOMIC_NAME
    LambertW
    ,
    // OUTPUT_DIM
    1,
    // ATOMIC_DOUBLE
    ty[0] = LambertW(tx[0]); // Call the 'double' version
    ,
    // ATOMIC_REVERSE
    Type W  = ty[0];                    // Function value from forward pass
    Type DW = 1. / (exp(W) * (1. + W)); // Derivative
    px[0] = DW * py[0];                 // Reverse mode chain rule
)

// Scalar version
template<class Type>
Type LambertW(Type x){
  CppAD::vector<Type> tx(1);
  tx[0] = x;
  return LambertW(tx)[0];
}

// Vectorized version
VECTORIZE1_t(LambertW)

template<class Type>
Type objective_function<Type>::operator() ()
{
  PARAMETER_VECTOR(x);
  Type f = LambertW(x).sum();
  return f;
}
```
And from R:

```r
compile("lambert.cpp")
dyn.load(dynlib("lambert"))
```
Check the definition of the function:

```r
obj <- MakeADFun(data=list(), parameters=list(x=1), DLL="lambert")
obj$fn(7 * exp(7))
```

```
## [1] 7
```
Check derivatives using the `numDeriv` package:

```r
numDeriv::grad(obj$fn, 7)
```

```
## [1] 0.08626538
```

```r
obj$gr(7)
```

```
##            [,1]
## [1,] 0.08626538
```

Also try second order derivatives:

```r
numDeriv::hessian(obj$fn, 7)
```

```
##             [,1]
## [1,] -0.01038959
```

```r
obj$he(7)
```

```
##             [,1]
## [1,] -0.01038969
```
For the Lambert \(W\) function we know how to calculate the derivatives. There are cases for which the derivatives are impossible (or difficult) to write down. If you're in this situation, you may want to try using forward mode AD to help define an atomic function. A fully worked example is available here: adaptive_integration.cpp. Derivatives are calculated automatically, and if-else branching is allowed. The main downside of this approach is that it is limited to functions with very few inputs.
Checkpointing is another useful technique. It is demonstrated in the example register_atomic.cpp. It does not work for adaptive algorithms but is otherwise automatic. It is useful for reducing AD memory usage in cases where the same sequence of operations is applied many times.
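As a rough sketch of the checkpointing pattern (see register_atomic.cpp for the authoritative version; `square_sum` below is a hypothetical helper, not part of TMB): a templated vector-to-vector function is defined and then registered, so that its operation sequence is taped only once.

```cpp
// Hypothetical sub-computation that the model applies many times:
template<class Type>
vector<Type> square_sum(vector<Type> x) {
  vector<Type> y(1);
  y[0] = x.sum() * x.sum();  // the repeated sequence of operations
  return y;
}
// Register it as a checkpointed (atomic) operation:
REGISTER_ATOMIC(square_sum)
```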
Summary of how syntax differs between R and C++/TMB:

| | R code | C++/TMB code | Note |
|---|---|---|---|
| Comments | `#` | `//` | Comment symbol |
| Constants | `3.4` | `Type(3.4);` | Explicit casting recommended in TMB |
| Scalar | `x = 5.2` | `Type x = Type(5.2);` | Variables must have a type |
| Arrays | `x = numeric(10)` | `vector<Type> x(10);` | The C++ code does NOT initialize to 0 |
| Indexing | `x[1]+x[10]` | `x(0)+x(9);` | C++ indexing is zero-based |
| Loops | `for(i in 1:10)` | `for(int i=1;i<=10;i++)` | The integer `i` must be declared in C++ |
| Increments | `x[1] = x[1] + 3` | `x(0) += 3.0;` | `+=` `-=` `*=` `/=` are incremental operators in C++ |
It is important to note the following difference compared to R:

> Vectors, matrices and arrays are not zero-initialized in C++.

A zero-initialized object is created using Eigen's `setZero()`:

```cpp
matrix<Type> m(4,5);
m.setZero();
```
The namespace

```cpp
using namespace density;
```

gives access to a variety of multivariate normal distributions:

- Multivariate normal distributions specified via a covariance matrix.
- Autoregressive (AR) processes.
- Gaussian Markov random fields (GMRF).
- Separable covariance functions.

These seemingly unrelated concepts are all implemented via the notion of a *distribution*, which explains why they are placed in the same namespace. You can combine two *distributions*, and this lets you build up complex multivariate distributions using extremely compact notation. Due to the flexibility of the approach, it is more abstract than other parts of TMB, but here it will be explained from scratch. Before looking at the different categories of multivariate distributions, we note the following point of practical importance:

- Densities in the `density` namespace return the *negative* log density, as opposed to the univariate densities in R style distributions.

Consider a zero-mean multivariate normal distribution with covariance matrix `Sigma` (symmetric positive definite) that we want to evaluate at `x`:
```cpp
int n = 10;
vector<Type> x(n);  // Evaluation point
x.fill(0.0);        // Point of evaluation: x = (0,0,...,0)
```

The negative log density is evaluated as follows:

```cpp
using namespace density;
matrix<Type> Sigma(n,n);  // Covariance matrix
// ..... User must assign value to Sigma here
res = MVNORM(Sigma)(x);   // Evaluate negative log likelihood
```
In the last line, `MVNORM(Sigma)` should be interpreted as a multivariate density which, via the last parenthesis `(x)`, is evaluated at `x`. A less compact way of expressing this is

```cpp
MVNORM_t<Type> your_dmnorm(Sigma);
res = your_dmnorm(x);
```

in which `your_dmnorm` is a variable that holds the "density". Note that the latter way (using `MVNORM_t`) is more efficient when you need to evaluate the density more than once, i.e. for different values of `x`.
`Sigma` can be parameterized in different ways. Due to the symmetry of `Sigma` there are at most n(n+1)/2 free parameters (n variances and n(n-1)/2 correlation parameters). If you want to estimate all of these freely (modulo the positive definiteness constraint) you can use `UNSTRUCTURED_CORR()` to specify the correlation matrix and `VECSCALE()` to specify the variances. `UNSTRUCTURED_CORR()` takes as input a vector of dummy parameters that is used internally to build the correlation matrix via its Cholesky factor:
```cpp
using namespace density;
int n = 10;
vector<Type> unconstrained_params(n*(n-1)/2);  // Dummy parameterization of correlation matrix
vector<Type> sds(n);                           // Standard deviations
res = VECSCALE(UNSTRUCTURED_CORR(unconstrained_params), sds)(x);
```
If all elements of `unconstrained_params` are estimated, we are in effect estimating a full correlation matrix without any constraints on its elements (except for the mandatory positive definiteness). The actual value of the correlation matrix, but not the full covariance matrix, can easily be assessed using the `.cov()` operator:

```cpp
matrix<Type> Sigma(n,n);
Sigma = UNSTRUCTURED_CORR(unconstrained_params).cov();
REPORT(Sigma);  // Report back to R session
```
Consider a stationary univariate Gaussian AR1 process x(t), t=0,...,n-1. The stationary distribution is chosen so that x(t) has mean 0 and variance 1 (for all t).
The multivariate density of the vector `x` can be evaluated as follows:

```cpp
int n = 10;
using namespace density;

vector<Type> x(n);  // Evaluation point
x.fill(0.0);        // Point of evaluation: x = (0,0,...,0)
Type rho = 0.2;     // Correlation parameter
res = AR1(rho)(x);  // Evaluate negative log-density of AR1 process at point x
```
Due to the assumed stationarity, the correlation parameter must satisfy \(-1 < \rho < 1\). Note that cor[x(t), x(t-1)] = rho.
The `SCALE()` function can be used to set the standard deviation:

```cpp
Type sigma = 2.1;                 // standard deviation of x
res = SCALE(AR1(rho), sigma)(x);
```
Now, var[x(t)] = sigma^2. Because all elements of `x` are scaled by the same constant, we use `SCALE` rather than `VECSCALE`.
This is the first real illustration of how distributions can be used as building blocks to obtain more complex distributions. Consider a p-dimensional AR1 process, stored as an array `x` whose columns refer to the different time points. We then evaluate the (negative log) joint density of the whole time series:
```cpp
MVNORM_t<Type> your_dmnorm(Sigma);  // Density of x(t)
Type phi;                           // Correlation parameter
res = AR1(phi, your_dmnorm)(x);
```
Note the following:

- We have defined the distribution `your_dmnorm`, which holds the p-dimensional marginal density of x(t). This is a zero-mean normal density with covariance matrix `Sigma`.

A GMRF may be defined in two ways:

1. Via a (sparse) precision matrix `Q`.
2. Via a d-dimensional lattice.

For further details please see `GMRF_t`. Under 1), a sparse `Q` corresponding to a Matern covariance function can be obtained via the `R_inla` namespace.
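As a minimal sketch of approach 1 (assuming a sparse precision matrix `Q` has been constructed on the R side):

```cpp
using namespace density;
DATA_SPARSE_MATRIX(Q);   // Sparse precision matrix
PARAMETER_VECTOR(x);     // Random field
Type res = GMRF(Q)(x);   // Negative log density of the GMRF evaluated at x
```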
A typical use of separability is to create space-time models with a sparse precision matrix. Details are given in `SEPARABLE_t`. Here we give a simple example.

Assume that we study a quantity `x` that changes both in space and time. For simplicity we consider only a one-dimensional space. We discretize space and time using equidistant grids, and assume that the distance between grid points is 1 in both dimensions. We then define an `AR1(rho_s)` process in space and one in time, `AR1(rho_t)`. The separability assumption is that two points `x1` and `x2`, separated in space by a distance `ds` and in time by a distance `dt`, have correlation given by

`rho_s^ds * rho_t^dt`
This is implemented as:

```cpp
using namespace density;
int n_s = 10;      // Number of grid points in space
int n_t = 10;      // Number of grid points in time
Type rho_s = 0.2;  // Correlation in space
Type rho_t = 0.4;  // Correlation in time

array<Type> x(n_s, n_t);
x.setZero();       // x = 0

res = SEPARABLE(AR1(rho_t), AR1(rho_s))(x);
```
Note that the arguments to `SEPARABLE()` are given in the opposite order to the dimensions of `x`.
The R interface to the debugger (gdb) is documented as part of the R help system, i.e. you can type `?gdbsource` in R to get info. The current document only addresses issues that relate to C++.
Compilation errors may be hard to understand because TMB relies heavily on C++ templates, which tend to produce long and cryptic error messages.
Run time errors are, broadly speaking, of two types: errors that crash R (e.g. out-of-bounds indexing) and errors that cause the objective function to return NaN (e.g. illegal mathematical operations).
You can use the debugger to locate both types of errors, but the procedure is slightly different in the two cases. The following assumes that you have the GNU debugger `gdb` installed.
An example is:

```cpp
vector<Type> y(4);
y(5);  // 5 is not a valid index value here
```

This will cause TMB and R to crash with the following error message:

> TMB has received an error from Eigen. The following condition was not met:
> index >= 0 && index < size()
> Please check your matrix-vector bounds etc., or run your program through a debugger.
> Aborted (core dumped)
So you must restart R and give the commands

```r
library(TMB)
gdbsource("my_project.R")
```

> #5 objective_function<double>::operator() (this=...) at nan_error_ex.cpp:11

and you can see that the debugger points to line number 11 in the .cpp file. `gdbsource()` is an R function that is part of TMB.
If you, on the other hand, perform an illegal mathematical operation, such as

```cpp
Type f = sqrt(-1.);
```

R will not crash, but the objective function will return a NaN value. However, you will not know in which part of your C++ code the error occurred. By including the `fenv.h` header (part of many C++ compilers, but it can otherwise be downloaded from http://www.scs.stanford.edu/histar/src/uinc/fenv.h) in `nan_error_ex.cpp`:
```cpp
// Illustrates how to make the debugger catch a floating point error.
#include <TMB.hpp>
#include <fenv.h>  // Extra line needed

template<class Type>
Type objective_function<Type>::operator() ()
{
  // Extra line needed:
  feenableexcept(FE_INVALID | FE_OVERFLOW | FE_DIVBYZERO | FE_UNDERFLOW);
  DATA_SCALAR(lambda);
  PARAMETER(x);
  Type f;
  f = sqrt(-1.);        // FE_INVALID   ( sqrt(-1.) returns NaN )
  //f = 1./0.;          // FE_DIVBYZERO ( division by zero )
  //f = exp(100000.);   // FE_OVERFLOW  ( exp(100000.) returns Inf ) [Does not work on all platforms]
  //f = exp(-100000.);  // FE_UNDERFLOW ( exp(-100000.) returns 0 )
  return f;
}
```
a floating point exception will be turned into an actual error that can be picked up by the debugger. There are only two extra lines that need to be included ("// Extra line needed" in the above example).

When we try to run this program in the usual way, the program crashes:

```r
source("nan_error_ex.R")
```

> Floating point exception (core dumped)
> tmp3>
At this stage you should run the debugger to find out that the floating point exception occurs at line number 14:

```r
library(TMB)
gdbsource("nan_error_ex.R")
```

> #1 0x00007ffff0e7eb09 in objective_function<double>::operator() (this=...) at nan_error_ex.cpp:14
This enabling of floating point errors applies to R itself as well as the TMB program. For more elaborate R scripts it may therefore happen that a NaN occurs in the R script before the floating point exception in the TMB program (i.e. the problem of interest) happens. To circumvent this, one can run without NaN debugging enabled, save the parameter vector that gave the floating point exception (e.g. `badpar <- obj$env$last.par` after the NaN evaluation), then enable NaN debugging, re-compile, and evaluate `obj$env$f(badpar, type="double")`.
TMB vectorized functions cannot be called directly with expressions. For example, the following will fail to compile:

```cpp
DATA_VECTOR(x);
// Don't do this! Doesn't compile
vector<Type> out = lgamma(x + 1);
```

> error: could not convert 'atomic::D_lgamma(const CppAD::vector<double>&) ...
> from 'double' to 'Eigen::CwiseBinaryOp<Eigen::internal::scalar_sum_op<double, double>, ... >'

Eigen lazy-evaluates expressions, so `x + 1` is not a `vector<Type>` but an expression template, and the templated `lgamma` cannot be instantiated for that type.

To work around this, cast the input:
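One form that compiles (a sketch; the exact cast used in the original page may differ) is to wrap the expression in a concrete `vector<Type>`, forcing Eigen to evaluate it before the call:

```cpp
DATA_VECTOR(x);
// Cast the expression so lgamma receives a concrete vector<Type>:
vector<Type> out = lgamma(vector<Type>(x + Type(1)));
```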
A list of all examples is found on the "Examples" tab at the top of the page.

Locations of example files: `adcomp/tmb_examples` and `adcomp/TMB/inst/examples`.

For each example there is both a `.cpp` and a `.R` file. Take for instance the linear regression example:
C++ template:

```cpp
// Simple linear regression.
#include <TMB.hpp>
template<class Type>
Type objective_function<Type>::operator() ()
{
  DATA_VECTOR(Y);
  DATA_VECTOR(x);
  PARAMETER(a);
  PARAMETER(b);
  PARAMETER(logSigma);
  ADREPORT(exp(2*logSigma));
  Type nll = -sum(dnorm(Y, a+b*x, exp(logSigma), true));
  return nll;
}
```
Controlling R code:

```r
library(TMB)
compile("linreg.cpp")
dyn.load(dynlib("linreg"))
set.seed(123)
data <- list(Y = rnorm(10) + 1:10, x = 1:10)
parameters <- list(a=0, b=0, logSigma=0)
obj <- MakeADFun(data, parameters, DLL="linreg")
obj$hessian <- TRUE
opt <- do.call("optim", obj)
opt
opt$hessian ## <-- FD hessian from optim
obj$he()    ## <-- Analytical hessian
sdreport(obj)
```
To run this example use the R command

```r
source("linreg.R")
```
| Example | Description |
|---|---|
| adaptive_integration.cpp | Adaptive integration using 'tiny_ad' |
| ar1_4D.cpp | Separable covariance on 4D lattice with AR1 structure in each direction |
| compois.cpp | Conway-Maxwell-Poisson distribution |
| fft.cpp | Multivariate normal distribution with circulant covariance |
| hmm.cpp | Inference in a 'double-well' stochastic differential equation using an HMM filter |
| laplace.cpp | Laplace approximation from scratch demonstrated on the 'spatial' example |
| linreg_parallel.cpp | Parallel linear regression |
| linreg.cpp | Simple linear regression |
| longlinreg.cpp | Linear regression with 10^6 observations |
| lr_test.cpp | Illustrates the map feature of TMB to perform likelihood ratio tests on a ragged array dataset |
| matern.cpp | Gaussian process with Matern covariance |
| mvrw_sparse.cpp | Identical to the random walk example; utilizes sparse block structure, so efficient when the number of states is high |
| mvrw.cpp | Random walk with multivariate correlated increments and measurement noise |
| nmix.cpp | nmix example from https://groups.nceas.ucsb.edu/non-linear-modeling/projects/nmix |
| orange_big.cpp | Scaled-up version of the Orange Tree example (5000 latent random variables) |
| register_atomic_parallel.cpp | Parallel version of 'register_atomic' |
| register_atomic.cpp | Similar to the 'adaptive_integration' example, using CppAD Romberg integration; REGISTER_ATOMIC is used to reduce tape size |
| sam.cpp | State space assessment model from Nielsen and Berg 2014, Fisheries Research |
| sde_linear.cpp | Inference in a linear scalar stochastic differential equation |
| sdv_multi_compact.cpp | Compact version of sdv_multi |
| sdv_multi.cpp | Multivariate SV model from Skaug and Yu 2013, Comp. Stat. & Data Analysis (to appear) |
| socatt.cpp | socatt from the ADMB example collection |
| spatial.cpp | Spatial Poisson GLMM on a grid, with exponentially decaying correlation function |
| spde_aniso_speedup.cpp | Speeds up "spde_aniso.cpp" by moving normalization out of the template |
| spde_aniso.cpp | Anisotropic version of "spde.cpp" |
| spde.cpp | Illustration of the SPDE/INLA approach to spatial modelling via a Matern correlation function |
| thetalog.cpp | Theta logistic population model from Pedersen et al 2012, Ecol. Modelling |
| TMBad/interpol.cpp | Demonstrates a 2D interpolation operator |
| TMBad/sam.cpp | State space assessment model from Nielsen and Berg 2014, Fisheries Research |
| TMBad/solver.cpp | Demonstrates the adaptive solver of TMBad |
| TMBad/spa_gauss.cpp | Demonstrates the saddlepoint approximation (SPA) |
| TMBad/spatial.cpp | Spatial Poisson GLMM on a grid, with exponentially decaying correlation function |
| TMBad/spde_epsilon.cpp | Low-level demonstration of fast epsilon bias correction using a 'sparse plus lowrank' hessian |
| TMBad/thetalog.cpp | Theta logistic population model from Pedersen et al 2012, Ecol. Modelling |
| transform_parallel.cpp | Parallel version of transform |
| transform.cpp | Gamma distributed random effects using copulas |
| transform2.cpp | Beta distributed random effects using copulas |
| tweedie.cpp | Estimating parameters in a Tweedie distribution |
| validation/MVRandomWalkValidation.cpp | Estimate and validate a multivariate random walk model with correlated increments and correlated observations |
| validation/randomwalkvalidation.cpp | Estimate and validate a random walk model with and without drift |
| validation/rickervalidation.cpp | Estimate and validate a Ricker model based on data simulated from the logistic map |
TMB (Template Model Builder) is an R package for fitting statistical latent variable models to data. It is strongly inspired by ADMB. Unlike most other R packages, the model is formulated in C++. This provides great flexibility, but requires some familiarity with the C/C++ programming language. TMB supplies exact first and second order derivatives of the objective function via automatic differentiation, so optimization via e.g. `nlminb()` is easy.

A more general introduction, including the underlying theory used in TMB, can be found in this paper.
The TMB core model object is the object returned by `MakeADFun()`. A number of options can be passed to `MakeADFun` to control the model, and this section walks you through those options. Additionally, we demonstrate some of the methods that can be applied to a fitted model object. We shall see how to:

- Collect or fix parameters using the `map` argument.
- Move parameters to the inner problem using `random` and `profile`.
- Obtain standard deviations by `sdreport`ing a fitted object.

> FIXME: NOT DONE YET !
Attempts have been made to make the interface (function names and arguments) as close as possible to that of R.

- The densities (`d...`) are provided in both the discrete and continuous case; cumulative distributions (`p...`) and inverse cumulative distributions (`q...`) are provided only for continuous distributions.
- Scalar and `vector` arguments (in combination) are supported, but not `array` or `matrix` arguments.
- The last argument (of type `int`) corresponds to the `log` argument in R: 1=logarithm, 0=ordinary scale. `true` (logarithm) and `false` (ordinary scale) can also be used.
- Vector arguments must all be of the same length (no recycling of elements). If vectors of different lengths are used, an "out of range" error will occur, which can be picked up by the debugger.
- `DATA_IVECTOR()` and `DATA_INTEGER()` cannot be used with probability distributions, except possibly for the last (log) argument.
An example:
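The original code block was lost in extraction; the following is a minimal sketch of a vectorized density call, with `x`, `mu` and `sd` as illustrative names:

```cpp
DATA_VECTOR(x);
PARAMETER(mu);
PARAMETER(sd);
// Returns one log-density per element of x:
vector<Type> logres = dnorm(x, mu, sd, true);
```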
To sum over the elements of the returned vector, use:
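For instance (a sketch reusing `x`, `mu` and `sd` from above):

```cpp
// Collapse the element-wise log-densities into a single likelihood contribution:
Type nll = -sum(dnorm(x, mu, sd, true));
```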
When building models in TMB it is generally recommended to test the implementation on simulated data. Obviously, data can be simulated from R and passed to the C++ template. In practice this amounts to implementing the model twice and is thus a strong way to validate the implementation of the model. However, with increased model complexity it becomes inconvenient to maintain two separate implementations. Therefore, TMB allows the user to write the simulation code as an integrated part of the C++ model template.
The TMB simulation routines use the same naming convention as the R simulators; for instance, `rnorm()` is used to simulate from a normal distribution. However, the argument convention is slightly different:

- `rnorm(n, mu, sd)` draws `n` simulations from a normal distribution. Unlike R, this works for scalar parameters only.
- `rnorm(mu, sd)` is a TMB-specific variant that works for mixed scalar and vector input. The output length follows the length of the longest input (no recycling), and is hence consistent with `dnorm(mu, sd)`.

Currently the following simulators are implemented:

> `rnorm()`, `rpois()`, `runif()`, `rbinom()`, `rgamma()`, `rexp()`, `rbeta()`, `rf()`, `rlogis()`, `rt()`, `rweibull()`, `rcompois()`, `rtweedie()`, `rnbinom()`, `rnbinom2()`
Objects from the density namespace have their own `simulate()` method. Taking the multivariate normal distribution as an example, we have the following ways to draw a simulation:

- `MVNORM(Sigma).simulate()` returns a vector with a simulation from the multivariate normal distribution. The void-argument version is only available when there is no ambiguity in the dimension of the output. In the `MVNORM` case the dimension of the output is known from the dimension of `Sigma`. In other cases, e.g. `AR1(phi)`, the dimension of the output is not known, and hence the void-argument version is not available.
- `MVNORM(Sigma).simulate(x)` passes `x` by reference and writes the simulation directly to `x` without returning anything. This version is available for all the classes, because the dimension of the simulation can always be deduced from `x`.
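For example (a sketch; `Sigma`, `phi` and `x` are assumed to be defined as in the earlier density examples, and the `SIMULATE` block is explained below):

```cpp
SIMULATE {
  vector<Type> u = MVNORM(Sigma).simulate();  // dimension deduced from Sigma
  AR1(phi).simulate(x);                       // simulation written directly into x
  REPORT(u);
  REPORT(x);
}
```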
Simulation functions can be called from anywhere in the C++ program. Usually, however, the simulation code should be placed inside specialized simulation blocks, which allow it to be executed only when a simulation is requested from R.
A complete example, extending the example linreg.cpp with simulation code, is:
```cpp
#include <TMB.hpp>
template<class Type>
Type objective_function<Type>::operator() ()
{
  DATA_VECTOR(y);
  DATA_VECTOR(x);
  PARAMETER(a);
  PARAMETER(b);
  PARAMETER(sd);
  vector<Type> mu = a + b * x;
  Type nll = -sum(dnorm(y, mu, sd, true));
  SIMULATE {
    y = rnorm(mu, sd);  // Simulate response
    REPORT(y);          // Report the simulation
  }
  return nll;
}
```
> The `SIMULATE` block marks the simulation and is not executed by default.
We compile the C++ file, and the model object is constructed as usual:

```r
obj <- MakeADFun(data, parameters, DLL="linreg")
```
Now a simulation can be generated with

```r
set.seed(1) ## optional
obj$simulate()
```

```
## $y
## [1] -0.6264538  0.1836433 -0.8356286  1.5952808  0.3295078 -0.8204684
## [7]  0.4874291  0.7383247  0.5757814 -0.3053884
```
This only includes the simulated response, not the rest of the data. A complete dataset can be generated by:

```r
set.seed(1) ## optional - Note: same y as previous
obj$simulate(complete=TRUE)
```

```
## $y
## [1] -0.6264538  0.1836433 -0.8356286  1.5952808  0.3295078 -0.8204684
## [7]  0.4874291  0.7383247  0.5757814 -0.3053884
##
## $x
## [1]  1  2  3  4  5  6  7  8  9 10
##
## attr(,"check.passed")
## [1] TRUE
```
Here we did not explicitly state the parameter values to use for the simulation. The `simulate` method takes an additional argument `par` that can be used for this.

> The default parameter vector used for the simulation is `obj$env$last.par`.
Simulating datasets from known parameters and re-estimating those parameters can be done generically by:

```r
sim <- replicate(50, {
  simdata <- obj$simulate(par=obj$par, complete=TRUE)
  obj2 <- MakeADFun(simdata, parameters, DLL="linreg", silent=TRUE)
  nlminb(obj2$par, obj2$fn, obj2$gr)$par
})
```
We reshape and plot the result:

```r
library(lattice)
df <- data.frame(estimate=as.vector(sim), parameter=names(obj$par)[row(sim)])
densityplot( ~ estimate | parameter, data=df, layout=c(3,1))
```
Compare with the true parameter values of the simulation:

```r
obj$par
```

```
##  a  b sd
##  0  0  1
```
The examples sam.cpp and ar1_4D.cpp include more advanced simulation code. The latter demonstrates how to simulate from the density objects:

```cpp
// Separable covariance on 4D lattice with AR1 structure in each direction.
#include <TMB.hpp>

/* Parameter transform */
template <class Type>
Type f(Type x){return Type(2)/(Type(1) + exp(-Type(2) * x)) - Type(1);}

template<class Type>
Type objective_function<Type>::operator() ()
{
  DATA_VECTOR(N);
  PARAMETER_ARRAY(eta);
  PARAMETER(transf_phi); /* fastest running dim */
  Type phi = f(transf_phi);
  ADREPORT(phi);

  using namespace density;
  Type res = 0;

  res += AR1(phi, AR1(phi, AR1(phi, AR1(phi))))(eta);

  // logdpois = N log lam - lam
  for(int i=0; i<N.size(); i++) res -= N[i]*eta[i] - exp(eta[i]);

  SIMULATE {
    AR1(phi, AR1(phi, AR1(phi, AR1(phi)))).simulate(eta);
    vector<Type> lam = exp(eta);
    N = rpois(lam);
    REPORT(eta);
    REPORT(N);
  }

  return res;
}
```
In this example the 4D array `eta` is passed to the simulator by reference. Thereby the simulator knows the dimension of `eta` and can fill `eta` with a simulation.
- -Large random effect models require sparsity in order to work in TMB. -In this section we will discuss:
-There are various graph representations that are commonly used to -visualize probabilistic structure. One such is the conditional -independence graph. Say we have a model of four random variables \(X_1,...,X_4\) for which the joint density is:
-\[p(x_1,x_2,x_3,x_4) \propto f_1(x_1,x_2)f_2(x_2,x_3)f_3(x_3,x_4)f_4(x_4,x_1)\]
-The separability of factors on the right hand side implies some conditional independence properties. For instance if \(x_1\) and \(x_3\) are held constant then \(x_2\) and \(x_4\) varies independently. We say that \(x_2\) and \(x_4\) are conditionally independent given \(x_1\) and \(x_3\). -The conditional independence graph is defined by drawing undirected edges between variables occurring in the same factor \(f_i\):
- -Equivalently the graph may be visualized via its adjacency matrix:
- -This is the sparsity pattern of the model.
---The sparsity pattern visualizes the conditional independence structure of the random effects in the model.
-
Important probabilistic properties can be deduced directly from the graph. This is due to the following node removal properties.
-The conditional distribution given node \(X_i\) is found by removing \(X_i\) and its edges from the graph. For instance conditional on \(X_4\) we get the following graph: -
The marginal distribution wrt. a node \(X_i\) is found by removing \(X_i\) from the graph and connecting all \(X_i\)’s neighbors. For instance when integrating \(X_4\) out of the joint density we get the following graph for the remaining nodes: -
--Conditioning preserves sparseness. Marginalizing tend to destroy sparseness by adding more edges to the graph.
-
When building models in TMB it is often more natural to specify processes in incremental steps - i.e. through the successive conditional distributions. The previous example could be simulated by drawing the variables \(X_1,X_2,X_3,X_4\) one by one in the given order as illustrated by the following directed graph:
- -The graph shows dependencies of any specific node given past nodes. The edge from \(X_1\) to \(X_3\) was not in the original (undirected) graph. This is a so-called fill-in.
---Order matters. The DAG is different from the conditional independence graph.
-
It is convenient to use a box-shape for nodes that represent data. For instance if we pretend that \(X_4\) is a data point we would illustrate it by:
- -Here there are only three variables left. The conditional independence structure of the variables is:
- -which is the same graph as was previously found by integrating \(X_4\) out of the joint distribution.
---Data nodes destroy sparsity the same way as marginalization. To avoid this, try to associate each data point with a single random effect.
-
Consider the ``theta logistic’’ population model (Pedersen et al. 2011). This is a -state-space model with state equation
-\[u_t = u_{t-1} + r_0\left(1-\left(\frac{\exp(u_{t-1})}{K}\right)^\psi\right) + e_t\]
-and observation equation
-\[y_t = u_t + v_t\]
-where \(e_t \sim N(0,Q)\), \(v_t \sim N(0,R)\) and \(t\in \{0,...,n-1\}\). A -uniform prior is implicitly assigned to \(u_0\).
-The joint negative log-likelihood of state vector \(u\) and measurements -\(y\) is implemented in the C++ template thetalog.cpp. -The example can be run by:
-runExample("thetalog", exfolder="adcomp/tmb_examples")
We demonstrate it in the case \(n=5\). Here is the DAG
- -This is a standard hidden Markov structure. Each data node is bound to a single random effect - hence the data does not introduce additional edges in the random effect structure.
-We can use the image
function from the Matrix
package to plot the random effect structure (we must first load the Matrix package):
library(Matrix)
-<- MakeADFun(data, parameters, random=c("X"), DLL="thetalog")
- obj image(obj$env$spHess(random=TRUE))
-- - -FIXME: NOT DONE YET !
-
This documentation only covers the TMB specific code, not CppAD or Eigen. These packages have their own documentation, which may be relevant. In particular, some of the standard functions like `sin()` and `cos()` are part of CppAD, and are hence not documented through TMB.
First read the Statistical Modelling section of the Tutorial.
The underlying latent random variables in TMB must be Gaussian for the Laplace approximation to be accurate. To obtain other distributions, say a gamma distribution, the "transformation trick" can be used. We start out with normally distributed variables `u` and transform these into new variables `w` via the `pnorm` and `qgamma` functions as follows:

```cpp
PARAMETER_VECTOR(u);                           // Underlying latent random variables
Type nll = Type(0.0);
nll -= sum(dnorm(u, Type(0), Type(1), true));  // Assign N(0,1) distribution to u
vector<Type> v = pnorm(u, Type(0), Type(1));   // Uniformly distributed variables (on [0,1])
vector<Type> w = qgamma(v, shape, scale);
```

`w` now has a gamma distribution.
The Laplace approximation cannot be applied to the discrete latent variables that occur in mixture models and HMMs (Hidden Markov models). However, such likelihoods have analytic expressions and may be coded up in TMB. TMB will still calculate the exact gradient of the HMM likelihood.
-DATA_VECTOR(x);
-PARAMETER_VECTOR(sigma); // sigma0 og sigma1
-PARAMETER(p); // Mixture proportion of model 0
-0.0);
- Type nll=Type(sum( log( p * dnorm(x, Type(0), sigma(0), false)
- nll -= 1.0-p) * dnorm(x, Type(0), sigma(1), false) ) ); + (
Autoregressive (AR) processes may be implemented using the compact notation of the Densities section. The resulting AR process may be applied both in the observational part of the model and in the distribution of a latent variable. Nonlinear time series models must be implemented from scratch, as in the example thetalog.cpp.
-The purpose of the C++ program is to evaluate the objective function,
-i.e. the negative log-likelihood of the model. The program is compiled
-and called from R, where it can be fed to a function minimizer like
-nlminb()
.
The objective function should be of the following C++ type:
-#include <TMB.hpp>
-
-template<class Type>
-operator() ()
- Type objective_function<Type>::
- {
- .... Here goes your C++ code ..... }
The first line includes the source code for the whole TMB package (and all its dependencies). The objective function is a templated class where `<Type>` is the data type of both the input values and the return value of the objective function. This allows us to evaluate both the objective function and its derivatives using the same chunk of C++ code (via the AD package CppAD). The technical aspects of this are hidden from the user. There is, however, one aspect that surprises the new TMB user: when a constant like "1.2" is used in a calculation that affects the return value, it must be "cast" to Type:

```cpp
Type nll;         // Define variable that holds the return value (neg. log. lik)
nll = Type(1.2);  // Assign value 1.2; a cast is needed.
```
Obviously, we will need to pass both data and parameter values to the objective function. This is done through a set of macros that TMB defines for us. The data macros are:

> `DATA_ARRAY()`, `DATA_FACTOR()`, `DATA_IARRAY()`, `DATA_IMATRIX()`, `DATA_INTEGER()`, `DATA_IVECTOR()`, `DATA_MATRIX()`, `DATA_SCALAR()`, `DATA_SPARSE_MATRIX()`, `DATA_STRING()`, `DATA_STRUCT()`, `DATA_UPDATE()`, `DATA_VECTOR()`

and the parameter macros are:

> `PARAMETER()`, `PARAMETER_ARRAY()`, `PARAMETER_MATRIX()`, `PARAMETER_VECTOR()`
-DATA_
or PARAMETER_
in the Doxygen search field of
-your browser (you may need to refresh the browser window between each
-time you make a new search). A simple example if you want to read a
-vector of numbers (doubles) is the following
DATA_VECTOR(x); // Vector x(0),x(1),...,x(n-1), where n is the length of x
Note that all vectors and matrices in TMB uses a zero-based
-indexing scheme. It is not necessary to explicitly pass the dimension
-of x
, as it can be retrieved inside the C++ program:
int n = x.size();
TMB extends C++ with functionality that is important for formulating -likelihood functions. You have different toolboxes available:
-In addition to the variables defined through the DATA_
or
-PARAMETER_
macros there can be “local” variables, for which
-ordinary C++ scoping rules apply. There must also be a variable that
-holds the return value (neg. log. likelihood).
As in ordinary C++ local variable tmp must be assigned a value before -it can enter into a calculation.
-TMB can handle complex statistical problems with hierarchical
-structure (latent random variables) and multiple data sources. Latent
-random variables must be continuous (discrete distributions are not
-handled). The PARAMETER_
macros are used to pass two types
-of parameters.
Which of these are chosen is controlled from R, via the
-random
argument to the function MakeADFun
. However,
-on the C++ side it is usually necessary to assign a probability
-distribution to the parameter.
The purpose of the C++ program is to calculate the (negative) joint -density of data and latent random variables. Each datum and individual -latent random effect gives a contribution to log likelihood, which may -be though of as a “distribution assignment” by users familiar with -software in the BUGS family.
-PARAMETER_VECTOR(u); // Latent random variable
-0); // Return value
- Type nll = Type(dnorm(u(0),0,1,true) // Distributional assignment: u(0) ~ N(0,1) nll -=
The following rules apply:
-Distribution assignments do not need to take place before the latent -variable is used in a calculation.
More complicated distributional assignments are allowed, say -u(0)-u(1) ~ N(0,1), but this requires the user to have a deeper -understanding of the probabilistic aspects of the model.
For latent variables only normal distributions should be used -(otherwise the Laplace approximation will perform poorly). For -response variables all probability distributions (discrete or -continuous) are allowed. If a non-gaussian latent is needed the -“transformation trick” can be used.
The namespaces R style distributions and Densities contain -many probability distributions, including multivariate normal -distributions. For probability distributions not available from -these libraries, the user can use raw C++ code:
-DATA_VECTOR(y); // Data vector
-0); // Return value
- Type nll = Type(sum(-log(Type(1.0)+y*y)); // y are i.i.d. Cauchy distributed nll -=
See Toolbox for more about statistical modelling.
-The underlying framework is the same for all cases listed in this section. -[ Description of general framework FIXME ]
-For models that does not include random effects the calculations can be simplified greatly.
-This example shows how standardized residuals can be calculated within the template code and reported back to R using the REPORT
function in TMB.
// linear regression with reporting of residuals
-#include <TMB.hpp>
-template<class Type>
-operator() ()
- Type objective_function<Type>::
- {DATA_VECTOR(Y);
- DATA_VECTOR(x);
- PARAMETER(a);
- PARAMETER(b);
- PARAMETER(logSigma);
-
- Type sigma = exp(logSigma);x;
- Vector<Type> pred = a + b*sum(dnorm(Y, a+b*x, sigma, true));
- Type nll = -
- Vector<Type> residuals = (Y - pred)/sigma; REPORT(residuals);
- return nll;
- }
Assuming that the model parameters have been fitted, and the model object is called obj
, the standardized residuals can now be extracted from the model object usinig the report()
function and inspected for normality as follows:
- ... <- obj$report()
- rep qqnorm(rep$residuals)
-abline(0,1)
We now consider situations where the error distribution is continuous but not Gaussian.
-Residuals that are standard normal distributed given that the model is correct, can be obtained be using the “transformation trick,” here illustrated using a model that fits a gamma distribution.
For discrete probability distributions the transformation trick can also be used, but an element of randomization must be added in order to obtain residuals that are truly Gaussian.
-Assume that you have a series of observed counts y
and you have fitted some TMB model using a Poisson likelihood, and the predicted values from that model have been reported and saved in a vector called mu
.
Model validation using residuals is considerably more complicated for random effect models. -Further information can be found in (Thygesen et al. 2017) FIXME: not generating reference.
-Other names are one step prediction errors, forecast pseudo-residuals, and recursive residuals.
-These residuals can be computed using the oneStepPredict
function.
-There are several methods available within this function, and it is the responsibility of the user to ensure that an appropriate method is chosen for a given model.
The following examples of its use are availabe in the tmb_examples/validation
folder.
Example | -Description | -
---|---|
validation/MVRandomWalkValidation.cpp | -Estimate and validate a multivariate random walk model with correlated increments and correlated observations. | -
validation/randomwalkvalidation.cpp | -Estimate and validate a random walk model with and without drift | -
validation/rickervalidation.cpp | -Estimate and validate a Ricker model based on data simulated from the logistic map | -
An alternative (and faster) method is based on a single sample of the random effects from the their posterior distribution given the data. -For state space models we can derive both process- and observation errors from the single sample and the observations, and compare these with the assumptions in the model.
-An example can be found at the end of the randomwalkvalidation.R
file in the tmb_examples/validation
folder