diff --git a/doc/Constraints.md b/doc/Constraints.md
index 77bbdbec3..74662726c 100644
--- a/doc/Constraints.md
+++ b/doc/Constraints.md
@@ -15,7 +15,9 @@ documentation is highly recommended.
The nonlinear least-squares system used by the Ceres Solver is written as:
-![least squares equation](constraints-least_squares_equation.gif)
+```math
+\mathrm{arg\ min}_{x} \left(\frac{1}{2} \sum_i \rho_i \left(\lVert f(x_{i_1},\dots,x_{i_k})\rVert^2\right)\right)
+```
In Ceres Solver parlance, ρ() is called a "loss function". f() is called a "cost function", which accepts one or
more inputs, x. And the inputs, x, are called "parameter blocks". The "parameter blocks" themselves may be
@@ -43,7 +45,10 @@ The "cost function" is the main component of the Constraint object. It is respon
minimized by the Ceres Solver optimizer. The cost function must implement some sort of equation to generate a score
for arbitrary input values. In its most generic form, that equation is written simply as:
-![generic cost equation](constraints-generic_cost_equation.gif)
+```math
+\begin{bmatrix} r_1 \\ \vdots \\ r_i\end{bmatrix} = f\left(\begin{bmatrix}x_{1_1} \\ \vdots \\ x_{1_j}\end{bmatrix}, \dots, \begin{bmatrix}x_{n_1} \\ \vdots \\ x_{n_k}\end{bmatrix}\right)
+```
+
where f() is the cost function, x1 through xn are the input Variables, each of which may contain
multi-dimensional data, and ri are one or more dimensions of the computed costs. In Ceres Solver notation,
@@ -58,7 +63,9 @@ An observation model, sometimes called a sensor model, predicts a sensor measure
of the system Variables. The cost is then computed as the difference between the predicted sensor measurement and the
actual sensor measurement, normalized by the measurement uncertainty.
-![observation model equation](constraints-observation_model_equation.gif)
+```math
+\begin{bmatrix} r_1 \\ \vdots \\ r_i\end{bmatrix} = \left(\begin{bmatrix}z_1 \\ \vdots \\ z_i\end{bmatrix} - h\left(\begin{bmatrix}x_{1_1} \\ \vdots \\ x_{1_j}\end{bmatrix}, \dots, \begin{bmatrix}x_{n_1} \\ \vdots \\ x_{n_k}\end{bmatrix}\right)\right) \cdot \Sigma ^{-\frac{1}{2}}
+```
where z is the sensor measured, h() is the sensor prediction function, and Σ is the covariance matrix. Within
the least-squares minimization, the entire cost function will get squared. By dividing by the square root of the
@@ -72,7 +79,9 @@ A state transition model, sometimes called a motion model, predicts the value of
current estimates of the system Variables. This is generally used to enforce a physical model of the system, such
as known vehicle kinematics.
-![state transition model equation](constraints-state_transition_model_equation.gif)
+```math
+\begin{bmatrix} r_1 \\ \vdots \\ r_i\end{bmatrix} = \left(\begin{bmatrix}x_{t_1} \\ \vdots \\ x_{t_i}\end{bmatrix} - f\left(\begin{bmatrix}x_{{t-1}_1} \\ \vdots \\ x_{{t-1}_i}\end{bmatrix}\right)\right) \cdot \Sigma ^{-\frac{1}{2}}
+```
where xt is the current Variable estimate for time _t_, xt-1 is the current Variable estimate
for time _t-1_, f() is the state prediction function that implements the desired kinematic or dynamic model
@@ -245,7 +254,12 @@ modeled this way. Our cost function will follow the "observation model", where t
predict the sensor measurement, and the cost will be the different between the measured and the prediction normalized
by the measurement uncertainty.
-![2D pose prior cost equation](constraints-pose_2d_prior_equation.gif)
+```math
+\begin{bmatrix} \mathrm{cost}_1 \\ \mathrm{cost}_2 \\ \mathrm{cost}_3\end{bmatrix}
+= \left(\begin{bmatrix}z_x \\ z_y \\ z_{\mathrm{yaw}}\end{bmatrix}
+- \begin{bmatrix}\mathrm{position}_x \\ \mathrm{position}_y \\ \mathrm{orientation}_{\mathrm{yaw}}\end{bmatrix}\right)
+\cdot \Sigma ^{-\frac{1}{2}}
+```
We will make use of Ceres Solver's automatic derivative system to compute the Jacobians. For that to work, we must
implement the cost function equation as a functor object (has an `operator()` method). To compute the cost, our
diff --git a/doc/Variables.md b/doc/Variables.md
index 6a9a09219..8e518a1f0 100644
--- a/doc/Variables.md
+++ b/doc/Variables.md
@@ -47,8 +47,6 @@ places where most of the dimensions are unused. However, including too few physi
also leads to inefficient and cumbersome usage when even the simplest of observation models involve many variables.
This is one of those "Goldilocks principle" situations.
-![Goldilocks principle](http://home.netcom.com/~swansont_2/goldilocks.jpg)
-
Understanding how Variable interact with the rest of the system will help in the design of "good" Variable.
* The fuse stack is designed to combine observations _of the same variable identity_ from multiple sources. As
diff --git a/doc/constraints-generic_cost_equation.gif b/doc/constraints-generic_cost_equation.gif
deleted file mode 100644
index 9fc2ac524..000000000
Binary files a/doc/constraints-generic_cost_equation.gif and /dev/null differ
diff --git a/doc/constraints-least_squares_equation.gif b/doc/constraints-least_squares_equation.gif
deleted file mode 100644
index f1b4663f2..000000000
Binary files a/doc/constraints-least_squares_equation.gif and /dev/null differ
diff --git a/doc/constraints-observation_model_equation.gif b/doc/constraints-observation_model_equation.gif
deleted file mode 100644
index 25a37d312..000000000
Binary files a/doc/constraints-observation_model_equation.gif and /dev/null differ
diff --git a/doc/constraints-pose_2d_prior_equation.gif b/doc/constraints-pose_2d_prior_equation.gif
deleted file mode 100644
index 54e432c27..000000000
Binary files a/doc/constraints-pose_2d_prior_equation.gif and /dev/null differ
diff --git a/doc/constraints-state_transition_model_equation.gif b/doc/constraints-state_transition_model_equation.gif
deleted file mode 100644
index f8dd22c0d..000000000
Binary files a/doc/constraints-state_transition_model_equation.gif and /dev/null differ