diff --git a/faqs.md b/faqs.md
index 17881e9..6fea61c 100644
--- a/faqs.md
+++ b/faqs.md
@@ -102,7 +102,7 @@ If they do not, the dot product between the rows of $$A$$ and the columns of $$B
 
 ### What's the relationship between spans, projections, and multiple linear regression?
 
-### Spans
+#### Spans
 
 The **span** of a set of vectors $$\{x_1, x_2, \ldots, x_p\}$$ is the set of all possible linear combinations of these vectors. In other words, the span defines a subspace in $$\mathbb{R}^n$$ that contains all possible combinations of the independent variables.
 
@@ -112,29 +112,29 @@ $$
 
 In the context of multiple linear regression, the span of the feature vectors represents all possible values that can be predicted using a linear combination of the feature vectors.
 
-### Projections
+#### Projections
 
-A **projection** of the observation vector $$y$$ onto the span of the feature vectors $$\{x_1, x_2, \ldots, x_p\}$$ is any vector $$\hat{y}$$ that lies in the span of $$x$$:
+A **projection** of the observation vector $$\vec{y}$$ onto the span of the feature vectors $$\{\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_p\}$$ is any vector $$\hat{y}$$ that lies in this span.
 
-The distance between the observations and the projection of $$y$$ into the span of the feature vectors represents the error of a prediction. That is, each projection of $$y$$ into the span of the feature vectors is defined by scaling each of the feature vectors by a certain amount ($$w_1$$, $$w_2$$, etc.) and summing them; the distance from this linear combination of the feature vectors to the actual observed values of $$y$$ is the error of a certain prediction.
+The distance between the observations and the projection of $$\vec{y}$$ onto the span of the feature vectors represents the error of a prediction. That is, each projection of $$\vec{y}$$ onto the span of the feature vectors is obtained by scaling each of the feature vectors by a certain amount ($$w_1$$, $$w_2$$, etc.) and summing them; the distance from this linear combination of the feature vectors to the actual observed values of $$\vec{y}$$ is the error of that prediction.
 
 This error is written as
 
 $$
-\vec{e} = y - X\vec{w}
+\vec{e} = \vec{y} - X\vec{w}
 $$,
 
-where $$X$$ represents the design matrix made up of the feature vectors, and $$\vec{w}$$ represents the coefficients that you are scaling the feature vectors by to obtain some projection of $$y$$ into the span of $$X$$.
+where $$X$$ represents the design matrix made up of the feature vectors, and $$\vec{w}$$ represents the coefficients by which you scale the feature vectors to obtain some projection of $$\vec{y}$$ onto the span of $$X$$.
 
-The **orthogonal projection** of $$y$$ into $$X$$ is the one that minimizes the error vector (Or the distance between the predicted values of $$y$$ and the actual values of $$y$$).
+The **orthogonal projection** of $$\vec{y}$$ onto the span of $$X$$ is the one that minimizes the length of the error vector (that is, the distance between the predicted values of $$\vec{y}$$ and the actual values of $$\vec{y}$$).
 
-### Multiple Linear Regression
+#### Multiple Linear Regression
 
-Tying this all together, one can frame multiple linear regression as a projection problem; Given some set of feature vectors $$\vec{x}_1, \vec{x}_2, ... , \vec{x}_n$$, and an observation vector $$\vec{y}$$, what are the scalars $$ w_1, w_2, ... , w_n $$ that give a vector in the span of the feature vectors that is the closest to $$\vec{y}$$?
+Tying this all together, one can frame multiple linear regression as a projection problem: given some set of feature vectors $$\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_p$$ and an observation vector $$\vec{y}$$, what are the scalars $$w_1, w_2, \ldots, w_p$$ that give the vector in the span of the feature vectors that is closest to $$\vec{y}$$? In other words, how close can we get to the observed values of $$\vec{y}$$ while staying in the span of our feature vectors?
 
-This framing of multiple linear regression also leads us to the **Normal Equations**
+This framing of multiple linear regression also leads us to the **normal equations**, whose solution (when $$X^\mathrm{T}X$$ is invertible) is
 
 $$
-w = (X^\mathrm{T}X)^{-1}X^\mathrm{T}y.
+\vec{w} = (X^\mathrm{T}X)^{-1}X^\mathrm{T}\vec{y}.
 $$
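+
+As a concrete illustration, here is a minimal NumPy sketch (the data values below are made up for the example) that solves the normal equations for $$\vec{w}$$ and then checks that the resulting error vector is orthogonal to every feature vector:
+
+```python
+import numpy as np
+
+# Made-up design matrix X (n = 5 observations, p = 2 features) and
+# observation vector y; any values work as long as X has full column rank.
+X = np.array([[1.0, 2.0],
+              [1.0, 3.0],
+              [2.0, 1.0],
+              [3.0, 4.0],
+              [4.0, 2.0]])
+y = np.array([3.0, 5.0, 4.0, 9.0, 8.0])
+
+# Solve the normal equations X^T X w = X^T y for the weights.
+# (np.linalg.solve is more numerically stable than explicitly inverting X^T X.)
+w = np.linalg.solve(X.T @ X, X.T @ y)
+
+y_hat = X @ w   # the orthogonal projection of y onto the span of X
+e = y - y_hat   # the error vector
+
+print(w)
+print(X.T @ e)  # ~[0, 0]: the error is orthogonal to each feature vector
+```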