07-ch7.Rmd

# Hypothesis Tests and Confidence Intervals in Multiple Regression {#htaciimr}

This chapter discusses methods that allow to quantify the sampling uncertainty in the OLS estimator of the coefficients in multiple regression models. The basis for this are  hypothesis tests and confidence intervals which, just as for the simple linear regression model, can be computed using basic `r ttcode("R")` functions. We will also tackle the issue of testing joint hypotheses on these coefficients. 

Make sure the packages `r ttcode("AER")` [@R-AER] and `r ttcode("stargazer")` [@R-stargazer] are installed before you go ahead and replicate the examples. The safest way to do so is by checking whether the following code chunk executes without any issues.

```{r, warning=FALSE, message=FALSE, eval=FALSE}
library(AER)
library(stargazer)
```

## Hypothesis Tests and Confidence Intervals for a Single Coefficient

We first discuss how to compute standard errors, how to test hypotheses and how to construct confidence intervals for a single regression coefficient $\beta_j$ in a multiple regression model. The basic idea is summarized in Key Concept 7.1.

```{r, eval = my_output == "html", results='asis', echo=F, purl=F}
cat('
<div class = "keyconcept" id="KC7.1">
<h3 class = "right"> Key Concept 7.1 </h3>          
<h3 class = "left"> Testing the Hypothesis $\\beta_j = \\beta_{j,0}$ <br>
                    Against the Alternative $\\beta_j \\neq \\beta_{j,0}$ </h3>
<p>
1. Compute the standard error of $\\hat{\\beta_j}$
2. Compute the $t$-statistic,
$$t^{act} = \\frac{\\hat{\\beta}_j - \\beta_{j,0}} {SE(\\hat{\\beta_j})}$$
3. Compute the $p$-value,
$$p\\text{-value} = 2 \\Phi(-|t^{act}|)$$

where $t^{act}$ is the value of the $t$-statistic actually computed. Reject the hypothesis at the $5\\%$ significance level if the $p$-value is less than $0.05$ or, equivalently, if $|t^{act}| > 1.96$. The standard error and (typically) the $t$-statistic and the corresponding $p$-value for testing $\\beta_j = 0$ are computed automatically by suitable <tt>R</tt> functions, e.g., by <tt>summary</tt>.
</p>
</div>
')
```

```{r, eval = my_output == "latex", results='asis', echo=F, purl=F}
cat('\\begin{keyconcepts}[Testing the Hypothesis $\\beta_j = \\beta_{j,0}$
                    Against the Alternative $\\beta_j \\neq \\beta_{j,0}$]{7.1}
\\begin{enumerate}
\\item Compute the standard error of $\\hat{\\beta_j}$
\\item Compute the $t$-statistic,
$$t^{act} = \\frac{\\hat{\\beta}_j - \\beta_{j,0}} {SE(\\hat{\\beta_j})}$$
\\item Compute the $p$-value,
$$p\\text{-value} = 2 \\Phi(-|t^{act}|)$$
where $t^{act}$ is the value of the $t$-statistic actually computed. Reject the hypothesis at the $5\\%$ significance level if the $p$-value is less than $0.05$ or, equivalently, if $|t^{act}| > 1.96$. \\end{enumerate}\\vspace{0.5cm}

The standard error and (typically) the $t$-statistic and the corresponding $p$-value for testing $\\beta_j = 0$ are computed automatically by suitable \\texttt{R} functions, e.g., by \\texttt{summary()}.
\\end{keyconcepts}
')
```

Testing a single hypothesis about the significance of a coefficient in the multiple regression model proceeds as in in the simple regression model. 

You can easily see this by inspecting the coefficient summary of the regression model

$$ TestScore = \beta_0 + \beta_1 \times size  \beta_2 \times english + u $$

already discussed in Chapter \@ref(rmwmr). Let us review this:

```{r, echo=5:7, warning=F, message=F}
library(AER)
data(CASchools)
CASchools$size <- CASchools$students/CASchools$teachers
CASchools$score <- (CASchools$read + CASchools$math)/2

model <- lm(score ~ size + english, data = CASchools)
coeftest(model, vcov. = vcovHC, type = "HC1")
```

You may check that these quantities are computed as in the simple regression model by computing the $t$-statistics or $p$-values by hand using the output above and `r ttcode("R")` as a calculator.

For example, using the definition of the $p$-value for a two-sided test as given in Key Concept 7.1, we can confirm the $p$-value for a test of the hypothesis that the coefficient $\beta_1$, the coefficient on `r ttcode("size")`, to be approximately zero.

```{r, warning=F, message=F}
# compute two-sided p-value
2 * (1 - pt(abs(coeftest(model, vcov. = vcovHC, type = "HC1")[2, 3]),
            df = model$df.residual))
```

```{r, eval = my_output == "html", results='asis', echo=F, purl=F}
cat('
<div class = "keyconcept" id="KC7.2">
<h3 class = "right"> Key Concept 7.2 </h3>          
<h3 class = "left"> Confidence Intervals for a Single Coefficient in Multiple Regression </h3>
<p>
A $95\\%$ two-sided confidence interval for the coefficient $\\beta_j$ is an interval that contains the true value of $\\beta_j$ with a $95 \\%$ probability; that is, it contains the true value of $\\beta_j$ in $95 \\%$ of all repeated samples. Equivalently, it is the set of values of $\\beta_j$ that cannot be rejected by a $5 \\%$ two-sided hypothesis test. When the sample size is large, the $95 \\%$ confidence interval for $\\beta_j$ is
$$\\left[\\hat{\\beta_j}- 1.96 \\times SE(\\hat{\\beta}_j), \\hat{\\beta_j} + 1.96 \\times SE(\\hat{\\beta_j})\\right].$$
</p>
</div>
')
```

```{r, eval = my_output == "latex", results='asis', echo=F, purl=F}
cat('\\begin{keyconcepts}[Confidence Intervals for a Single Coefficient in Multiple Regression]{7.2}
A $95\\%$ two-sided confidence interval for the coefficient $\\beta_j$ is an interval that contains the true value of $\\beta_j$ with a $95 \\%$ probability; that is, it contains the true value of $\\beta_j$ in $95 \\%$ of repeated samples. Equivalently, it is the set of values of $\\beta_j$ that cannot be rejected by a $5 \\%$ two-sided hypothesis test. When the sample size is large, the $95 \\%$ confidence interval for $\\beta_j$ is
$$\\left[\\hat{\\beta_j}- 1.96 \\times SE(\\hat{\\beta}_j), \\hat{\\beta_j} + 1.96 \\times SE(\\hat{\\beta_j})\\right].$$
\\end{keyconcepts}
')
```

## An Application to Test Scores and the Student-Teacher Ratio

Let us take a look at the regression from Section \@ref(mofimr) again.

Computing confidence intervals for individual coefficients in the multiple regression model proceeds as in the simple regression model using the function `r ttcode("confint()")`.

```{r, warning=F, message=F}
model <- lm(score ~ size + english, data = CASchools)
confint(model)
```

To obtain confidence intervals at another level, say $90\%$, just set the argument `r ttcode("level")` in our call of `r ttcode("confint()")` accordingly.

```{r, warning=F, message=F}
confint(model, level = 0.9)
```

The output now reports the desired $90\%$ confidence intervals for all coefficients.

A disadvantage of `r ttcode("confint()")` is that is does not use robust standard errors to compute the confidence interval. For large-sample confidence intervals, this is quickly done manually as follows.

```{r, warning=F, message=F}
# compute robust standard errors
rob_se <- diag(vcovHC(model, type = "HC1"))^0.5

# compute robust 95% confidence intervals
rbind("lower" = coef(model) - qnorm(0.975) * rob_se,
      "upper" = coef(model) + qnorm(0.975) * rob_se)

# compute robust 90% confidence intervals

rbind("lower" = coef(model) - qnorm(0.95) * rob_se,
      "upper" = coef(model) + qnorm(0.95) * rob_se)
```

Knowing how to use `r ttcode("R")` to make inference about the coefficients in multiple regression models, you can now answer the following question: 

Can the null hypothesis that a change in the student-teacher ratio, `r ttcode("size")`, has no significant influence on test scores, `r ttcode("scores")`, --- if we control for the percentage of students learning English in the district, `r ttcode("english")`, --- be rejected at the $10\%$ and the $5\%$ level of significance?
 
The output above shows that zero is not an element of the confidence interval for the coefficient on `r ttcode("size")` such that we can reject the null hypothesis at significance levels of $5\%$ and $10\%$. The same conclusion can be made via the $p$-value for `r ttcode("size")`: $0.00398 < 0.05 = \alpha$.

Note that rejection at the $5\%$-level implies rejection at the $10\%$ level (why?). 

Recall from Chapter \@ref(cifrc) the $95\%$ confidence interval computed above *does not* tell us that a one-unit decrease in the student-teacher ratio has an effect on test scores that lies in the interval with a lower bound of $-1.9497$ and an upper bound of $-0.2529$. Once a confidence interval has been computed, a probabilistic statement like this is wrong: either the interval contains the true parameter or it does not. We do not know which is true.

### Another Augmentation of the Model {-}

What is the average effect on test scores of reducing the student-teacher ratio when the expenditures per pupil and the percentage of english learning pupils are held constant?

Let us augment our model by an additional regressor that is a measure for expenditure per pupil. Using `r ttcode("?CASchools")` we find that `r ttcode("CASchools")` contains the variable `r ttcode("expenditure")`, which provides expenditure per student.

Our model now is $$ TestScore = \beta_0 + \beta_1 \times size + \beta_2 \times english + \beta_3 \times expenditure + u $$

with $expenditure$ the total amount of expenditure per pupil in the district (thousands of dollars).

Let us now estimate the model:

```{r, echo=6:11, warning=F, message=F}
library(AER)
data(CASchools)
CASchools$size <- CASchools$students/CASchools$teachers
CASchools$score <- (CASchools$read + CASchools$math)/2

# scale expenditure to thousands of dollars
CASchools$expenditure <- CASchools$expenditure/1000

# estimate the model
model <- lm(score ~ size + english + expenditure, data = CASchools)
coeftest(model, vcov. = vcovHC, type = "HC1")
```

The estimated effect of a one unit change in the student-teacher ratio on test scores with expenditure and the share of english learning pupils held constant is $-0.29$, which is rather small. What is more, the coefficient on $size$ is not significantly different from zero anymore even at $10\%$ since $p\text{-value}=0.55$. Can you come up with an interpretation for these findings (see Chapter 7.1 of the book)? The insignificance of $\hat\beta_1$ could be due to a larger standard error of $\hat{\beta}_1$ resulting from adding $expenditure$ to the model so that we estimate the coefficient on $size$ less precisely. This illustrates the issue of strongly correlated regressors (imperfect multicollinearity). The correlation between $size$ and $expenditure$ can be computed using `r ttcode("cor()")`.

```{r}
# compute the sample correlation between 'size' and 'expenditure'
cor(CASchools$size, CASchools$expenditure)
```

Altogether, we conclude that the new model provides no evidence that changing the student-teacher ratio, e.g., by hiring new teachers, has any effect on the test scores while keeping expenditures per student and the share of English learners constant.

## Joint Hypothesis Testing Using the F-Statistic

The estimated model is

$$ \widehat{TestScore} = \underset{(15.21)}{649.58} -\underset{(0.48)}{0.29} \times size - \underset{(0.04)}{0.66} \times english + \underset{(1.41)}{3.87} \times expenditure. $$

Now, can we reject the hypothesis that the coefficient on $size$ *and* the coefficient on $expenditure$ are zero? To answer this, we have to resort to joint hypothesis tests. A joint hypothesis imposes restrictions on multiple regression coefficients. This is different from conducting individual $t$-tests where a restriction is imposed on a single coefficient. Chapter 7.2 of the book explains why testing hypotheses about the model coefficients one at a time is different from testing them jointly.

The homoskedasticity-only $F$-Statistic is given by

$$ F = \frac{(SSR_{\text{restricted}} - SSR_{\text{unrestricted}})/q}{SSR_{\text{unrestricted}} / (n-k-1)} $$

with $SSR_{restricted}$ being the sum of squared residuals from the restricted regression, i.e., the regression where we impose the restriction. $SSR_{unrestricted}$ is the sum of squared residuals from the full model, $q$ is the number of restrictions under the null and $k$ is the number of regressors in the unrestricted regression.

It is fairly easy to conduct $F$-tests in `r ttcode("R")`. We can use the function `r ttcode("linearHypothesis()")`contained in the package `r ttcode("car")`.

```{r, warning=F, message=F}
# estimate the multiple regression model
model <- lm(score ~ size + english + expenditure, data = CASchools)

# execute the function on the model object and provide both linear restrictions 
# to be tested as strings
linearHypothesis(model, c("size=0", "expenditure=0"))
```

The output reveals that the $F$-statistic for this joint hypothesis test is about $8.01$ and the corresponding $p$-value is $0.0004$. Thus, we can reject the null hypothesis that both coefficients are zero at any level of significance commonly used in practice.

A heteroskedasticity-robust version of this $F$-test (which leads to the same conclusion) can be conducted as follows.

```{r, warning=F, message=F}
# heteroskedasticity-robust F-test
linearHypothesis(model, c("size=0", "expenditure=0"), white.adjust = "hc1")
```

The standard output of a model summary also reports an $F$-statistic and the corresponding $p$-value. The null hypothesis belonging to this $F$-test is that *all* of the population coefficients in the model except for the intercept are zero, so the hypotheses are $$H_0: \beta_1=0, \ \beta_2 =0, \ \beta_3 =0 \quad \text{vs.} \quad H_1: \beta_j \neq 0 \ \text{for at least one} \ j=1,2,3.$$

This is also called the *overall regression $F$-statistic* and the null hypothesis is obviously different from testing if only $\beta_1$ and $\beta_3$ are zero.

We now check whether the $F$-statistic belonging to the $p$-value listed in the model's summary coincides with the result reported by `r ttcode("linearHypothesis()")`.

```{r, warning=F, message=F}
# execute the function on the model object and provide the restrictions 
# to be tested as a character vector
linearHypothesis(model, c("size=0", "english=0", "expenditure=0"))

# Access the overall F-statistic from the model's summary
summary(model)$fstatistic
```

The entry `r ttcode("value")` is the overall $F$-statistics and it equals the result of `r ttcode("linearHypothesis()")`. The $F$-test rejects the null hypothesis that the model has no power in explaining test scores. It is important to know that the $F$-statistic reported by `r ttcode("summary")` is *not* robust to heteroskedasticity!

## Confidence Sets for Multiple Coefficients 

Based on the $F$-statistic that we have previously encountered, we can specify confidence sets. Confidence sets are analogous to confidence intervals for single coefficients. As such, confidence sets consist of *combinations* of coefficients that contain the true combination of coefficients in, say, $95\%$ of all cases if we could repeatedly draw random samples, just like in the univariate case. Put differently, a confidence set is the set of all coefficient combinations for which we cannot reject the corresponding joint null hypothesis tested using an $F$-test.  

The confidence set for two coefficients an ellipse which is centered around the point defined by both coefficient estimates. Again, there is a very convenient way to plot the confidence set for two coefficients of model objects, namely the function `r ttcode("confidenceEllipse()")` from the `r ttcode("car")` package.

We now plot the $95\%$ confidence ellipse for the coefficients on `r ttcode("size")` and `r ttcode("expenditure")` from the regression conducted above. By specifying the additional argument `r ttcode("fill")`, the confidence set is colored.


```{r, echo=2:3, fig.align = 'center', warning=F, message=F}
model <- lm(score ~ size + english + expenditure, data = CASchools)
# draw the 95% confidence set for coefficients on size and expenditure
confidenceEllipse(model, 
                  fill = T,
                  lwd = 0,
                  which.coef = c("size", "expenditure"),
                  main = "95% Confidence Set")
```


We see that the ellipse is centered around $(-0.29, 3.87)$, the pair of coefficients estimates on $size$ and $expenditure$. What is more, $(0,0)$ is not element of the $95\%$ confidence set so that we can reject $H_0: \beta_1 = 0, \ \beta_3 = 0$.

By default, `r ttcode("confidenceEllipse()")` uses homoskedasticity-only standard errors. The following code chunk shows how compute a robust confidence ellipse and how to overlay it with the previous plot.


```{r, fig.align = 'center', warning=F, message=F}
# draw the robust 95% confidence set for coefficients on size and expenditure 
confidenceEllipse(model, 
                  fill = T,
                  lwd = 0,
                  which.coef = c("size", "expenditure"),
                  main = "95% Confidence Sets",
                  vcov. = vcovHC(model, type = "HC1"),
                  col = "red")
                  
# draw the 95% confidence set for coefficients on size and expenditure
confidenceEllipse(model, 
                  fill = T,
                  lwd = 0,
                  which.coef = c("size", "expenditure"),
                  add = T)
```


As the robust standard errors are slightly larger than those valid under homoskedasticity only in this case, the robust confidence set is slightly larger. This is analogous to the confidence intervals for the individual coefficients.

## Model Specification for Multiple Regression

Choosing a regression specification, i.e., selecting the variables to be included in a regression model, is a difficult task. However, there are some guidelines on how to proceed. The goal is clear: obtaining an unbiased and precise estimate of the causal effect of interest. As a starting point, think about omitted variables, that is, to avoid possible bias by using suitable control variables. Omitted variables bias in the context of multiple regression is explained in Key Concept 7.3. A second step could be to compare different specifications by measures of fit. However, as we shall see one should not rely solely on $\bar{R}^2$.

```{r, eval = my_output == "html", results='asis', echo=F, purl=F}
cat('
<div class = "keyconcept" id="KC7.3">
<h3 class = "right"> Key Concept 7.3 </h3>          
<h3 class = "left"> Omitted Variable Bias in Multiple Regression</h3>
<p>
Omitted variable bias is the bias in the OLS estimator that arises when regressors correlate with an omitted variable. For omitted variable bias to arise, two things must be true: 

1. At least one of the included regressors must be correlated with the omitted variable. 
2. The omitted variable must be a determinant of the dependent variable, $Y$.
</p>
</div>
')
```

```{r, eval = my_output == "latex", results='asis', echo=F, purl=F}
cat('\\begin{keyconcepts}[Omitted Variable Bias in Multiple Regression]{7.3}
Omitted variable bias is the bias in the OLS estimator that arises when regressors correlate with an omitted variable. For omitted variable bias to arise, two things must be true:\\newline

\\begin{enumerate}
\\item At least one of the included regressors must be correlated with the omitted variable. 
\\item The omitted variable must be a determinant of the dependent variable, $Y$.
\\end{enumerate}

\\end{keyconcepts}
')
```

We now discuss an example were we face a potential omitted variable bias in a multiple regression model:

Consider again the estimated regression equation

$$ \widehat{TestScore} = \underset{(8.7)}{686.0} - \underset{(0.43)}{1.10} \times size - \underset{(0.031)}{0.650} \times english. $$

We are interested in estimating the causal effect of class size on test score. There might be a bias due to omitting "outside learning opportunities" from our regression since such a measure could be a determinant of the students' test scores and could also be correlated with both regressors already included in the model (so that both conditions of Key Concept 7.3 are fulfilled). "Outside learning opportunities" are a complicated concept that is difficult to quantify. A surrogate we can consider instead is the students' economic background which likely are strongly related to outside learning opportunities: think of wealthy parents that are able to provide time and/or money for private tuition of their children. We thus augment the model with the variable `r ttcode("lunch")`, the percentage of students that qualify for a free or subsidized lunch in school due to family incomes below a certain threshold, and reestimate the model.   

```{r, echo=6:8, warning=F, message=F}
library(AER)
data(CASchools)
CASchools$size <- CASchools$students/CASchools$teachers
CASchools$score <- (CASchools$read + CASchools$math)/2

# estimate the model and print the summary to console
model <- lm(score ~ size + english + lunch, data = CASchools)
coeftest(model, vcov. = vcovHC, type = "HC1")
```

Thus, the estimated regression line is

$$ \widehat{TestScore} = \underset{(5.56)}{700.15} - \underset{(0.27)}{1.00} \times size - \underset{(0.03)}{0.12} \times english - \underset{(0.02)}{0.55} \times lunch. $$

We observe no substantial changes in the conclusion about the effect of $size$ on $TestScore$: the coefficient on $size$ changes by only $0.1$ and retains its significance. 

Although the difference in estimated coefficients is not big in this case, it is useful to keep `r ttcode("lunch")` to make the assumption of conditional mean independence more credible (see Chapter 7.5 of the book).

### Model Specification in Theory and in Practice {-}

Key Concept 7.4 lists some common pitfalls when using $R^2$ and $\bar{R}^2$ to evaluate the predictive ability of regression models.

```{r, eval = my_output == "html", results='asis', echo=F, purl=F}
cat('
<div class = "keyconcept" id="KC7.4">
<h3 class = "right"> Key Concept 7.4 </h3>          
<h3 class = "left"> $R^2$ and $\\bar{R}^2$: what they tell you --- and what they do not </h3>

<p>

The $R^2$ and $\\bar{R}^2$ tell you whether the regressors are good at explaining the variation of the independent variable in the sample. If the $R^2$ (or $\\bar{R}^2$) is nearly $1$, then the regressors produce a good prediction of the dependent variable in that sample, in the sense that the variance of OLS residuals is small compared to the variance of the dependent variable. If the $R^2$ (or $\\bar{R}^2$) is nearly $0$, the opposite is true.

The $R^2$ and $\\bar{R}^2$ do *not* tell you whether:

1. An included variable is statistically significant. 
2. The regressors are the true cause of the movements in the dependent variable.
3. There is omitted variable bias.
4. You have chosen the most appropriate set of regressors.

</p>
  
</div>
')
```

```{r, eval = my_output == "latex", results='asis', echo=F, purl=F}
cat('\\begin{keyconcepts}[$R^2$ and $\\bar{R}^2$: What They Tell You --- and What They Do not]{7.4}
The $R^2$ and $\\bar{R}^2$ tell you whether the regressors are good at explaining the variation of the independent variable in the sample. If the $R^2$ (or $\\bar{R}^2$) is nearly $1$, then the regressors produce a good prediction of the dependent variable in that sample, in the sense that the variance of OLS residuals is small compared to the variance of the dependent variable. If the $R^2$ (or $\\bar{R}^2$) is nearly $0$, the opposite is true.\\newline

The $R^2$ and $\\bar{R}^2$ do \\textit{not} tell you whether:\\newline

\\begin{enumerate}
\\item An included variable is statistically significant. 
\\item The regressors are the true cause of the movements in the dependent variable.
\\item There is omitted variable bias.
\\item You have chosen the most appropriate set of regressors.
\\end{enumerate}
\\end{keyconcepts}
')
```

For example, think of regressing $TestScore$ on $PLS$ which measures the available parking lot space in thousand square feet. You are likely to observe a significant coefficient of reasonable magnitude and moderate to high values for $R^2$ and $\bar{R}^2$. The reason for this is that parking lot space is correlated with many determinants of the test score like location, class size, financial endowment and so on. Although we do not have observations on $PLS$, we can use `r ttcode("R")` to generate some relatively realistic data.


```{r}
# set seed for reproducibility
set.seed(1)

# generate observations for parking lot space
CASchools$PLS <- c(22 * CASchools$income 
                   - 15 * CASchools$size 
                   + 0.2 * CASchools$expenditure
                   + rnorm(nrow(CASchools), sd = 80) + 3000)
```


```{r, fig.align='center',}
# plot parking lot space against test score
plot(CASchools$PLS, 
     CASchools$score,
     xlab = "Parking Lot Space",
     ylab = "Test Score",
     pch = 20,
     col = "steelblue")

# regress test score on PLS
summary(lm(score ~ PLS, data = CASchools))
```


$PLS$ is generated as a linear function of $expenditure$, $income$, $size$ and a random disturbance. Therefore the data suggest that there is some positive relationship between parking lot space and test score. In fact, when estimating the model
\begin{align}
TestScore = \beta_0 + \beta_1 \times PLS + u (\#eq:plsmod) 
\end{align}
using `r ttcode("lm()")` we find that the coefficient on $PLS$ is positive and significantly different from zero. Also $R^2$ and $\bar{R}^2$ are about $0.3$ which is a lot more than the roughly $0.05$ observed when regressing the test scores on the class sizes only. This suggests that increasing the parking lot space boosts a school's test scores and that model \@ref(eq:plsmod) does even better in explaining heterogeneity in the dependent variable than a model with $size$ as the only regressor. Keeping in mind how $PLS$ is constructed this comes as no surprise. It is evident that the high $R^2$ <it>cannot</it> be used to the conclude that the estimated relation between parking lot space and test scores is causal: the (relatively) high $R^2$ is due to correlation between $PLS$ and other determinants and/or control variables. Increasing parking lot space is *not* an appropriate measure to generate more learning success!

## Analysis of the Test Score Data Set

Chapter \@ref(rmwmr) and some of the previous sections have stressed that it is important to include control variables in regression models if it is plausible that there are omitted factors. In our example of test scores we want to estimate the causal effect of a change in the student-teacher ratio on test scores. We now provide an example how to use multiple regression in order to alleviate omitted variable bias and demonstrate how to report results using `r ttcode("R")`.

So far we have considered two variables that control for unobservable student characteristics which correlate with the student-teacher ratio *and* are assumed to have an impact on test scores: 

  + $English$, the percentage of English learning students 
  
  + $lunch$, the share of students that qualify for a subsidized or even a free lunch at school
  
Another new variable provided with `r ttcode("CASchools")` is `r ttcode("calworks")`, the percentage of students that qualify for the *CalWorks* income assistance program. Students eligible for *CalWorks* live in families with a total income below the threshold for the subsidized lunch program so both variables are indicators for the share of economically disadvantaged children. Both indicators are highly correlated: 

```{r} 
# estimate the correlation between 'calworks' and 'lunch'
cor(CASchools$calworks, CASchools$lunch)
```

There is no unambiguous way to proceed when deciding which variable to use. In any case it may not a good idea to use both variables as regressors in view of collinearity. Therefore, we also consider alternative model specifications.

For a start, we plot student characteristics against test scores.

```{r}
# set up arrangement of plots
m <- rbind(c(1, 2), c(3, 0))
graphics::layout(mat = m)

# scatterplots
plot(score ~ english, 
     data = CASchools, 
     col = "steelblue", 
     pch = 20, 
     xlim = c(0, 100),
     cex.main = 0.9,
     main = "Percentage of English language learners")

plot(score ~ lunch, 
     data = CASchools, 
     col = "steelblue", 
     pch = 20,
     cex.main = 0.9,
     main = "Percentage qualifying for reduced price lunch")

plot(score ~ calworks, 
     data = CASchools, 
     col = "steelblue", 
     pch = 20, 
     xlim = c(0, 100),
     cex.main = 0.9,
     main = "Percentage qualifying for income assistance")
```

We divide the plotting area up using `r ttcode("layout()")`. The matrix `r ttcode("m")` specifies the location of the plots, see `?layout`.

We see that all relationships are negative. Here are the correlation coefficients.

```{r}
# estimate correlation between student characteristics and test scores
cor(CASchools$score, CASchools$english)
cor(CASchools$score, CASchools$lunch)
cor(CASchools$score, CASchools$calworks)
```

We shall consider five different model equations:

\begin{align*}
  (I) \quad TestScore=& \, \beta_0 + \beta_1 \times size + u, \\
  (II) \quad TestScore=& \, \beta_0 + \beta_1 \times size + \beta_2 \times english + u, \\
  (III) \quad TestScore=& \, \beta_0 + \beta_1 \times size + \beta_2 \times english + \beta_3 \times lunch + u, \\
  (IV) \quad TestScore=& \, \beta_0 + \beta_1 \times size + \beta_2 \times english + \beta_4 \times calworks + u, \\
  (V) \quad TestScore=& \, \beta_0 + \beta_1 \times size + \beta_2 \times english + \beta_3 \times lunch + \beta_4 \times calworks + u
\end{align*}

The best way to communicate regression results is in a table. The `r ttcode("stargazer")` package is very convenient for this purpose. It provides a function that generates professionally looking HTML and LaTeX tables that satisfy scientific standards. One simply has to provide one or multiple object(s) of class `r ttcode("lm")`. The rest is done by the function `r ttcode("stargazer()")`.

```{r, eval=F}
# load the stargazer library
library(stargazer)

# estimate different model specifications
spec1 <- lm(score ~ size, data = CASchools)
spec2 <- lm(score ~ size + english, data = CASchools)
spec3 <- lm(score ~ size + english + lunch, data = CASchools)
spec4 <- lm(score ~ size + english + calworks, data = CASchools)
spec5 <- lm(score ~ size + english + lunch + calworks, data = CASchools)

# gather robust standard errors in a list
rob_se <- list(sqrt(diag(vcovHC(spec1, type = "HC1"))),
               sqrt(diag(vcovHC(spec2, type = "HC1"))),
               sqrt(diag(vcovHC(spec3, type = "HC1"))),
               sqrt(diag(vcovHC(spec4, type = "HC1"))),
               sqrt(diag(vcovHC(spec5, type = "HC1"))))

# generate a LaTeX table using stargazer
stargazer(spec1, spec2, spec3, spec4, spec5,
          se = rob_se,
          digits = 3,
          header = F,
          column.labels = c("(I)", "(II)", "(III)", "(IV)", "(V)"))
```

<!--html_preserve-->

```{r, results='asis', echo=F, cache=T, message=FALSE, warning=FALSE, eval=my_output == "html"}
library(stargazer)
spec1 <- lm(score ~ size, data = CASchools)
spec2 <- lm(score ~ size + english, data = CASchools)
spec3 <- lm(score ~ size + english + lunch, data = CASchools)
spec4 <- lm(score ~ size + english + calworks, data = CASchools)
spec5 <- lm(score ~ size + english + lunch + calworks, data = CASchools)

# gather robust standard errors in a list
rob_se <- list(
  sqrt(diag(vcovHC(spec1, type = "HC1"))),
  sqrt(diag(vcovHC(spec2, type = "HC1"))),
  sqrt(diag(vcovHC(spec3, type = "HC1"))),
  sqrt(diag(vcovHC(spec4, type = "HC1"))),
  sqrt(diag(vcovHC(spec5, type = "HC1")))
)

stargazer(spec1, spec2, spec3, spec4, spec5, 
          type = "html",
          se = rob_se,
          header = F,
          digits = 3,
          float.env = "sidewaystable",
          object.names = TRUE,
          dep.var.caption = "Dependent Variable: Test Score",
          model.numbers = FALSE,
          column.labels = c("(I)", "(II)", "(III)", "(IV)", "(V)")
          )

stargazer_html_title("Regressions of Test Scores on the Student-Teacher Ratio and Control Variables", "rotsostracv")
```

<!--/html_preserve-->

```{r, results='asis', echo=F, cache=T, message=FALSE, warning=FALSE, eval=my_output == "latex"}
library(stargazer)
spec1 <- lm(score ~ size, data = CASchools)
spec2 <- lm(score ~ size + english, data = CASchools)
spec3 <- lm(score ~ size + english + lunch, data = CASchools)
spec4 <- lm(score ~ size + english + calworks, data = CASchools)
spec5 <- lm(score ~ size + english + lunch + calworks, data = CASchools)

# gather robust standard errors in a list
rob_se <- list(
  sqrt(diag(vcovHC(spec1, type = "HC1"))),
  sqrt(diag(vcovHC(spec2, type = "HC1"))),
  sqrt(diag(vcovHC(spec3, type = "HC1"))),
  sqrt(diag(vcovHC(spec4, type = "HC1"))),
  sqrt(diag(vcovHC(spec5, type = "HC1")))
)

stargazer(spec1, spec2, spec3, spec4, spec5, 
          type = "latex",
          title = "\\label{tab:rotsostracv} Regressions of Test Scores on the Student-Teacher Ratio and Control Variables",
          float.env = "sidewaystable",
          column.sep.width = "-10pt",
          se = rob_se,
          header = F,
          digits = 3,
          dep.var.caption = "Dependent Variable: Test Score",
          object.names = TRUE,
          model.numbers = FALSE,
          column.labels = c("(I)", "(II)", "(III)", "(IV)", "(V)")
          )
```

Table \@ref(tab:rotsostracv) states that $score$ is the dependent variable and that we consider five models. We see that the columns of Table \@ref(tab:rotsostracv) contain most of the information provided by `r ttcode("coeftest()")` and `r ttcode("summary()")` for the regression models under consideration: the coefficients estimates equipped with significance codes (the asterisks) and standard errors in parentheses below. Although there are no $t$-statistics, it is straightforward for the reader to compute them simply by dividing a coefficient estimate by the corresponding standard error. The bottom of the table reports summary statistics for each model and a legend. For an in-depth discussion of the tabular presentation of regression results, see Chapter 7.6 of the book.

What can we conclude from the model comparison? 

1. We see that adding control variables roughly halves the coefficient on `r ttcode("size")`. Also, the estimate is not sensitive to the set of control variables used. The conclusion is that decreasing the student-teacher ratio ceteris paribus by one unit leads to an estimated average increase in test scores of about $1$ point.

2. Adding student characteristics as controls increases $R^2$ and $\bar{R}^2$ from $0.049$ (`r ttcode("spec1")`) up to $0.773$ (`r ttcode("spec3")` and `r ttcode("spec5")`), so we can consider these variables as suitable predictors for test scores. Moreover, the estimated coefficients on all control variables are consistent with the impressions gained from Figure 7.2 of the book.

3. We see that the control variables are not statistically significant in all models. For example in `r ttcode("spec5")`, the coefficient on $calworks$ is not significantly different from zero at $5\%$ since $\lvert-0.048/0.059\rvert=0.81 < 1.64$. We also observe that the effect on the estimate (and its standard error) of the coefficient on $size$ of adding $calworks$ to the base specification `r ttcode("spec3")` is negligible. We can therefore consider `r ttcode("calworks")` as a superfluous control variable, given the inclusion of `r ttcode("lunch")` in this model.

## Exercises {#exercises-7}

```{r, echo=F, purl=F, results='asis'}
if (my_output=="html") {
  cat('
<div  class = "DCexercise">

#### 1. Hypothesis Testing in a Multiple Regression Model --- $t$-statistics and $p$-values {-}

Reconsider the <tt>Boston</tt> data set and the following estimated model (homoscedasticity-only standard errors in parentheses) from the previous chapter:

$$\\widehat{medv}_i = \\underset{(0.75)}{32.828} -\\underset{(0.05)}{0.994} \\times lstat_i -\\underset{(0.04)}{0.083} \\times crim_i + \\underset{(0.01)}{0.038} \\times age_i.$$

Just as in the simple linear regression framework we can conduct hypothesis tests about the coefficients in multiple regression models. The most common hypothesis is $H_0:\\beta_j=0$ against the alternative $H_1:\\beta_j\\ne 0$ for some $j$ in $0,1,\\dots,k$.

The packages <tt>AER</tt> and <tt>MASS</tt> have been loaded. The coefficient estimates as well as the corresponding standard errors are available in <tt>coefs</tt> and <tt>SEs</tt>, respectively.

**Instructions:**

Use vector arithmetics to solve the following tasks:

+ Compute $t$-statistics for each coefficient by using the predefined objects <tt>coefs</tt> and <tt>SEs</tt>. Assign them to <tt>tstats</tt>.

+ Compute $p$-values for each coefficient and assign them to <tt>pval</tt>.

+ Check with the help of logical operators whether the hypotheses are rejected at the $1\\%$ significance level.

<iframe src="DCL/ex7_1.html" frameborder="0" scrolling="no" style="width:100%;height:340px"></iframe>

**Hints:**

+ The $t$-statistic for each coefficient is defined as $t=\\frac{\\widehat{\\beta}_j-\\beta_{j,0}}{SE(\\widehat{\\beta}_j)}$.

+ The $p$-value for a two-sided test using is computed as $2\\cdot\\Phi(-|t^{act}|)$ where $t^{act}$ denotes the computed $t$-statistic.

</div>') } else {
  cat("\\begin{center}\\textit{This interactive part of the book is only available in the HTML version.}\\end{center}")
}
```

```{r, echo=F, purl=F, results='asis', eval=my_output == "html"}
cat('
<div  class = "DCexercise">

#### 2. Hypothesis Testing in a Multiple Regression Model - Confidence Intervals {-}

Consider again the estimated model 

$$\\widehat{medv}_i = \\underset{(0.75)}{32.828} -\\underset{(0.05)}{0.994} \\times lstat_i -\\underset{(0.04)}{0.083} \\times crim_i + \\underset{(0.01)}{0.038} \\times age_i.$$

which is available as the object <tt>mod</tt> in your working environment. The packages <tt>AER</tt> and <tt>MASS</tt> have been loaded.

**Instructions:**

+ Construct $99\\%$ confidence intervals for all model coefficients. Use the intervals to decide whether the individual null hypotheses $H_0:\\beta_j=0$, $j=0,1,2,3,4$ are rejected at the $1\\%$ level.

<iframe src="DCL/ex7_2.html" frameborder="0" scrolling="no" style="width:100%;height:340px"></iframe>

**Hint:**

+ You may use <tt>confint()</tt> to construct confidence intervals. The confidence level can be set via the argument <tt>level</tt>.

</div>')
```

```{r, echo=F, purl=F, results='asis', eval=my_output == "html"}
cat('
<div  class = "DCexercise">

#### 3. Robust Hypothesis Testing in Multiple Regression Models {-}

The <tt>lm</tt> object <tt>mod</tt> from the previous exercises is available in your working environment. The packages <tt>AER</tt> and <tt>MASS</tt> have been loaded.

**Instructions:**

+ Print a coefficient summary that reports heteroscedasticity-robust standard errors.

+ Access entries of the matrix generated by <tt>coeftest()</tt> to check whether the hypotheses are rejected at a 1% significance level. Use logical operators <tt><,></tt>.

<iframe src="DCL/ex7_3.html" frameborder="0" scrolling="no" style="width:100%;height:340px"></iframe>

**Hints:**

+ Using the argument <tt>vcov.</tt> in <tt>coeftest()</tt> forces the function to use robust standard errors.

+ The $p$-values are contained in the fourth column of the output generated by <tt>coeftest()</tt>. Use square brackets to subset the matrix accordingly.

</div>')
```

```{r, echo=F, purl=F, results='asis', eval=my_output == "html"}
cat('
<div  class = "DCexercise">

#### 4. Joint Hypothesis Testing --- $F$-Test I {-}

Sometimes we are interested in testing joint hypotheses which impose restrictions on *multiple* regression coefficients. For example, in the model 

$$medv_i = \\beta_0 + \\beta_1\\times lstat_i + \\beta_2\\times crim_i + \\beta_3\\times age_i + u_i$$

we may test the null $H_0: \\beta_2=\\beta_3$ vs. the alternative $H_1: \\beta_2\\ne\\beta_3$ (which is a joint hypothesis as we impose a restriction on *two* regression coefficients).

The basic idea behind testing such a hypothesis is to conduct two regressions and to compare the outcomes: for one of the regressions we impose the restrictions of formalized by the null (we call this the restricted regression model), whereas for the other regression the restriction is left out (we call this the unrestricted model). From this starting point we construct a test-statistic which, under the null, follows a well known distribution, an $F$ distribution (see the next exercise).

However, in this exercise we start with the initial computations necessary to construct the test statistic.

The packages <tt>AER</tt> and <tt>MASS</tt> have been loaded.

**Instructions:**

+ Estimate the restricted model, that is, the model where the restriction formalized by $H_0: \\beta_2=\\beta_3$ is assumed to be true. Save the model in <tt>model_res</tt>.

+ Compute the $SSR$ of the restricted model and assign it to <tt>RSSR</tt>.

+ Estimate the unrestricted model, that is, the model where the restriction is assumed to be false. Save it in <tt>model_unres</tt>.

+ Compute the $SSR$ of the unrestricted model and assign it to <tt>USSR</tt>.

<iframe src="DCL/ex7_4.html" frameborder="0" scrolling="no" style="width:100%;height:340px"></iframe>

**Hints:**

+ The restricted model can be written as $$medv_i = \\beta_0 + \\beta_1\\times lstat_i + \\beta_2\\times crim_i + \\beta_2\\times age_i + u_i$$ which, after rearranging, can be expressed as $$medv_i = \\beta_0 + \\beta_1\\times lstat_i + \\beta_2\\times(crim_i+age_i) + u_i.$$

+ The $SSR$ is defined as the sum of the squared residuals.

+ Note that the residuals of a regression model are available as <tt>residuals</tt> in the corresponding <tt>lm</tt> object. So you can access them as usual via the <tt>$</tt>-operator.

</div>

<div  class = "DCexercise">')
```

```{r, echo=F, purl=F, results='asis', eval=my_output == "html"}
cat('
#### 5. Joint Hypothesis Testing --- F-Test II {-}

After estimating the models and computing the $SSR$s you now have to compute the test-statistic and conduct the $F$-test. As mentioned in the last exercise, the test-statistic follows an $F$ distribution. More precisely, we deal with the $F_{q,n-k-1}$ distribution where $q$ denotes the number of restrictions under the null and $k$ is the of regressors in the unrestricted model, excluding the intercept.

The packages <tt>AER</tt> and <tt>MASS</tt> have been loaded. Both models (<tt>model_res</tt> and <tt>model_unres</tt>) as well as their SSR (<tt>RSSR</tt> and <tt>USSR</tt>) are available in your working environment.

**Instructions:**

+ Compute the $F$-statistic and assign it to <tt>Fstat</tt>.

+ Compute the $p$-value and assign it to <tt>pval</tt>.

+ Check whether the null is rejected at the $1\\%$ level using logical operators.

+ Verify your result by using <tt>linearHypothesis()</tt> and printing the results. 

<iframe src="DCL/ex7_5.html" frameborder="0" scrolling="no" style="width:100%;height:340px"></iframe>

**Hints:**

+ The $F$-statistic is defined as $\\frac{RSSR-USSR/q}{USSR/(n-k-1)}$.

+ The $p$-value can be computed as $1-F_{q,n-k-1}(F^{act})$ where $F_{q,n-k-1}$ denotes the CDF of the $F$-distribution (<tt>pf()</tt>) with degrees of freedom $q$ and $n-k-1$ and $F^{act}$ the computed $F$-statistic.

+ <tt>linearHypothesis()</tt> expects the unrestricted model as well as the null hypothesis as arguments.

</div>')
```

```{r, echo=F, purl=F, results='asis', eval=my_output == "html"}
cat('
<div  class = "DCexercise">

#### 6. Joint Hypothesis Testing - Confidence Set {-}

As you know from previous chapters constructing a confidence set for a single regression coefficient results in a simple confidence interval on the real line. However if we consider $n$ regression coefficients jointly (as we do in a joint hypothesis testing setting) we move from $\\mathbb{R}$ to $\\mathbb{R}^n$ resulting in a n-dimensional confidence set. For the sake of illustration we then often choose $n=2$, so that we end up with a representable two-dimensional plane.

Recall the estimated model 

$$\\widehat{medv}_i = \\underset{(0.75)}{32.828} -\\underset{(0.05)}{0.994} \\times lstat_i -\\underset{(0.04)}{0.083} \\times crim_i + \\underset{(0.01)}{0.038} \\times age_i.$$

which is available as <tt>mod</tt> in your working environment. Assume you want to test the null $H_0: \\beta_2=\\beta_3=0$ vs. $H_1: \\beta_2\\ne 0$ or $\\beta_3\\ne 0$.

The packages <tt>AER</tt> and <tt>MASS</tt> have been loaded.

**Instructions:**

+ Construct a $99\\%$ confidence set for the coefficients of <tt>crim</tt> and <tt>lstat</tt>, that is a two-dimensional confidence set. Can you reject the null stated above?

+ Verify your visual inspection by conducting a corresponding $F$-test.

<iframe src="DCL/ex7_6.html" frameborder="0" scrolling="no" style="width:100%;height:340px"></iframe>

**Hints:**

+ Use <tt>confidenceEllipse()</tt> to construct a two-dimensional confidence set. Besides the coefficients for which the confidence set shall be constructed (<tt>which.coef</tt>), you have to specify the confidence level (<tt>levels</tt>).

+ As usual you can use <tt>linearHypothesis()</tt> to conduct the $F$-test. Note that there are two restrictions now, hence you have to pass a vector containing both restrictions.

</div>')
```