# Model Selection Using Information Criteria
So far, we have learned to fit models with multiple predictors, both
quantitative and categorical, and to assess whether the required conditions
are met for linear regression to be an appropriate model for a dataset.
One piece is still missing: given an appropriate model with a set of
multiple predictors, how can we choose which predictors are worth
retaining in a "best" model for the data (and which have no
relationship, or only a weak one, with the response and should be
discarded)?
## Data and Model
Today we will recreate part of the analysis from [*Vertebrate community
composition and diversity declines along a defaunation gradient
radiating from rural villages in
Gabon*](https://doi.org/10.1111/1365-2664.12798), by Sally Koerner and
colleagues. They investigated the relationship between rural villages,
hunting, and wildlife in Gabon. One question they asked was how ape
abundance (the response `RA_Apes` below) depends on distance from
villages, village size, and vegetation characteristics.
They shared their data at
[Dryad.org](https://datadryad.org/stash/dataset/doi:10.5061/dryad.vs97g)
and we can read it in and fit a regression model like this:
```{r, gabon-data}
defaun <- read.csv('http://sldr.netlify.com/data/koerner_gabon_defaunation.csv')
```
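Before fitting, it can help to glance at the dataset's structure (an
optional check, not part of the original code):

```{r, data-peek}
# quick look at the variables available in the defaunation data
str(defaun)
```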
```{r, ape-models, echo=TRUE}
ape_mod <- lm(RA_Apes ~ Veg_DBH + Veg_Canopy + Veg_Understory +
Veg_Rich + Veg_Stems + Veg_liana +
LandUse + Distance + NumHouseholds, data = defaun)
summary(ape_mod)
as.numeric(logLik(ape_mod))
```
```{r, include=FALSE}
monkeyBIC <- BIC(ape_mod)
```
## Calculations
- Information criteria allow us to **balance the conflicting goals**
  of having a model that *fits the data as well as possible* (which
  pushes us toward models with more predictors) and *parsimony*
  (choosing the simplest model, with the fewest predictors, that works
  for the data and research question). The basic idea is that we
  **minimize** the quantity
  $-(2\,\text{LogLikelihood} - \text{penalty}) = -2\,\text{LogLikelihood} + \text{penalty}$
- AIC is computed as $-2\,\text{LogLikelihood} + 2k$, where $k$ is the
  number of coefficients being estimated (don't forget $\sigma$!).
  **Smaller AIC is better.**
- BIC is computed as $-2\,\text{LogLikelihood} + \ln(n)\,k$, where $n$
  is the number of observations (rows) in the dataset and $k$ is the
  number of coefficients being estimated. **Smaller BIC is better.**
- Verify that the BIC for this model is
  `r round(monkeyBIC, digits=2)` (the sketch below recomputes it by hand).
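To make these formulas concrete, here is a minimal by-hand check,
assuming the `ape_mod` object fitted above, compared against R's
built-in `AIC()` and `BIC()` functions:

```{r, ic-by-hand}
ll <- as.numeric(logLik(ape_mod))  # log-likelihood of the fitted model
k <- attr(logLik(ape_mod), 'df')   # number of estimated coefficients (includes sigma)
n <- nobs(ape_mod)                 # number of rows used in the fit
-2 * ll + 2 * k                    # AIC by hand
-2 * ll + log(n) * k               # BIC by hand (log() is the natural log in R)
AIC(ape_mod)                       # built-in versions, for comparison
BIC(ape_mod)
```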
## Decisions with ICs
The following rules of thumb (**not** laws, just common guidelines)
may help you make decisions with ICs; a worked comparison follows the
list.

- A model with lower IC *by at least 3 units* is notably better.
- If two or more models have ICs *within* 3 units of each other,
  there is not much difference between them. In that case, we usually
  choose the model with the fewest predictors.
- In some cases the research question is to measure the influence
  of a particular predictor on the response, but *the IC does not
  strongly support including that predictor* in the best model (IC
  difference less than 3). You might keep that predictor in anyway and
  then discuss the situation honestly, for example: "AIC does not
  provide strong support for including predictor x in the best model,
  but the model including x indicates that as x increases,
  the response decreases slightly. More research would be needed..."
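As a concrete illustration of these rules, here is a sketch comparing
the full model to one smaller candidate (the choice of
`Distance + NumHouseholds` is just an example, not from the original
analysis):

```{r, ic-comparison}
# a smaller candidate model using only the village-related predictors
small_mod <- lm(RA_Apes ~ Distance + NumHouseholds, data = defaun)
BIC(ape_mod)
BIC(small_mod)
BIC(small_mod) - BIC(ape_mod)  # a gap of 3+ units favors the lower-BIC model
```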
## All-possible-subsets Selection
The model we just fitted is our *full model*, with all predictors of
potential interest included. How can we use information criteria to
choose the best model from among all the models that use subsets of
these predictors?

We can use the `dredge()` function from the `MuMIn` package to compute
and display ICs for all of these candidate models.

Before using `dredge()`, we need to make sure our dataset has no missing
values, and we must set the `na.action` option for our model (this can
also be done in the original call, as in `lm(..., na.action = 'na.fail')`):
```{r, getaicbic}
library(MuMIn)
ape_mod <- ape_mod |> update(na.action = 'na.fail')
ape_dredge <- dredge(ape_mod, rank='BIC')
pander::pander(head(ape_dredge, 7))
```
- According to BIC, what is the best model for this dataset?
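Once `dredge()` has ranked the candidate models, you can refit the
top-ranked one directly (a sketch assuming the `ape_dredge` table
created above):

```{r, best-model}
# get.models() refits models from the dredge table; subset = 1 selects the top-ranked one
best_mod <- get.models(ape_dredge, subset = 1)[[1]]
summary(best_mod)
```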
## Which IC should I use?
AIC and BIC may give different best models, especially if the dataset is
large. You may want to choose one criterion *a priori* (before making
any calculations). You might prefer BIC if you want to err on the
"conservative" side: it is more likely to select a "smaller" model
with fewer predictors, because its per-coefficient penalty $\ln(n)$
exceeds AIC's penalty of 2 whenever $n > e^2 \approx 7.4$, which holds
for almost any real dataset.
## Quantities derived from AIC
- $\Delta AIC$ is the AIC for a given model minus the AIC of the best
  model in the candidate set (and likewise for $\Delta BIC$).
- *Akaike weights* are values between 0 and 1 that measure the
  weight of evidence that a given model is the best one, given
  that one of the models in the set really is best. A sketch of the
  computation follows.
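Here is a minimal sketch of the weight computation, assuming the
`ape_dredge` table from above (its `delta` column holds the
$\Delta$IC values; since we ranked by BIC these are BIC-based weights,
but the formula is the same):

```{r, akaike-weights}
# weights are proportional to exp(-delta/2), normalized to sum to 1
delta <- ape_dredge$delta
w <- exp(-delta / 2) / sum(exp(-delta / 2))
head(round(w, 3))  # weights for the top few models
```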
## Important Caution
**Very important**: ICs can **ONLY** be compared for models with the same
response variable, fitted to exactly the same rows of data. The sketch
below shows one quick check.
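One quick way to guard against silently dropped rows (a sketch assuming
`ape_mod` and the `small_mod` from the earlier comparison):

```{r, check-rows}
# lm() silently drops rows with missing values, so confirm both models used the same n
nobs(ape_mod) == nobs(small_mod)
```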