-
Notifications
You must be signed in to change notification settings - Fork 0
/
stats216v_hw3_p4.Rmd
141 lines (108 loc) · 2.43 KB
/
stats216v_hw3_p4.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
---
output:
pdf_document: default
html_notebook: default
---
## Problem 4
### a)
```{r}
# truncated power base function
# when added to model for a cubic polynomial this
# keeps f continuous for 1st and 2nd derivatives at each knot
# x is predictor
# z is knot
h <- function(x,z) {
ifelse(x>z,(x-z)^3,0)
}
```
### b)
```{r}
#
# xs is vector of predictors
# h is a matrix of truncated power base functions
hs <- function(xs,z) {
sapply(xs,h, z=z)
}
```
### c)
```{r}
#
# xs is predictors
# zs is knots
splinebasis <- function(xs,zs) {
mhs <- matrix(data=0,nrow=length(xs),ncol=length(zs))
for(j in 1:length(zs)){
mhs[,j] = sapply( xs, hs, zs[j])
}
m <- cbind(xs, xs^2, xs^3, mhs)
}
```
### d)
```{r}
set.seed(2019)
x = runif(100)
y = sin(10*x)
```
### e)
```{r}
knots=c(1/4,2/4,3/4)
out3 = lm(y~I(splinebasis(x,knots)))
print(out3)
yhat <- predict(out3)
reorder <- order(x)
plot(x,y)
lines(x[reorder],y[reorder])
points(x,yhat, col='green')
lines(x[reorder],yhat[reorder], col='green')
```
The spline (in green) does an OK job of approximating the function. The green line is fairly close to the black line but the spline is a bit low both on the left maximum and the central minimum.
### f)
```{r}
k1=c(1/2)
k2=c(1/3,2/3)
k3=c(1/4,2/4,3/4)
k4=c(1/5,2/5,3/5,4/5)
k5=c(1/6,2/6,3/6,4/6,5/6)
k6=c(1/7,2/7,3/7,4/7,5/7,6/7)
k7=c(1/8,2/8,3/8,4/8,5/8,6/8,7/8)
k8=c(1/9,2/9,3/9,4/9,5/9,6/9,7/9,8/9)
k9=c(1/10,2/10,3/10,4/10,5/10,6/10,7/10,8/10,9/10)
runner <- function (k) {
m <- lm(y~I(splinebasis(x,k)))
yhat <- predict(m)
reorder <- order(x)
plot(x,y)
lines(x[reorder],y[reorder])
points(x,yhat, col='green',pch=5)
lines(x[reorder],yhat[reorder], col='green',lty=2)
}
```
```{r}
o1 <- runner(k1)
```
```{r}
o2 <- runner(k2)
```
```{r}
o3 <- runner(k3)
```
```{r}
o4 <- runner(k4)
```
```{r}
o5 <- runner(k5)
```
```{r}
o6 <- runner(k6)
```
```{r}
o7 <- runner(k7)
```
```{r}
o8 <- runner(k8)
```
```{r}
o9 <- runner(k9)
```
### g)
Yes, as the number of knots increases we expect the curve to become a better approximation of the true curve because it is increasingly flexible and better able to match the true data. Improvement will slow and then stop as the number DoF in the model approach the number of unique x values. However, in the real world when the true data is unknown, increasing the knots would likely lear to overfiting. Picking a good k could be done by using a validation or cross-validation approach.