---
title: "02-build-better-training-data"
format: html
---
```{r setup, include=FALSE}
options(scipen = 999)

library(tidyverse)
library(tidymodels)

# update rcis if the package is not up-to-date - only way to access bechdel
# remotes::install_github("cis-ds/rcis")
library(rcis)

bechdel <- bechdel %>%
  # remove column - unique to each row, non-informative
  select(-title)

# data splitting
set.seed(100) # Important!
bechdel_split <- initial_split(bechdel, strata = test, prop = .9)
bechdel_train <- training(bechdel_split)
bechdel_test <- testing(bechdel_split)

# data resampling
set.seed(100)
bechdel_folds <- vfold_cv(bechdel_train, v = 10, strata = test)

# KNN model
knn_mod <- nearest_neighbor() %>%
  set_engine("kknn") %>%
  set_mode("classification")
```
# Your Turn 1
Unscramble! You have all the steps from our `knn_rec`. Your challenge is to *unscramble* them into the right order!

Save the result as `knn_rec`.
```{r, eval=FALSE}
step_dummy(all_nominal_predictors())
step_normalize(all_numeric_predictors())
recipe(test ~ ., data = bechdel)
step_novel(all_nominal_predictors())
step_zv(all_predictors())
step_other(genre, threshold = .05)
```
Answer:
```{r}
knn_rec <- recipe(test ~ ., data = bechdel) %>%
  step_other(genre, threshold = .05) %>%
  step_novel(all_nominal_predictors()) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_zv(all_predictors()) %>%
  step_normalize(all_numeric_predictors())
knn_rec
```
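
To see why the order matters, it can help to prep and bake a recipe and look at what comes out. The sketch below is only an illustration: it uses a small made-up `toy` data set rather than `bechdel`, and a looser `threshold` so that pooling is visible on ten rows. The names `toy`, `toy_rec`, and `toy_baked` are assumptions, not part of the exercise.

```r
library(tidymodels)

# Hypothetical toy data standing in for `bechdel` (values are made up)
toy <- tibble(
  test   = factor(rep(c("Pass", "Fail"), times = 5)),
  genre  = factor(c(rep("Action", 4), rep("Comedy", 4), "Drama", "Horror")),
  budget = seq(10, 100, by = 10)
)

# Same step order as `knn_rec`, with a larger threshold for the tiny data set
toy_rec <- recipe(test ~ ., data = toy) %>%
  step_other(genre, threshold = .2) %>%       # pool rare genres into "other"
  step_novel(all_nominal_predictors()) %>%    # add a "new" level for unseen genres
  step_dummy(all_nominal_predictors()) %>%    # factors -> 0/1 indicator columns
  step_zv(all_predictors()) %>%               # drop columns with a single value
  step_normalize(all_numeric_predictors())    # center and scale what is left

# prep() estimates each step on the data; bake(new_data = NULL) returns
# the processed version of that same data
toy_baked <- toy_rec %>%
  prep() %>%
  bake(new_data = NULL)

toy_baked
```

In this sketch `step_novel()` adds a `new` level that never occurs in the rows used for prepping, so its indicator column is all zeros. Placing `step_zv()` before `step_normalize()` removes that column before it would be divided by a standard deviation of zero.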
# Your Turn 2
Fill in the blanks to make a workflow that combines `knn_rec` with `knn_mod`.
```{r, eval=FALSE}
knn_wf <- ______ %>%
  ______(knn_rec) %>%
  ______(knn_mod)
knn_wf
```
Answer:
```{r}
knn_wf <- workflow() %>%
  add_recipe(knn_rec) %>%
  add_model(knn_mod)
knn_wf
```
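
A workflow bundles the recipe and the model so that both are trained by one `fit()` call and reused together by `predict()`. The sketch below illustrates this under some assumptions: the `toy` data is simulated, and a plain `glm` logistic regression stands in for `knn_mod` so the example does not depend on `kknn`. All `toy_*` names are hypothetical.

```r
library(tidymodels)

# Hypothetical simulated data standing in for `bechdel_train`
set.seed(100)
toy <- tibble(
  x1 = rnorm(60),
  x2 = rnorm(60, mean = 50, sd = 10)
) %>%
  mutate(test = factor(if_else(x1 + rnorm(60) > 0, "Pass", "Fail")))

# Stand-in model (assumption): logistic regression via base R's glm()
toy_mod <- logistic_reg() %>%
  set_engine("glm") %>%
  set_mode("classification")

toy_rec <- recipe(test ~ ., data = toy) %>%
  step_normalize(all_numeric_predictors())

toy_wf <- workflow() %>%
  add_recipe(toy_rec) %>%
  add_model(toy_mod)

# fit() preps the recipe on the data and then fits the model
toy_fit <- fit(toy_wf, data = toy)

# predict() applies the trained recipe to the new rows before predicting
toy_preds <- predict(toy_fit, new_data = head(toy, 5))
toy_preds
```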
# Your Turn 3
Edit the code chunk below to fit the entire `knn_wf` instead of just `knn_mod`.
```{r}
set.seed(100)
knn_mod %>%
  fit_resamples(
    test ~ .,
    resamples = bechdel_folds
  ) %>%
  collect_metrics()
```
Answer:
```{r}
set.seed(100)
knn_wf %>%
  fit_resamples(resamples = bechdel_folds) %>%
  collect_metrics()
```
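
Passing a workflow to `fit_resamples()` means the recipe is re-estimated on the analysis portion of every fold, so statistics such as normalization means are never computed from the held-out rows. A minimal sketch of the same pattern, assuming simulated `toy` data and a `glm` stand-in model (both hypothetical, chosen so the example does not need `rcis` or `kknn`):

```r
library(tidymodels)

# Hypothetical simulated data standing in for `bechdel_train`
set.seed(100)
toy <- tibble(
  x1 = rnorm(100),
  x2 = rnorm(100, mean = 50, sd = 10)
) %>%
  mutate(test = factor(if_else(x1 + rnorm(100) > 0, "Pass", "Fail")))

# 5-fold cross-validation, stratified on the outcome
set.seed(100)
toy_folds <- vfold_cv(toy, v = 5, strata = test)

toy_rec <- recipe(test ~ ., data = toy) %>%
  step_normalize(all_numeric_predictors())

# Stand-in model (assumption): logistic regression via glm()
toy_wf <- workflow() %>%
  add_recipe(toy_rec) %>%
  add_model(logistic_reg() %>% set_engine("glm"))

# The recipe and model are both refit inside each fold
toy_metrics <- toy_wf %>%
  fit_resamples(resamples = toy_folds) %>%
  collect_metrics()

toy_metrics
```

The `n` column of the result counts how many resamples contributed to each averaged metric.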
# Your Turn 4
Turns out, the same `knn_rec` recipe can also be used to fit a penalized logistic regression model using the lasso. Update `knn_wf` to use `plr_mod` in place of `knn_mod`, then fit it to the resamples. Let's try it out!
```{r}
plr_mod <- logistic_reg(penalty = .01, mixture = 1) %>%
  set_engine("glmnet") %>%
  set_mode("classification")

plr_mod %>%
  translate()
```
Answer:
```{r}
glmnet_wf <- knn_wf %>%
  update_model(plr_mod)

glmnet_wf %>%
  fit_resamples(resamples = bechdel_folds) %>%
  collect_metrics()
```
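
`update_model()` swaps only the model specification and returns a new workflow; the recipe is carried over and the original workflow is left unchanged. The sketch below checks this without fitting anything. It assumes a tiny made-up `toy` data set and a `glm` starting model, and the names `base_wf`, `lasso_mod`, and `lasso_wf` are hypothetical.

```r
library(tidymodels)

# Hypothetical toy data; only used to define the recipe's columns
toy <- tibble(
  test = factor(c("Pass", "Fail", "Pass", "Fail")),
  x1   = c(1, 2, 3, 4),
  x2   = c(10, 30, 20, 40)
)

toy_rec <- recipe(test ~ ., data = toy) %>%
  step_normalize(all_numeric_predictors())

# Starting workflow with a stand-in glm model (assumption)
base_wf <- workflow() %>%
  add_recipe(toy_rec) %>%
  add_model(logistic_reg() %>% set_engine("glm"))

# Lasso specification, mirroring `plr_mod`
lasso_mod <- logistic_reg(penalty = .01, mixture = 1) %>%
  set_engine("glmnet") %>%
  set_mode("classification")

# Swap the model; the recipe stays attached
lasso_wf <- base_wf %>%
  update_model(lasso_mod)

# Inspect which engine each workflow now carries
extract_spec_parsnip(base_wf)$engine
extract_spec_parsnip(lasso_wf)$engine

# The preprocessor in the updated workflow is still a recipe
class(extract_preprocessor(lasso_wf))
```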