02-build-better-training-data-solution.qmd

---
title: "02-build-better-training-data"
format: html
---

```{r setup, include=FALSE}
options(scipen = 999)
library(tidyverse)
library(tidymodels)

# update rcis if the package is not up-to-date - only way to access bechdel
# remotes::install_github("cis-ds/rcis")
library(rcis)

bechdel <- bechdel %>%
  # remove column - unique to each row, non-informative
  select(-title)

# data splitting
set.seed(100) # Important!
bechdel_split  <- initial_split(bechdel, strata = test, prop = .9)
bechdel_train  <- training(bechdel_split)
bechdel_test   <- testing(bechdel_split)

# data resampling
set.seed(100)
bechdel_folds <- vfold_cv(bechdel_train, v = 10, strata = test)

# KNN model
knn_mod <-nearest_neighbor() %>%              
  set_engine("kknn") %>%             
  set_mode("classification") 
```

# Your Turn 1

Unscramble! You have all the steps from our `knn_rec`- your challenge is to *unscramble* them into the right order! 

Save the result as `knn_rec`

```{r}
step_dummy(all_nominal_predictors())

step_normalize(all_numeric_predictors()) 

recipe(test ~ ., data = bechdel)

step_novel(all_nominal_predictors())

step_zv(all_predictors()) 

step_other(genre, threshold = .05) 
```

Answer:

```{r}
knn_rec <- recipe(test ~ ., data = bechdel) %>% 
  step_other(genre, threshold = .05) %>% 
  step_novel(all_nominal_predictors()) %>% 
  step_dummy(all_nominal_predictors()) %>% 
  step_zv(all_predictors()) %>% 
  step_normalize(all_numeric_predictors()) 
knn_rec
```

# Your Turn 2

Fill in the blanks to make a workflow that combines `knn_rec` and with `knn_mod`.

```{r}
knn_wf <- ______ %>% 
  ______(knn_rec) %>% 
  ______(knn_mod)
knn_wf
```

Answer:

```{r}
knn_wf <- workflow() %>% 
  add_recipe(knn_rec) %>% 
  add_model(knn_mod)
knn_wf
```

# Your Turn 3

Edit the code chunk below to fit the entire `knn_wflow` instead of just `knn_mod`.

```{r}
set.seed(100)
knn_mod %>% 
  fit_resamples(test ~ ., 
                resamples = bechdel_folds) %>% 
  collect_metrics()
```

Answer:

```{r}
set.seed(100)
knn_wf %>% 
  fit_resamples(resamples = bechdel_folds) %>% 
  collect_metrics()
```

# Your Turn 4

Turns out, the same `knn_rec` recipe can also be used to fit a penalized logistic regression model using the lasso. Let's try it out!

```{r}
plr_mod <- logistic_reg(penalty = .01, mixture = 1) %>% 
  set_engine("glmnet") %>% 
  set_mode("classification")

plr_mod %>% 
  translate()
```

Answer:

```{r}
glmnet_wf <- knn_wf %>% 
  update_model(plr_mod)

glmnet_wf %>% 
  fit_resamples(resamples = bechdel_folds) %>% 
  collect_metrics()
```