Chapter_20.Rmd

---
title: "Chapter 20"
author: "Julin Maloof"
date: "2023-08-03"
output: 
  html_document: 
    keep_md: yes
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

```{r}
library(tidyverse)

library(rlang)
```


Evaluation is the developer equivalent of unquotation...it allows the user to evaluate quoted expressions

Two concepts: _quosure_, captures an expression and its environment, and _data masking_, which allows ealuation in the context of a data frame.

quasiquotation, quosures, and data maksing make up _tidy evaluation_

## 20.2 evaluation basics

`eval` takes two arguments, an expression and an enviornment (calling environment is the default)

```{r}
x <- 10
eval(expr(x))
#> [1] 10

y <- 2
eval(expr(x + y))
#> [1] 12
#> 
#
```


```{r}
eval(expr(x + y), env(x = 1000))
#> [1] 1002
```

### 20.2.1 applications (local)

Local allows you to create temporary variables:

```{r, error=TRUE}
# Clean up variables created earlier
rm(x, y)

foo <- local({
  x <- 10
  y <- 200
  x + y
})

foo
#> [1] 210
x
#> Error in eval(expr, envir, enclos): object 'x' not found
y
#> Error in eval(expr, envir, enclos): object 'y' not found
```

capture expression, create new environment with calling environment as parent:

```{r, error=TRUE}
local2 <- function(expr) {
  env <- env(caller_env())
  eval(enexpr(expr), env)
}

foo <- local2({
  x <- 10
  y <- 200
  x + y
})

foo
#> [1] 210
x
#> Error in eval(expr, envir, enclos): object 'x' not found
y
#> Error in eval(expr, envir, enclos): object 'y' not found
```

### 20.2.2 Application: source()

```{r}
source2 <- function(path, env = caller_env()) {
  file <- paste(readLines(path, warn = FALSE), collapse = "\n")
  exprs <- parse_exprs(file)

  res <- NULL
  for (i in seq_along(exprs)) {
    res <- eval(exprs[[i]], env)
  }

  invisible(res)
}
```

### 20.2.3 function()

beware if using this to create functions.  use `new_function()` or set the src_ref attribute to  NULL

### 20.2.4 Exercises

#### 1. Carefully read the documentation for source(). What environment does it use by default? What if you supply local = TRUE? How do you provide a custom environment?

default is global environment; local=TRUE os calling environment.  local=env sets environment to env

#### 2. Predict the results of the following lines of code:

```{r}
eval(expr(eval(expr(eval(expr(2 + 2)))))) # 4
eval(eval(expr(eval(expr(eval(expr(2 + 2))))))) # 4
expr(eval(expr(eval(expr(eval(expr(2 + 2))))))) # eval(expr(eval(expr(eval(expr(2 + 2))))))
```

#### 3. Fill in the function bodies below to re-implement get() using sym() and eval(), and assign() using sym(), expr(), and eval(). Don’t worry about the multiple ways of choosing an environment that get() and assign() support; assume that the user supplies it explicitly.

```{r}
# name is a string

myenv <- env(x=10,y=3)

get2 <- function(name, env) {
  name <- sym(name)
  eval(name, env=env)
}


get("y", myenv)
get2("y", myenv)
```


```{r}
myenv <- env(x=10,y=3)


assign2 <- function(name, value, env) {
  name <- sym(name)
  #eval(expr(name <- value))
  eval(expr(!!name <- !!value), envir = env)
}

assign2("z", 5, myenv)

myenv$z
```

#### 4 Modify source2() so it returns the result of every expression, not just the last one. Can you eliminate the for loop?

```{r}
writeLines("4+3
10*10
a <- 'apple'
toupper(a)",
           "sourceme20.R")

source2 <- function(path, env = caller_env()) {
  file <- paste(readLines(path, warn = FALSE), collapse = "\n")
  exprs <- parse_exprs(file)

  res <- NULL
  for (i in seq_along(exprs)) {
    res <- eval(exprs[[i]], env)
  }

  invisible(res)
}


source3 <- function(path, env = caller_env()) {
  file <- paste(readLines(path, warn = FALSE), collapse = "\n")
  exprs <- parse_exprs(file)

  res <- list()
  for (i in seq_along(exprs)) {
    res[[i]] <- eval(exprs[[i]], env)
  }

  invisible(res)
}

s1 <- source("sourceme20.R")
cat("S1\n")
s1

s2 <- source2("sourceme20.R")
cat("\n-------\nS2\n")
s2


s3 <- source3("sourceme20.R")
cat("\n-------\nS3\n")
s3
```

### 20.3 Quosures

Contain both an expression and an environment

Use `enquo` and `enquos` to capture user provided expressions

```{r}
foo <- function(x) enquo(x)
foo(a + b)

```

Or `quo` and `quos` but this isn't recommended

Or `new_quosure`

```{r}
new_quosure(expr(x + y), env(x = 1, y = 10))

```

Evaluate with `eval_tidy`

```{r}
q1 <- new_quosure(expr(x + y), env(x = 1, y = 10))
eval_tidy(q1)

```

### 20.3.6 Exercises

#### 1. Predict what each of the following quosures will return if evaluated.

```{r}
q1 <- new_quosure(expr(x), env(x = 1))
q1
eval_tidy(q1)
## 1

q2 <- new_quosure(expr(x + !!q1), env(x = 10))
q2
eval_tidy(q2)
## 11

q3 <- new_quosure(expr(x + !!q2), env(x = 100))
q3
eval_tidy(q3)
## 111
```


#### 2. Write an enenv() function that captures the environment associated with an argument. (Hint: this should only require two function calls.)

```{r}
enenv <- function(x) {
  get_env(enquo(x))
}

enenv(10)

Z <- 10

enenv(z)
```

## 20.4

data masks

a data fram argument provided to tidy_eval where variables are looked for.

If neded, can specify .data$x and .env$x to be explicit about where variables should be looked for

### Exercises

#### 1. Why did I use a for loop in transform2() instead of map()? Consider `transform2(df, x = x * 2, x = x * 2).`

Because map will create a list with 2 columns of the same name, and they will not be evaluated recursively (for map, both operations work on the original x, I think)

#### 2. Compare subset2 and 3

```{r}

subset2 <- function(data, rows) {
  rows <- enquo(rows)
  rows_val <- eval_tidy(rows, data)
  stopifnot(is.logical(rows_val))

  data[rows_val, , drop = FALSE]
}

subset3 <- function(data, rows) {
  rows <- enquo(rows)
  eval_tidy(expr(data[!!rows, , drop = FALSE]), data = data)
}

df <- data.frame(x = 1:3)
subset3(df, x == 1)
```

I would think that these would both work equally well.  subset2 is clearer coding and also will maybe give a better error message.  Also can get into trouble if the df has a column "data"

#### 3. The following function implements the basics of dplyr::arrange(). Annotate each line with a comment explaining what it does. Can you explain why !!.na.last is strictly correct, but omitting the !! is unlikely to cause problems?

```{r}
arrange2 <- function(.df, ..., .na.last = TRUE) {
  args <- enquos(...) ## get the ... expressions and quote them

  order_call <- expr(order(!!!args, na.last = !!.na.last)) ## create an expression that will call the function "order" with the arugments (which will be column names)

  ord <- eval_tidy(order_call, .df) # now run the order call, using the data frame as the environment
  stopifnot(length(ord) == nrow(.df)) # make sure nothing went wrong

  .df[ord, , drop = FALSE] # actually reorder the df and return it
}
```

!!.na.last will give the value of na.last as an argument in the expression, but without this it will be evaluated later and this should be ok becuase na.last will be in the calling environment.

## 20.5

#### 1. I’ve included an alternative implementation of threshold_var() below. What makes it different to the approach I used above? What makes it harder?

```{r}
threshold_var <- function(df, var, val) {
  var <- ensym(var)
  subset2(df, `$`(.data, !!var) >= !!val)
}
```


Here we are using the inline version of the `$` function, and that makes the code harder to read.

## 20.6

#### 1. Why does this function fail?

```{r, error=TRUE, eval=FALSE}
lm3a <- function(formula, data) {
  formula <- enexpr(formula)

  lm_call <- expr(lm(!!formula, data = data))
  eval(lm_call, caller_env())
}
lm3a(mpg ~ disp, mtcars)$call

```

It fails because `data` does not exist in `caller_env()`.  So we need to quote it and then unquote it in the call, as is done in the `lm3` given in the book

#### 2. When model building, typically the response and data are relatively constant while you rapidly experiment with different predictors. Write a small wrapper that allows you to reduce duplication in the code below.

```{r}
lm(mpg ~ disp, data = mtcars)
lm(mpg ~ I(1 / disp), data = mtcars)
lm(mpg ~ disp * cyl, data = mtcars)
```

What is the idea?  give the formulas in ... and loop through them?

A second approach would be to have one arugment for the response, and have the predictors in ...

Start with idea one.

```{r}
lms_1 <- function(..., data, env=caller_env()) {
  formulas <- enexprs(...)
  
  data <- enexpr(data)
  
  res <- list()
  
  for(i in seq_along(formulas)) {
    lm_call <- expr(lm(!!formulas[[i]], data=!!data))
    expr_print(lm_call)
    res[[i]] <- eval(lm_call, env=env)
    cat("----------------\n\n")
  }
  
  res
  
}

lms_1(mpg ~ disp, mpg ~ I(1/disp), mpg ~ disp * cyl, data=mtcars)
```

How about a list of predictors?

```{r}
lms_2 <- function(response, ..., data, env=caller_env()) {
  response <- enexpr(response)
  
  predictors <- enexprs(...)
  
  data <- enexpr(data)
  
  res <- list()
  
  for(i in seq_along(predictors)) {
    lm_call <- expr(lm(!!response ~ !!predictors[[i]], data=!!data))
    expr_print(lm_call)
    res[[i]] <- eval(lm_call, env=env)
    cat("----------------\n\n")
  }
  
  res
  
}

lms_2(mpg, disp, I(1/disp), disp * cyl, data=mtcars)
```

#### 3. Another way to write resample_lm() would be to include the resample expression (data[sample(nrow(data), replace = TRUE), , drop = FALSE]) in the data argument. Implement that approach. What are the advantages? What are the disadvantages?

advantage: user control.  disadvantage: user has to rewrite it if there df is not called "data".  Uglier.  lm output is uglier.

```{r}
resample_lm3 <- function(formula, data=data[sample(nrow(data), replace = TRUE), , drop = FALSE], env = caller_env()) {
  formula <- enexpr(formula)
  data = enexpr(data)
  
  lm_call <- expr(lm(!!formula, data = !!data))
  expr_print(lm_call)
  eval(lm_call, env)
}

df <- data.frame(x = 1:10, y = 5 + 3 * (1:10) + round(rnorm(10), 2))

resample_lm3(y ~ x, data=df[sample(nrow(df), replace = TRUE), , drop = FALSE])
```