---
title: "Basics of functions"
subtitle: "Stat 133"
author: "Gaston Sanchez"
output: github_document
fontsize: 11pt
urlcolor: blue
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(dplyr)
```
> ### Learning Objectives
>
> - Define a function that takes arguments
> - Return a value from a function
> - Test a function
> - Set default values for function arguments
------
## Anatomy of a function
To define a new function in R you use the function `function()`.
You choose a name for the function and assign the result of `function()`
to that name. You can also specify arguments (i.e. inputs), which are optional.
And of course, you must write the code (i.e. the body) so the function does
something when you use it:
```{r}
# anatomy of a function
some_name <- function(arguments) {
  # body of the function
}
```
- Generally, you give a name to a function.
- A function takes one or more inputs (or none), known as _arguments_.
- The expressions forming the operations comprise the __body__ of the function.
- Usually, you wrap the body of the function with curly braces.
- A function returns a single value.
A less abstract function could have the following structure:
```r
function_name <- function(arg1, arg2, etc)
{
  expression_1
  expression_2
  ...
  expression_n
}
```
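As a concrete illustration of these parts, here is a minimal sketch with a hypothetical function `rect_area()` (it is not part of the original material). Base R's `formals()` and `body()` let you inspect the arguments and the body of a function:

```r
# hypothetical example function: area of a rectangle
rect_area <- function(width, height) {
  width * height
}

# the arguments (inputs) of the function
names(formals(rect_area))

# the body: the expressions wrapped in curly braces
body(rect_area)

# calling the function returns a single value
rect_area(3, 4)
```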
### Example 1: From Fahrenheit to Celsius
Let's consider a typical programming example that involves converting
Fahrenheit degrees into Celsius degrees. The conversion formula is
$(F - 32) \times 5/9 = C$. Here's some R code to convert 100 Fahrenheit
degrees into Celsius degrees:
```{r}
# fahrenheit degrees
far_deg <- 100
# convert to celsius
(far_deg - 32) * (5/9)
```
What if you want to convert 90 Fahrenheit degrees into Celsius degrees?
One option would be to rewrite the previous lines as:
```{r}
# fahrenheit degrees
far_deg <- 90
# convert to celsius
(far_deg - 32) * (5/9)
```
However, retyping many lines of code can be very boring, tedious, and
inefficient. To make your code reusable in a more efficient manner, you will
have to write functions.
#### Writing a simple function
So, how do you create a function? The first step is to write code and make
sure that it works. In this case we already have the code that converts a
number in Fahrenheit units into Celsius.
The next step is to __encapsulate__ the code in the form of a function. You
have to choose a name, some argument(s), and determine the output. Here's one
example with a function `fahrenheit_to_celsius()`:
```{r}
fahrenheit_to_celsius <- function(x) {
  y <- (x - 32) * (5/9)
  return(y)
}
fahrenheit_to_celsius(100)
```
If you want to get the conversion of 90 Fahrenheit degrees, simply call the
function again with a different argument:
```{r}
fahrenheit_to_celsius(90)
```
And because we are using arithmetic operators (i.e. multiplication, subtraction,
division), the function is also vectorized:
```{r}
fahrenheit_to_celsius(c(90, 100, 110))
```
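One simple way to gain confidence in the function is to check it against temperatures whose conversion is known in advance. The sketch below repeats the definition so the block is self-contained, and uses `all.equal()` because floating-point arithmetic may introduce tiny rounding differences:

```r
fahrenheit_to_celsius <- function(x) {
  y <- (x - 32) * (5/9)
  return(y)
}

# reference points: freezing (32 F = 0 C), boiling (212 F = 100 C),
# and the temperature where both scales agree (-40)
known_f <- c(32, 212, -40)
known_c <- c(0, 100, -40)

# compare with a numerical tolerance instead of ==
all.equal(fahrenheit_to_celsius(known_f), known_c)
```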
Sometimes it is recommended to add a default value to one (or more) of the
arguments. In this case, we can give a default value of `x = 1`. When the
user executes the function without any input, `fahrenheit_to_celsius()` returns
1 Fahrenheit degree converted to Celsius degrees:
```{r}
fahrenheit_to_celsius <- function(x = 1) {
  (x - 32) * (5/9)
}
# default execution
fahrenheit_to_celsius()
```
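Defaults are not limited to the main input. As a sketch, the hypothetical variant below (the name `f_to_c_rounded()` and the `digits` argument are illustrative assumptions, not part of the original function) adds a second argument with a default value, which callers can override by name:

```r
# hypothetical variant with two default arguments
f_to_c_rounded <- function(x = 1, digits = 2) {
  round((x - 32) * (5/9), digits)
}

# both defaults are used
f_to_c_rounded()

# override only the first argument
f_to_c_rounded(100)

# override the second argument by name
f_to_c_rounded(100, digits = 0)
```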
-----
## Another example
Let's consider another toy example with a function that squares its argument:
```{r}
square <- function(x) {
  x * x
}
```
- the function name is `"square"`
- it has one argument: `x`
- the function body consists of one simple expression
- it returns the value `x * x`
`square()` works like any other function in R:
```{r}
square(10)
```
In this case, `square()` is also vectorized:
```{r}
square(1:5)
```
Why is `square()` vectorized?
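A hint for the question above: the body of `square()` uses only the operator `*`, which in R works element by element on vectors. The short sketch below checks that calling the function on a vector gives the same result as multiplying the vector by itself:

```r
square <- function(x) {
  x * x
}

v <- 1:5

# `*` multiplies corresponding elements
v * v

# so the function inherits the same element-wise behavior
identical(square(v), v * v)
```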
Once defined, functions can be used in other function definitions:
```{r}
sum_of_squares <- function(x) {
  sum(square(x))
}
sum_of_squares(1:5)
```
### Simple Expressions
Functions with a body consisting of a __simple expression__ can be written with
no braces (in one single line!):
```{r}
square <- function(x) x * x
square(10)
```
However, as a general coding rule, you should get into the habit of writing functions using braces.
### Nested Functions
We can also define a function inside another function:
```{r}
getmax <- function(a) {
  # nested function
  maxpos <- function(u) which.max(u)
  # output
  list(position = maxpos(a),
       value = max(a))
}
getmax(c(2, -4, 6, 10, pi))
```
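A nested function is defined inside the enclosing function, so it is not available from your workspace. A minimal sketch, repeating the `getmax()` definition so the block is self-contained:

```r
getmax <- function(a) {
  # nested function: only visible inside getmax()
  maxpos <- function(u) which.max(u)
  list(position = maxpos(a), value = max(a))
}

res <- getmax(c(2, -4, 6, 10, pi))
res$position
res$value

# maxpos() is not defined outside of getmax(); this is expected to be
# FALSE unless your workspace already has an object with that name
exists("maxpos")
```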
## Naming Functions
There are different ways to name functions. The following list provides some
examples with different naming styles:
- `squareroot()`
- `SquareRoot()`
- `squareRoot()`
- `square.root()`
- `square_root()`
I personally use the _underscore_ style. But you may find other programmers
employing a different naming format. We strongly suggest using a consistent
naming style. Many programming teams define their own style guides. If you
are new to programming, it usually takes time to develop a consistent style.
However, the sooner you have a defined personal style, the better.
It is also important that you know which names are invalid in R:
- `5quareroot()`: cannot begin with a number
- `_square()`: cannot begin with an underscore
- `square-root()`: cannot use hyphenated names
In addition, avoid using an already existing name, e.g. `sqrt()`.
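One way to check a candidate name, sketched here as an illustration, is base R's `make.names()`, which turns any string into a syntactically valid name. If the result equals the input, the name was already valid. The helper name `is_valid_name()` is hypothetical, and `exists()` can tell you whether a name is already taken:

```r
# hypothetical helper: TRUE when a string is a syntactically valid R name
is_valid_name <- function(name) {
  name == make.names(name)
}

candidates <- c("square_root", "square.root", "5quareroot",
                "_square", "square-root")

# valid for the first two, invalid for the last three
is_valid_name(candidates)

# sqrt is already defined in base R
exists("sqrt")
```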
Sometimes you will find functions with names starting with a dot: `.hidden()`;
functions of this type are hidden, meaning that they won't be visible by
default in the list of objects in your working environment.
```{r}
ls()
visible <- function(x) {
  x * 2
}
.hidden <- function(y) {
  y * 2
}
ls()
```
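To list hidden objects as well, `ls()` has an `all.names` argument. The sketch below uses a separate environment created with `new.env()` (an assumption made here only to keep the example independent of your workspace):

```r
# a fresh environment, so the listing does not depend on the workspace
env <- new.env()
assign("visible", function(x) x * 2, envir = env)
assign(".hidden", function(y) y * 2, envir = env)

# default listing: names starting with a dot are omitted
ls(envir = env)

# all.names = TRUE includes them
ls(envir = env, all.names = TRUE)
```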
## Function Output
The value of a function can be established in two ways:
- As the last evaluated simple expression (in the body of the function)
- An explicitly __returned__ value via `return()`
Here's a basic example of a function in which the output is the last evaluated
expression:
```{r}
add <- function(x, y) {
  x + y
}
add(2, 3)
```
Here's another version of `add()` in which the output is the last evaluated
expression:
```{r}
add <- function(x, y) {
  z <- x + y
  z
}
add(2, 3)
```
Be careful with the form in which the last expression is evaluated:
```{r}
add <- function(x, y) {
  z <- x + y
}
add(2, 3)
```
In this case, it looks like `add()` does not work. If you run the previous
code, nothing appears in the console. Can you guess why? To help you answer
this question, assign the invocation to an object and then print the object:
```r
why <- add(2, 3)
why
```
`add()` does work. The issue has to do with the form of the last expression.
Nothing gets displayed in the console because the last statement `z <- x + y`
is an assignment, and R returns the value of an assignment invisibly (it is
computed but not printed).
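You can confirm this behavior with base R's `withVisible()`, which reports the value of an expression together with a flag saying whether it would be printed. In this sketch the function is renamed `add_quiet()` (a hypothetical name) so it does not overwrite `add()`:

```r
add_quiet <- function(x, y) {
  z <- x + y
}

res <- withVisible(add_quiet(2, 3))
res$value    # the function did compute the sum
res$visible  # flag telling whether the result would be printed

# wrapping the call in parentheses forces printing
(add_quiet(2, 3))
```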
### The `return()` command
More often than not, the `return()` command is included to explicitly indicate
the output of a function:
```{r}
add <- function(x, y) {
  z <- x + y
  return(z)
}
add(2, 3)
```
I've seen that many users with previous programming experience in other languages
prefer to use `return()`. The main reason is that most programming languages
tend to use some sort of _return_ statement to indicate the output of a function.
So, following good language-agnostic coding practices, we also recommend that
you use the function `return()`. In this way, any reader can quickly scan the
body of your functions and visually locate the places in which a _return_
statement is being made.
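`return()` is also handy when a function should stop early. As a sketch, the hypothetical `safe_divide()` below (not part of the original material, and assuming `y` is a single number) returns `NA` as soon as it detects a zero denominator, and otherwise reaches the final `return()`:

```r
# hypothetical example: division that guards against a zero denominator
safe_divide <- function(x, y) {
  if (y == 0) {
    # early exit: the rest of the body is skipped
    return(NA)
  }
  z <- x / y
  return(z)
}

safe_divide(10, 2)
safe_divide(10, 0)
```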
-----
### Variance Function Example
The sample variance is given by the following formula:
$$
var(x) = \frac{1}{n-1} \sum_{i = 1}^{n} (x_i - \bar{x})^2
$$
![sample variance](07-images/sample-variance.png)
Let's create a `variance()` function that computes the sample variance.
The first step should always be writing the code that will become the body of
the function:
```{r}
# start simple
x <- 1:10
# get working code
sum((x - mean(x)) ^ 2) / (length(x) - 1)
# test it: compare it to var()
var(1:10)
```
Once you know your code works, you can encapsulate it with `function()`:
```{r}
# encapsulate your code
variance <- function(x) {
  sum((x - mean(x)) ^ 2) / (length(x) - 1)
}
# check that it works
variance(x)
```
Before making any further changes to `variance()`, you should test it with
a handful of other (possibly extreme) cases:
```{r}
# consider less simple cases
variance(runif(10))
# what about atypical cases?
variance(rep(0, 10))
# what if there are missing values?
variance(c(1:9, NA))
```
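Two more checks can be informative. The sketch below compares `variance()` with `var()` using `all.equal()`, and tries a vector of length one, where the denominator `length(x) - 1` is zero. The definition is repeated so the block is self-contained, and the values in `y` are arbitrary:

```r
variance <- function(x) {
  sum((x - mean(x)) ^ 2) / (length(x) - 1)
}

# agreement with the built-in function, up to numerical tolerance
y <- c(2.5, 3.1, 4.7, 8.0, 9.9)
all.equal(variance(y), var(y))

# a single value: the computation becomes 0 / 0, which is NaN
variance(5)
```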
You can then start gradually adapting your function to make it more robust,
more flexible, more user friendly, etc. For instance, `variance()` returns
`NA` when the provided vector contains missing values. But you can include
an argument that removes any missing values. Many functions in R have this
feature, like `sum()`, `mean()`, `median()`. They all use the so-called
`na.rm` argument to specify if missing values should be removed before any
computation is done:
```{r}
# adapt it gradually
variance <- function(x, na.rm = FALSE) {
  if (na.rm) {
    # removing missing values
    x <- x[!is.na(x)]
  }
  # compute sample variance
  sum((x - mean(x)) ^ 2) / (length(x) - 1)
}
# check that it works
variance(c(1:9, NA), na.rm = TRUE)
```
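Another possible refinement, shown here only as a sketch, is to validate the input before computing. The name `variance_checked()` and the specific check are assumptions for illustration; `stopifnot()` raises an error when a condition is not met:

```r
# hypothetical variant of variance() with input validation
variance_checked <- function(x, na.rm = FALSE) {
  # stop early if the input is not numeric
  stopifnot(is.numeric(x))
  if (na.rm) {
    x <- x[!is.na(x)]
  }
  sum((x - mean(x)) ^ 2) / (length(x) - 1)
}

# same result as var() with missing values removed
all.equal(variance_checked(c(1:9, NA), na.rm = TRUE),
          var(c(1:9, NA), na.rm = TRUE))

# non-numeric input triggers an error, captured here with try()
res <- try(variance_checked(c("a", "b")), silent = TRUE)
inherits(res, "try-error")
```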