# Intro to data wrangling, Part I
__Worksheet__: You can find a worksheet template for today [here](https://raw.githubusercontent.com/STAT545-UBC/Classroom/master/tutorials/cm006-exercise.Rmd).
## Today's Lessons
Today we'll introduce the [`dplyr`](https://dplyr.tidyverse.org/) package. Specifically, we'll look at these three lessons:
- Intro to `dplyr` syntax
- The `dplyr` advantage
- Relational/comparison and logical operators in R
## Resources
All three of today's lessons are closely aligned with the [stat545: dplyr-intro](http://stat545.com/block009_dplyr-intro.html) page.
More detail can be found in the [r4ds: transform](http://r4ds.had.co.nz/transform.html) chapter, up until and including the `select()` section. Section 5.2 also elaborates on relational/comparison and logical operators in R.
Here are some supplementary resources:
- A similar resource to the r4ds one above is the [intro to dplyr vignette](https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html), up until and including the `select()` section.
- Want to read more about piping? See [r4ds: pipes](http://r4ds.had.co.nz/pipes.html).
## Participation
To get participation points for today, fill out the [cm006-exercise.Rmd](https://raw.githubusercontent.com/STAT545-UBC/Classroom/master/tutorials/cm006-exercise.Rmd) file and add it to your participation repo.
## Intro to `dplyr` syntax
### Learning Objectives
Here are the concepts we'll be exploring in this lesson:
- tidyverse
- `dplyr` functions:
    - select
    - arrange
- piping
By the end of this lesson, students are expected to be able to:
- subset and rearrange data with `dplyr`
- use piping (`%>%`) when implementing function chains
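
As a rough preview of these concepts, here is a minimal sketch. It assumes the `dplyr` and `gapminder` packages are installed, and the choice of columns is only for illustration (the worksheet may use different examples). It keeps three columns of `gapminder` and sorts the rows by year, first with nested calls and then with the pipe.

```r
library(dplyr)
library(gapminder)

# Nested calls read inside-out: select() runs first, then arrange().
nested <- arrange(select(gapminder, country, year, lifeExp), year)

# The pipe passes the left-hand side as the first argument of the next call,
# so the same steps read top to bottom.
piped <- gapminder %>%
  select(country, year, lifeExp) %>%
  arrange(year)

# Check that both versions give the same result.
all.equal(nested, piped)
```

The piped version is usually easier to extend: adding another step means adding another line, rather than wrapping the whole expression in one more call.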
### Preamble
Let's talk about:
- The history of `dplyr`: its predecessor, `plyr`
- Tibbles, a special type of data frame
- The [tidyverse](https://www.tidyverse.org/)
### Demonstration
Let's get started with the exercise:
1. Open RStudio, and install the `tidyverse` meta-package by running `install.packages("tidyverse")` in the R console.
2. _Optional_: open the `STAT545_participation` RStudio project in RStudio.
3. With RStudio, open the `cm006-exercise.Rmd` file you downloaded and committed earlier.
4. Follow the instructions in the `.Rmd` file until the *resume lecture* section.
## Small break
Here are some things you might choose to do on this break:
- Talk with a TA, Vincenzo, or your neighbour(s) about the content so far.
- Attempt the bonus exercises on the `cm006-exercise.Rmd` file.
- Work on an assignment.
## The `dplyr` advantage
### Learning Objectives
By the end of this lesson, students are expected to be able to:
- Have a sense of why `dplyr` is advantageous compared to the "base R" way with respect to good coding practice.
Why?
- Having this in the back of your mind will help you recognize the qualities of a readable analysis, and produce one yourself.
### Compare base R to `dplyr`
__Self-documenting code__.
This is where the tidyverse shines.
Example of `dplyr` vs base R:
```
library(dplyr)
library(gapminder)

gapminder %>%
  filter(country == "Cambodia") %>%
  select(year, lifeExp)
```
vs.
```
gapminder[gapminder$country == "Cambodia", c("year", "lifeExp")]
```
__No need to take excerpts__.
Wrangle with `dplyr` first, then pipe the result into a plot or analysis.
Or, use the `subset` argument that's often offered by R functions like `lm()`.
In particular, don't subset with magic numbers (hard-coded row or column indices)!
Note that `dplyr` functions don't modify their input: you need to use the assignment operator (`<-`) to store changes!
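
The points above can be sketched as follows. This is an illustration only, assuming the `dplyr` and `gapminder` packages are installed; the linear model of `lifeExp` on `year` is a made-up example, not part of the worksheet. It subsets with a logical condition rather than hard-coded row numbers, fits the same model two ways, and stores a wrangled result with `<-`.

```r
library(dplyr)
library(gapminder)

# Wrangle first, then pipe into the analysis. lm() takes the formula first,
# so the piped data is passed explicitly through the `.` placeholder.
fit_piped <- gapminder %>%
  filter(country == "Cambodia") %>%
  lm(lifeExp ~ year, data = .)

# Alternatively, use the subset argument that lm() offers.
fit_subset <- lm(lifeExp ~ year, data = gapminder,
                 subset = country == "Cambodia")

# Both fits use the same rows, so the coefficients should agree.
all.equal(coef(fit_piped), coef(fit_subset))

# dplyr returns a new object and leaves gapminder untouched;
# assign the result to keep it.
cambodia <- gapminder %>%
  filter(country == "Cambodia")
```

Neither approach creates an intermediate "excerpt" object just to feed the model, and the condition `country == "Cambodia"` documents what is being selected in a way that row numbers cannot.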
## Relational/Comparison and Logical Operators in R
### Learning Objectives
Here are the concepts we'll be exploring in this lesson:
- Relational/Comparison operators
- Logical operators
- `dplyr` functions:
    - filter
    - mutate
By the end of this lesson, students are expected to be able to:
- Predict the output of R code containing the above operators.
- Explain the difference between `&`/`&&` and `|`/`||`, and name a situation where one should be used over the other.
- Subset and transform data using `filter()` and `mutate()`.
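
To give a feel for how these pieces fit together, here is a small sketch, assuming the `dplyr` and `gapminder` packages are installed. The condition and the new column name `gdp` are chosen only for illustration.

```r
library(dplyr)
library(gapminder)

# filter() keeps the rows where the condition is TRUE;
# mutate() adds a column computed from existing ones.
cambodia_gdp <- gapminder %>%
  filter(country == "Cambodia" & year >= 1990) %>%
  mutate(gdp = gdpPercap * pop)

cambodia_gdp
```

The condition inside `filter()` is built from a relational operator on each side (`==`, `>=`) joined by the element-wise logical operator `&`, which is why these operators are covered next.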
### R Operators
**Arithmetic** operators allow us to carry out mathematical operations:
| Operator | Description |
|------|:---------|
| + | Add |
| - | Subtract |
| * | Multiply |
| / | Divide |
| ^ | Exponent |
| %% | Modulus (remainder from division) |
**Relational** operators allow us to compare values:
| Operator | Description |
|------|:---------|
| < | Less than |
| > | Greater than |
| <= | Less than or equal to |
| >= | Greater than or equal to |
| == | Equal to |
| != | Not equal to |
* Arithmetic and relational operators are vectorized: they work element-wise on vectors.
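
A short base R illustration of these operators acting on a whole vector at once (the vector `x` is just a toy example):

```r
x <- c(1, 5, 10)

x + 1             # adds 1 to every element: 2 6 11
x %% 2            # remainder after dividing by 2: 1 1 0
x > 4             # FALSE TRUE TRUE
x == 5            # FALSE TRUE FALSE
x != c(1, 2, 10)  # compared position by position: FALSE TRUE FALSE
```

Each comparison returns a logical vector of the same length as `x`, which is the kind of input `filter()` uses to decide which rows to keep.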
**Logical** operators allow us to carry out boolean operations:
| Operator | Description |
|---|:---|
| ! | Not |
| \| | Or (element-wise) |
| & | And (element-wise) |
| \|\| | Or (single values, short-circuiting) |
| && | And (single values, short-circuiting) |
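* The difference between `|` and `||` (and likewise `&` and `&&`) is that `|` evaluates element-wise and returns a vector, whereas `||` expects single (length-one) values, returns a single value, and short-circuits. Older versions of R silently used only the first element of longer vectors; as of R 4.3.0, passing a vector longer than one to `||` or `&&` is an error.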
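
The following base R sketch contrasts the two families of logical operators. The vectors and the `label` variable are toy examples; the `stop()` calls are there only to show that the right-hand side is never evaluated.

```r
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)

!x     # FALSE TRUE FALSE
x & y  # TRUE FALSE FALSE
x | y  # TRUE TRUE TRUE

# && and || take single values and short-circuit: the right-hand side
# is not evaluated when the left-hand side already decides the result.
skipped_and <- FALSE && stop("never evaluated")  # FALSE
skipped_or  <- TRUE || stop("never evaluated")   # TRUE

# Typical use: guard a check inside if().
v <- NULL
label <- if (!is.null(v) && v > 0) "positive" else "missing or not positive"
label  # "missing or not positive"
```

As a rule of thumb, use `&` and `|` when working with whole columns (for example inside `filter()`), and `&&` and `||` for single conditions in control flow such as `if()`.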
### Demonstration
Continue along with the `cm006-exercise.Rmd` file.
## If there's time remaining
1. Let's do the bonus exercises together, in the `cm006-exercise.Rmd` file.
2. Another "break"