---
title: "Chapters 9 and 10"
author: "Laura"
date: "11/19/2019"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, cache = TRUE)
library(tidyverse); library(skimr); library(nycflights13); library(GGally); library(ggstance); library(lvplot); library(hexbin); library(modelr)
```
## Notes for Chapter 9: Intro to Wrangling
## Notes for Chapter 10: Tibbles
### 10.2 Creating tibbles
```{r ch1021}
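# coerce an existing data frame to a tibble
as_tibble(iris)

# build a tibble from vectors; `y` is recycled and `z` can refer to `x` and `y`
tibble(
  x = 1:5,
  y = 1,
  z = x ^ 2 + y
)

# non-syntactic column names are allowed when wrapped in backticks
tb <- tibble(
  `:)` = "smile",
  ` ` = "space",
  `2000` = "number"
)
tb

# tribble() lays the data out row by row
tribble(
  ~x, ~y, ~z,
  #--|--|----
  # the comment line above only marks where the header ends; it has no effect
  "a", 2, 3.6,
  "b", 1, 8.5
)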
```
### 10.3 Tibbles vs. data.frame
```{r ch1031}
tibble(
  a = lubridate::now() + runif(1e3) * 86400,
  b = lubridate::today() + runif(1e3) * 30,
  c = 1:1e3,
  d = runif(1e3),
  e = sample(letters, 1e3, replace = TRUE)
)
```
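The chunk above prints only the first ten rows and the columns that fit on screen. A minimal sketch of how that printing can be adjusted, assuming the `n` and `width` arguments of `print()` and the `tibble.print_max` / `tibble.print_min` options from the book still apply to the installed tibble version (newer releases may route these through pillar). The object name `wide` is only for illustration.

```{r ch1031-print-sketch}
library(tibble)

# `wide` is a hypothetical name; mtcars is just a convenient built-in data frame
wide <- as_tibble(mtcars)

# one-off: show 3 rows and every column, regardless of screen width
print(wide, n = 3, width = Inf)

# session-wide: if there are more than 20 rows, print only the first 5
old <- options(tibble.print_max = 20, tibble.print_min = 5)
wide

# restore the previous settings so later chunks are unaffected
options(old)
```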
#### 10.3.2 Subsetting
So far all the tools you’ve learned have worked with complete data frames. If you want to pull out a single variable, you need some new tools, `$` and `[[`. `[[` can extract by name or position; `$` only extracts by name but is a little less typing.
To use these in a pipe, you’ll need to use the special placeholder `.`:
```{r ch10321}
df <- tibble(
  x = runif(5),
  y = rnorm(5)
)
df %>% .$x       # equivalent to df$x in the non-pipe world
df %>% .[["x"]]  # equivalent to df[["x"]] in the non-pipe world
```
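To check my own understanding of `$` versus `[[`, here is a small sketch showing that extraction by name and by position gives the same vector. The tibble `df_sub` and the result names are made up for this example.

```{r ch10322-sketch}
library(tibble)
library(magrittr)  # provides %>%

# fixed values (hypothetical) so the results are easy to compare
df_sub <- tibble(x = c(0.1, 0.2, 0.3), y = c("a", "b", "c"))

by_name   <- df_sub[["x"]]  # `[[` with a name
by_pos    <- df_sub[[1]]    # `[[` with a position
by_dollar <- df_sub$x       # `$` only works with a name

identical(by_name, by_pos)
identical(by_name, by_dollar)

# the same extraction inside a pipe, using the `.` placeholder
piped <- df_sub %>% .[["y"]]
piped
```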
### 10.5 Exercises
* How can you tell if an object is a tibble? (Hint: try printing `mtcars`, which is a regular data frame).
* Compare and contrast the following operations on a data.frame and equivalent tibble. What is different? Why might the default data frame behaviours cause you frustration?
```{r ex1051}
df <- data.frame(abc = 1, xyz = "a")
df$x
df[, "xyz"]
df[, c("abc", "xyz")]
```
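A possible way to work through the first two exercises: run the same operations on a data frame and on an equivalent tibble and compare what comes back. This is my own sketch, not the book's solution; `df_base`, `tbl` and the result names are made up here.

```{r ex1051-tibble-sketch}
library(tibble)

# is_tibble() distinguishes the two kinds of object
is_tibble(mtcars)             # a regular data frame
is_tibble(as_tibble(mtcars))  # the same data as a tibble

df_base <- data.frame(abc = 1, xyz = "a")
tbl     <- tibble(abc = 1, xyz = "a")

# `$` partially matches names on a data frame, so `x` silently finds `xyz`
df_x <- df_base$x
# a tibble never partially matches: this is NULL, normally with a warning
# (the warning is suppressed here only to keep the knitted output tidy)
tbl_x <- suppressWarnings(tbl$x)

# `[` with a single column drops to a vector for a data frame ...
df_col <- df_base[, "xyz"]
# ... but always returns a tibble for a tibble
tbl_col <- tbl[, "xyz"]

is.data.frame(df_col)
is.data.frame(tbl_col)
```

The frustration with the data frame defaults seems to be that the return type of `[` depends on how many columns are selected, and that a typo in a column name after `$` can match a different column without any message.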
* If you have the name of a variable stored in an object, e.g. `var <- "mpg"`, how can you extract the variable it refers to from a tibble?
* Practice referring to non-syntactic names in the following data frame by:
    1. Extracting the variable called `1`.
    2. Plotting a scatterplot of `1` vs `2`.
    3. Creating a new column called `3` which is `2` divided by `1`.
    4. Renaming the columns to `one`, `two` and `three`.
```{r ex1052}
annoying <- tibble(
  `1` = 1:10,
  `2` = `1` * 2 + rnorm(length(`1`))
)
```
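A sketch of one way to approach the two exercises above (a variable name stored in an object, and the non-syntactic names). It assumes dplyr and ggplot2 are available, which they are once tidyverse is loaded. The names `mpg_values`, `one`, `p` and `renamed` are my own, and `annoying` is rebuilt so the chunk runs on its own.

```{r ex1052-sketch}
library(tibble)
library(dplyr)
library(ggplot2)

# `[[` evaluates `var`; `$var` would look for a column literally named "var"
var <- "mpg"
mpg_values <- as_tibble(mtcars)[[var]]

# same definition as in the exercise, repeated for self-containment
annoying <- tibble(
  `1` = 1:10,
  `2` = `1` * 2 + rnorm(length(`1`))
)

# 1. extract the variable called 1 (backticks are required after `$`)
one <- annoying$`1`  # annoying[["1"]] also works

# 2. scatterplot of 1 vs 2
p <- ggplot(annoying, aes(x = `1`, y = `2`)) +
  geom_point()
p

# 3. new column 3, which is 2 divided by 1
annoying <- mutate(annoying, `3` = `2` / `1`)

# 4. rename the columns to one, two and three
renamed <- rename(annoying, one = `1`, two = `2`, three = `3`)
names(renamed)
```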
* What does `tibble::enframe()` do? When might you use it?
* What option controls how many additional column names are printed at the footer of a tibble?
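
For the last two exercises, a small sketch. `enframe()` turns a named vector into a two-column tibble, which looks handy when a function returns a named vector but the next step expects a data frame. For the footer question I am assuming the `tibble.max_extra_cols` option is the relevant one; that is how I read the package help, and newer tibble versions may handle it through pillar instead. The vector `scores` is invented for the example.

```{r ex1053-sketch}
library(tibble)

# a hypothetical named vector
scores <- c(a = 1, b = 2, c = 3)

# names go into the `name` column, values into the `value` column
long <- enframe(scores)
long

# assumption: this option caps how many extra column names the footer lists
old <- options(tibble.max_extra_cols = 5)
getOption("tibble.max_extra_cols")

# restore the previous setting
options(old)
```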
## Misc
Ways to find help I am discovering:
* `vignette("<packagename>")`
* `package?<packagename>`