<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="generator" content="pandoc" />
<title>Introduction to dplyr</title>
<script src="libs/jquery-1.11.3/jquery.min.js"></script>
<meta name="viewport" content="width=device-width, initial-scale=1" />
<link href="libs/bootstrap-3.3.5/css/bootstrap.min.css" rel="stylesheet" />
<script src="libs/bootstrap-3.3.5/js/bootstrap.min.js"></script>
<script src="libs/bootstrap-3.3.5/shim/html5shiv.min.js"></script>
<script src="libs/bootstrap-3.3.5/shim/respond.min.js"></script>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-68219208-1', 'auto');
ga('send', 'pageview');
</script>
<style type="text/css">code{white-space: pre;}</style>
<link rel="stylesheet"
href="libs/highlight/default.css"
type="text/css" />
<script src="libs/highlight/highlight.js"></script>
<style type="text/css">
pre:not([class]) {
background-color: white;
}
</style>
<script type="text/javascript">
if (window.hljs && document.readyState && document.readyState === "complete") {
window.setTimeout(function() {
hljs.initHighlighting();
}, 0);
}
</script>
<style type="text/css">
h1 {
font-size: 34px;
}
h1.title {
font-size: 38px;
}
h2 {
font-size: 30px;
}
h3 {
font-size: 24px;
}
h4 {
font-size: 18px;
}
h5 {
font-size: 16px;
}
h6 {
font-size: 12px;
}
.table th:not([align]) {
text-align: left;
}
</style>
<link rel="stylesheet" href="libs/local/main.css" type="text/css" />
<link rel="stylesheet" href="libs/local/nav.css" type="text/css" />
<link rel="stylesheet" href="//netdna.bootstrapcdn.com/font-awesome/4.0.3/css/font-awesome.css" type="text/css" />
</head>
<body>
<style type = "text/css">
.main-container {
max-width: 940px;
margin-left: auto;
margin-right: auto;
}
code {
color: inherit;
background-color: rgba(0, 0, 0, 0.04);
}
img {
max-width:100%;
height: auto;
}
.tabbed-pane {
padding-top: 12px;
}
button.code-folding-btn:focus {
outline: none;
}
</style>
<div class="container-fluid main-container">
<!-- tabsets -->
<script src="libs/navigation-1.1/tabsets.js"></script>
<script>
$(document).ready(function () {
window.buildTabsets("TOC");
});
</script>
<!-- code folding -->
<header>
<div class="nav">
<a class="nav-logo" href="index.html">
<img src="static/img/stat545-logo-s.png" width="70px" height="70px"/>
</a>
<ul>
<li class="home"><a href="index.html">Home</a></li>
<li class="faq"><a href="faq.html">FAQ</a></li>
<li class="syllabus"><a href="syllabus.html">Syllabus</a></li>
<li class="topics"><a href="topics.html">Topics</a></li>
<li class="people"><a href="people.html">People</a></li>
</ul>
</div>
</header>
<div class="fluid-row" id="header">
<h1 class="title toc-ignore">Introduction to dplyr</h1>
</div>
<div id="TOC">
<ul>
<li><a href="#intro">Intro</a><ul>
<li><a href="#load-dplyr-and-gapminder">Load <code>dplyr</code> and <code>gapminder</code></a></li>
<li><a href="#say-hello-to-the-gapminder-tibble">Say hello to the Gapminder tibble</a></li>
</ul></li>
<li><a href="#think-before-you-create-excerpts-of-your-data">Think before you create excerpts of your data …</a></li>
<li><a href="#use-filter-to-subset-data-row-wise.">Use <code>filter()</code> to subset data row-wise.</a></li>
<li><a href="#meet-the-new-pipe-operator">Meet the new pipe operator</a></li>
<li><a href="#use-select-to-subset-the-data-on-variables-or-columns.">Use <code>select()</code> to subset the data on variables or columns.</a></li>
<li><a href="#revel-in-the-convenience">Revel in the convenience</a></li>
<li><a href="#pure-predictable-pipeable">Pure, predictable, pipeable</a></li>
<li><a href="#resources">Resources</a></li>
</ul>
</div>
<div id="intro" class="section level3">
<h3>Intro</h3>
<p><code>dplyr</code> is a package for data manipulation, developed by Hadley Wickham and Romain Francois. It is built to be fast, highly expressive, and open-minded about how your data is stored. It is installed as part of the <a href="https://github.com/hadley/tidyverse"><code>tidyverse</code></a> meta-package and, as a core package, it is among those loaded via <code>library(tidyverse)</code>.</p>
<p><code>dplyr</code>’s roots are in an earlier package called <a href="http://plyr.had.co.nz"><code>plyr</code></a>, which implements the <a href="https://www.jstatsoft.org/article/view/v040i01">“split-apply-combine” strategy for data analysis</a> (PDF). Where <code>plyr</code> covers a diverse set of inputs and outputs (e.g., arrays, data frames, lists), <code>dplyr</code> has a laser-like focus on data frames or, in the <code>tidyverse</code>, “tibbles”. <code>dplyr</code> is a package-level treatment of the <code>ddply()</code> function from <code>plyr</code>, because “data frame in, data frame out” proved to be so incredibly important.</p>
<p>Have no idea what I’m talking about? Not sure if you care? If you use these base R functions: <code>subset()</code>, <code>apply()</code>, <code>[sl]apply()</code>, <code>tapply()</code>, <code>aggregate()</code>, <code>split()</code>, <code>do.call()</code>, <code>with()</code>, <code>within()</code>, then you should keep reading. Also, if you use <code>for()</code> loops a lot, you might enjoy learning other ways to iterate over rows or groups of rows or variables in a data frame.</p>
<div id="load-dplyr-and-gapminder" class="section level4">
<h4>Load <code>dplyr</code> and <code>gapminder</code></h4>
<p>I choose to load the <code>tidyverse</code>, which will load <code>dplyr</code>, among other packages we use incidentally below. Also load <code>gapminder</code>.</p>
<pre class="r"><code>library(gapminder)
library(tidyverse)</code></pre>
<pre><code>## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr</code></pre>
<pre><code>## Conflicts with tidy packages ----------------------------------------------</code></pre>
<pre><code>## filter(): dplyr, stats
## lag(): dplyr, stats</code></pre>
</div>
<div id="say-hello-to-the-gapminder-tibble" class="section level4">
<h4>Say hello to the Gapminder tibble</h4>
<p>The <code>gapminder</code> data frame is a special kind of data frame: a tibble.</p>
<pre class="r"><code>gapminder</code></pre>
<pre><code>## # A tibble: 1,704 × 6
## country continent year lifeExp pop gdpPercap
## <fctr> <fctr> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.801 8425333 779.4453
## 2 Afghanistan Asia 1957 30.332 9240934 820.8530
## 3 Afghanistan Asia 1962 31.997 10267083 853.1007
## 4 Afghanistan Asia 1967 34.020 11537966 836.1971
## 5 Afghanistan Asia 1972 36.088 13079460 739.9811
## 6 Afghanistan Asia 1977 38.438 14880372 786.1134
## 7 Afghanistan Asia 1982 39.854 12881816 978.0114
## 8 Afghanistan Asia 1987 40.822 13867957 852.3959
## 9 Afghanistan Asia 1992 41.674 16317921 649.3414
## 10 Afghanistan Asia 1997 41.763 22227415 635.3414
## # ... with 1,694 more rows</code></pre>
<p>Its tibble-ness is why we get nice compact printing. For a reminder of the problems with base data frame printing, go type <code>iris</code> in the R Console or, better yet, print a data frame to screen that has lots of columns.</p>
<p>Note how gapminder’s <code>class()</code> includes <code>tbl_df</code>; the “tibble” terminology is a nod to this.</p>
<pre class="r"><code>class(gapminder)</code></pre>
<pre><code>## [1] "tbl_df" "tbl" "data.frame"</code></pre>
<p>Some functions, like <code>print()</code>, know about tibbles and do something special. Others, like <code>summary()</code>, do not; in that case the regular data frame treatment happens, because every tibble is also a regular data frame.</p>
<p>To turn any data frame into a tibble use <code>as_tibble()</code>:</p>
<pre class="r"><code>as_tibble(iris)</code></pre>
<pre><code>## # A tibble: 150 × 5
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## <dbl> <dbl> <dbl> <dbl> <fctr>
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 7 4.6 3.4 1.4 0.3 setosa
## 8 5.0 3.4 1.5 0.2 setosa
## 9 4.4 2.9 1.4 0.2 setosa
## 10 4.9 3.1 1.5 0.1 setosa
## # ... with 140 more rows</code></pre>
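<p>Here is a minimal sketch of the round trip between the two flavors, assuming only the <code>tibble</code> package (loaded with the tidyverse) and the built-in <code>iris</code> data. The object names <code>iris_tbl</code> and <code>iris_df</code> are mine, chosen purely for illustration.</p>
<pre class="r"><code>library(tibble)

iris_tbl <- as_tibble(iris)         ## data frame in, tibble out
class(iris_tbl)                     ## includes "tbl_df" and "data.frame"
inherits(iris_tbl, "data.frame")    ## a tibble is still a data frame

iris_df <- as.data.frame(iris_tbl)  ## and back to a plain data frame
class(iris_df)

## functions that don't know about tibbles just treat one as a data frame
summary(iris_tbl$Sepal.Length)</code></pre>
<p>Converting back with <code>as.data.frame()</code> is rarely necessary, but it is handy when an older function misbehaves on a tibble.</p>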
</div>
</div>
<div id="think-before-you-create-excerpts-of-your-data" class="section level3">
<h3>Think before you create excerpts of your data …</h3>
<p>If you feel the urge to store a little snippet of your data:</p>
<pre class="r"><code>(canada <- gapminder[241:252, ])</code></pre>
<pre><code>## # A tibble: 12 × 6
## country continent year lifeExp pop gdpPercap
## <fctr> <fctr> <int> <dbl> <int> <dbl>
## 1 Canada Americas 1952 68.750 14785584 11367.16
## 2 Canada Americas 1957 69.960 17010154 12489.95
## 3 Canada Americas 1962 71.300 18985849 13462.49
## 4 Canada Americas 1967 72.130 20819767 16076.59
## 5 Canada Americas 1972 72.880 22284500 18970.57
## 6 Canada Americas 1977 74.210 23796400 22090.88
## 7 Canada Americas 1982 75.760 25201900 22898.79
## 8 Canada Americas 1987 76.860 26549700 26626.52
## 9 Canada Americas 1992 77.950 28523502 26342.88
## 10 Canada Americas 1997 78.610 30305843 28954.93
## 11 Canada Americas 2002 79.770 31902268 33328.97
## 12 Canada Americas 2007 80.653 33390141 36319.24</code></pre>
<p>Stop and ask yourself …</p>
<blockquote>
<p>Do I want to create mini datasets for each level of some factor (or unique combination of several factors) … in order to compute or graph something?</p>
</blockquote>
<p>If YES, <strong>use proper data aggregation techniques</strong> or faceting in <code>ggplot2</code> – <strong>don’t subset the data</strong>. Or, more realistically, only subset the data as a temporary measure while you develop your elegant code for computing on or visualizing these data subsets.</p>
<p>If NO, then maybe you really do need to store a copy of a subset of the data. But seriously consider whether you can achieve your goals by simply using the <code>subset =</code> argument of, e.g., the <code>lm()</code> function, to limit computation to your excerpt of choice. Lots of functions offer a <code>subset =</code> argument!</p>
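<p>As a sketch of the <code>subset =</code> approach, here is a simple linear model fit to the Canadian rows only, without ever storing a Canada-only copy of the data. The model formula and the name <code>canada_fit</code> are my own choices for illustration, not something prescribed by the data.</p>
<pre class="r"><code>library(gapminder)

## fit life expectancy against year, using only the rows for Canada
canada_fit <- lm(lifeExp ~ year, data = gapminder, subset = country == "Canada")

nobs(canada_fit)   ## number of rows actually used in the fit
coef(canada_fit)   ## intercept and slope for Canada</code></pre>
<p>The only new object in the workspace is the fitted model; <code>gapminder</code> itself is untouched and no excerpt is left lying around.</p>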
<p>Copies and excerpts of your data clutter your workspace, invite mistakes, and sow general confusion. Avoid whenever possible.</p>
<p>Reality can also lie somewhere in between. You will find the workflows presented below can help you accomplish your goals with minimal creation of temporary, intermediate objects.</p>
</div>
<div id="use-filter-to-subset-data-row-wise." class="section level3">
<h3>Use <code>filter()</code> to subset data row-wise.</h3>
<p><code>filter()</code> takes logical expressions and returns the rows for which all are <code>TRUE</code>.</p>
<pre class="r"><code>filter(gapminder, lifeExp < 29)</code></pre>
<pre><code>## # A tibble: 2 × 6
## country continent year lifeExp pop gdpPercap
## <fctr> <fctr> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.801 8425333 779.4453
## 2 Rwanda Africa 1992 23.599 7290203 737.0686</code></pre>
<pre class="r"><code>filter(gapminder, country == "Rwanda", year > 1979)</code></pre>
<pre><code>## # A tibble: 6 × 6
## country continent year lifeExp pop gdpPercap
## <fctr> <fctr> <int> <dbl> <int> <dbl>
## 1 Rwanda Africa 1982 46.218 5507565 881.5706
## 2 Rwanda Africa 1987 44.020 6349365 847.9912
## 3 Rwanda Africa 1992 23.599 7290203 737.0686
## 4 Rwanda Africa 1997 36.087 7212583 589.9445
## 5 Rwanda Africa 2002 43.413 7852401 785.6538
## 6 Rwanda Africa 2007 46.242 8860588 863.0885</code></pre>
<pre class="r"><code>filter(gapminder, country %in% c("Rwanda", "Afghanistan"))</code></pre>
<pre><code>## # A tibble: 24 × 6
## country continent year lifeExp pop gdpPercap
## <fctr> <fctr> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.801 8425333 779.4453
## 2 Afghanistan Asia 1957 30.332 9240934 820.8530
## 3 Afghanistan Asia 1962 31.997 10267083 853.1007
## 4 Afghanistan Asia 1967 34.020 11537966 836.1971
## 5 Afghanistan Asia 1972 36.088 13079460 739.9811
## 6 Afghanistan Asia 1977 38.438 14880372 786.1134
## 7 Afghanistan Asia 1982 39.854 12881816 978.0114
## 8 Afghanistan Asia 1987 40.822 13867957 852.3959
## 9 Afghanistan Asia 1992 41.674 16317921 649.3414
## 10 Afghanistan Asia 1997 41.763 22227415 635.3414
## # ... with 14 more rows</code></pre>
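<p>Since <code>filter()</code> keeps the rows where all of its conditions hold, separating conditions with a comma behaves like joining them with <code>&</code>. If you want rows that satisfy either condition, use <code>|</code> inside a single expression. A small sketch, with object names of my own invention:</p>
<pre class="r"><code>library(gapminder)
library(dplyr)

## comma-separated conditions are combined with "and"
with_comma <- filter(gapminder, country == "Rwanda", year > 1979)
with_and   <- filter(gapminder, country == "Rwanda" & year > 1979)
nrow(with_comma) == nrow(with_and)

## use | when either condition is enough
either <- filter(gapminder, country == "Rwanda" | lifeExp < 29)
nrow(either)</code></pre>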
<p>Compare with some base R code to accomplish the same things:</p>
<pre class="r"><code>gapminder[gapminder$lifeExp < 29, ] ## repeat `gapminder`, [i, j] indexing is distracting
subset(gapminder, country == "Rwanda") ## almost same as filter; quite nice actually</code></pre>
<p>Under no circumstances should you subset your data the way I did at first:</p>
<pre class="r"><code>excerpt <- gapminder[241:252, ]</code></pre>
<p>Why is this a terrible idea?</p>
<ul>
<li>It is not self-documenting. What is so special about rows 241 through 252?</li>
<li>It is fragile. This line of code will produce different results if someone changes the row order of <code>gapminder</code>, e.g. sorts the data earlier in the script.</li>
</ul>
<pre class="r"><code>filter(gapminder, country == "Canada")</code></pre>
<p>This call explains itself and is fairly robust.</p>
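<p>To see the fragility for yourself, here is a sketch that re-sorts the data by year and then tries both approaches. The names <code>resorted</code>, <code>by_position</code>, and <code>by_name</code> are hypothetical, made up for this demonstration.</p>
<pre class="r"><code>library(gapminder)
library(dplyr)

## simulate someone sorting the data earlier in the script
resorted <- gapminder[order(gapminder$year), ]

by_position <- resorted[241:252, ]                  ## same row numbers as before
by_name     <- filter(resorted, country == "Canada")

unique(as.character(by_position$country))  ## no longer just Canada
unique(as.character(by_name$country))      ## still just Canada</code></pre>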
</div>
<div id="meet-the-new-pipe-operator" class="section level3">
<h3>Meet the new pipe operator</h3>
<p>Before we go any further, we should exploit the new pipe operator that the tidyverse imports from the <a href="https://github.com/smbache/magrittr"><code>magrittr</code></a> package by Stefan Bache. This is going to change your data analytical life. You no longer need to enact multi-operation commands by nesting them inside each other, like so many <a href="http://blogue.us/wp-content/uploads/2009/07/Unknown-21.jpeg">Russian nesting dolls</a>. This new syntax leads to code that is much easier to write and to read.</p>
<p>Here’s what it looks like: <code>%>%</code>. The RStudio keyboard shortcut: Ctrl + Shift + M (Windows), Cmd + Shift + M (Mac).</p>
<p>Let’s demo, then I’ll explain:</p>
<pre class="r"><code>gapminder %>% head()</code></pre>
<pre><code>## # A tibble: 6 × 6
## country continent year lifeExp pop gdpPercap
## <fctr> <fctr> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.801 8425333 779.4453
## 2 Afghanistan Asia 1957 30.332 9240934 820.8530
## 3 Afghanistan Asia 1962 31.997 10267083 853.1007
## 4 Afghanistan Asia 1967 34.020 11537966 836.1971
## 5 Afghanistan Asia 1972 36.088 13079460 739.9811
## 6 Afghanistan Asia 1977 38.438 14880372 786.1134</code></pre>
<p>This is equivalent to <code>head(gapminder)</code>. The pipe operator takes the thing on the left-hand-side and <strong>pipes</strong> it into the function call on the right-hand-side – literally, drops it in as the first argument.</p>
<p>Never fear, you can still specify other arguments to this function! To see the first 3 rows of Gapminder, we could say <code>head(gapminder, 3)</code> or this:</p>
<pre class="r"><code>gapminder %>% head(3)</code></pre>
<pre><code>## # A tibble: 3 × 6
## country continent year lifeExp pop gdpPercap
## <fctr> <fctr> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.801 8425333 779.4453
## 2 Afghanistan Asia 1957 30.332 9240934 820.8530
## 3 Afghanistan Asia 1962 31.997 10267083 853.1007</code></pre>
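<p>The pipe is not specific to data frames. Here is a tiny sketch on a plain numeric vector, assuming only the <code>magrittr</code> package, that shows the nested and piped versions side by side:</p>
<pre class="r"><code>library(magrittr)

## nested call: read from the inside out
sum(sqrt(c(4, 9, 16)))

## piped call: read left to right
c(4, 9, 16) %>% sqrt() %>% sum()

## extra arguments go after the piped-in first argument
3.14159 %>% round(2)   ## same as round(3.14159, 2)</code></pre>
<p>Both versions of the square-root example compute the same number; only the reading order changes.</p>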
<p>I’ve advised you to think “gets” whenever you see the assignment operator, <code><-</code>. Similarly, you should think “then” whenever you see the pipe operator, <code>%>%</code>.</p>
<p>You are probably not impressed yet, but the magic will soon happen.</p>
</div>
<div id="use-select-to-subset-the-data-on-variables-or-columns." class="section level3">
<h3>Use <code>select()</code> to subset the data on variables or columns.</h3>
<p>Back to <code>dplyr</code> …</p>
<p>Use <code>select()</code> to subset the data on variables or columns. Here’s a conventional call:</p>
<pre class="r"><code>select(gapminder, year, lifeExp)</code></pre>
<pre><code>## # A tibble: 1,704 × 2
## year lifeExp
## <int> <dbl>
## 1 1952 28.801
## 2 1957 30.332
## 3 1962 31.997
## 4 1967 34.020
## 5 1972 36.088
## 6 1977 38.438
## 7 1982 39.854
## 8 1987 40.822
## 9 1992 41.674
## 10 1997 41.763
## # ... with 1,694 more rows</code></pre>
<p>And here’s the same operation, but written with the pipe operator and piped through <code>head()</code>:</p>
<pre class="r"><code>gapminder %>%
select(year, lifeExp) %>%
head(4)</code></pre>
<pre><code>## # A tibble: 4 × 2
## year lifeExp
## <int> <dbl>
## 1 1952 28.801
## 2 1957 30.332
## 3 1962 31.997
## 4 1967 34.020</code></pre>
<p>Think: “Take <code>gapminder</code>, then select the variables year and lifeExp, then show the first 4 rows.”</p>
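<p><code>select()</code> can also drop variables or pick them by name pattern. The sketch below uses negative selection and the <code>starts_with()</code> helper, both of which <code>dplyr</code> provides; piping into <code>names()</code> is just my way of keeping the printed output short.</p>
<pre class="r"><code>library(gapminder)
library(dplyr)

## drop variables by prefixing them with a minus sign
gapminder %>%
  select(-pop, -continent) %>%
  names()

## pick variables by name pattern
gapminder %>%
  select(country, year, starts_with("gdp")) %>%
  names()</code></pre>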
</div>
<div id="revel-in-the-convenience" class="section level3">
<h3>Revel in the convenience</h3>
<p>Here’s the data for Cambodia, but only certain variables:</p>
<pre class="r"><code>gapminder %>%
filter(country == "Cambodia") %>%
select(year, lifeExp)</code></pre>
<pre><code>## # A tibble: 12 × 2
## year lifeExp
## <int> <dbl>
## 1 1952 39.417
## 2 1957 41.366
## 3 1962 43.415
## 4 1967 45.415
## 5 1972 40.317
## 6 1977 31.220
## 7 1982 50.957
## 8 1987 53.914
## 9 1992 55.803
## 10 1997 56.534
## 11 2002 56.752
## 12 2007 59.723</code></pre>
<p>and what a typical base R call would look like:</p>
<pre class="r"><code>gapminder[gapminder$country == "Cambodia", c("year", "lifeExp")]</code></pre>
<pre><code>## # A tibble: 12 × 2
## year lifeExp
## <int> <dbl>
## 1 1952 39.417
## 2 1957 41.366
## 3 1962 43.415
## 4 1967 45.415
## 5 1972 40.317
## 6 1977 31.220
## 7 1982 50.957
## 8 1987 53.914
## 9 1992 55.803
## 10 1997 56.534
## 11 2002 56.752
## 12 2007 59.723</code></pre>
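<p>If you would rather check than eyeball that the two approaches agree, here is one way to compare them. The names <code>piped</code> and <code>indexed</code> are mine, and converting to plain data frames first is a cautious assumption on my part, meant to sidestep any class-related differences between the two results.</p>
<pre class="r"><code>library(gapminder)
library(dplyr)

piped <- gapminder %>%
  filter(country == "Cambodia") %>%
  select(year, lifeExp)

indexed <- gapminder[gapminder$country == "Cambodia", c("year", "lifeExp")]

## compare the contents as plain data frames
all.equal(as.data.frame(piped), as.data.frame(indexed))</code></pre>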
</div>
<div id="pure-predictable-pipeable" class="section level3">
<h3>Pure, predictable, pipeable</h3>
<p>We’ve barely scratched the surface of <code>dplyr</code> but I want to point out key principles you may start to appreciate. If you’re new to R or “programming with data”, feel free to skip this section and <a href="block010_dplyr-end-single-table.html">move on</a>.</p>
<p><code>dplyr</code>’s verbs, such as <code>filter()</code> and <code>select()</code>, are what are called <a href="http://en.wikipedia.org/wiki/Pure_function">pure functions</a>. To quote from Wickham’s <a href="http://adv-r.had.co.nz/Functions.html">Advanced R Programming book</a>:</p>
<blockquote>
<p>The functions that are the easiest to understand and reason about are pure functions: functions that always map the same input to the same output and have no other impact on the workspace. In other words, pure functions have no side effects: they don’t affect the state of the world in any way apart from the value they return.</p>
</blockquote>
<p>In fact, these verbs are a special case of pure functions: they take the same flavor of object as input and output. Namely, a data frame or one of the other data receptacles <code>dplyr</code> supports.</p>
<p>And finally, the data is <strong>always</strong> the very first argument of the verb functions.</p>
<p>This set of deliberate design choices, together with the new pipe operator, produces a highly effective, low friction <a href="http://adv-r.had.co.nz/dsl.html">domain-specific language</a> for data analysis.</p>
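<p>A short sketch of these principles in action: the verb leaves its input alone, and a function you write yourself becomes pipeable simply by taking the data as its first argument. The helper <code>keep_country()</code> and its argument names are hypothetical, invented for this illustration; they are not part of <code>dplyr</code>.</p>
<pre class="r"><code>library(gapminder)
library(dplyr)

## no side effects: filter() returns a new object and leaves the input alone
n_before <- nrow(gapminder)
rwanda <- filter(gapminder, country == "Rwanda")
nrow(gapminder) == n_before

## hypothetical helper: data frame in (as first argument), data frame out
keep_country <- function(data, which_country) {
  filter(data, country == which_country)
}

## because the data comes first, the helper slots straight into a pipeline
gapminder %>%
  keep_country("Rwanda") %>%
  select(year, lifeExp)</code></pre>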
<p>Go to the next block, <a href="block010_dplyr-end-single-table.html"><code>dplyr</code> functions for a single dataset</a>, for more <code>dplyr</code>!</p>
</div>
<div id="resources" class="section level3">
<h3>Resources</h3>
<p><code>dplyr</code> official stuff</p>
<ul>
<li>package home <a href="http://cran.r-project.org/web/packages/dplyr/index.html">on CRAN</a>
<ul>
<li>note there are several vignettes, with the <a href="http://cran.r-project.org/web/packages/dplyr/vignettes/introduction.html">introduction</a> being the most relevant right now</li>
<li>the <a href="http://cran.rstudio.com/web/packages/dplyr/vignettes/window-functions.html">one on window functions</a> will also be interesting to you now</li>
</ul></li>
<li>development home <a href="https://github.com/hadley/dplyr">on GitHub</a></li>
<li><a href="https://www.dropbox.com/sh/i8qnluwmuieicxc/AAAgt9tIKoIm7WZKIyK25lh6a">tutorial HW delivered</a> (note this links to a Dropbox folder) at the useR! 2014 conference</li>
</ul>
<p><a href="https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf">RStudio Data Wrangling cheatsheet</a>, covering <code>dplyr</code> and <code>tidyr</code>. Remember you can get to these via <em>Help > Cheatsheets.</em></p>
<p><a href="https://github.com/tjmahr/MadR_Pipelines">Excellent slides</a> on pipelines and <code>dplyr</code> by TJ Mahr, talk given to the Madison R Users Group.</p>
<p>Blog post <a href="http://www.dataschool.io/dplyr-tutorial-for-faster-data-manipulation-in-r/">Hands-on dplyr tutorial for faster data manipulation in R</a> by Data School, which includes a link to an R Markdown document and links to videos.</p>
<p><a href="bit001_dplyr-cheatsheet.html">Cheatsheet</a> I made for <code>dplyr</code> join functions (not relevant yet but soon)</p>
</div>
<div class="footer">
This work is licensed under the <a href="http://creativecommons.org/licenses/by-nc/3.0/">CC BY-NC 3.0 Creative Commons License</a>.
</div>
</div>
<script>
// add bootstrap table styles to pandoc tables
$(document).ready(function () {
$('tr.header').parent('thead').parent('table').addClass('table table-condensed');
});
</script>
<!-- dynamically load mathjax for compatibility with self-contained -->
<script>
(function () {
var script = document.createElement("script");
script.type = "text/javascript";
script.src = "https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
document.getElementsByTagName("head")[0].appendChild(script);
})();
</script>
</body>
</html>