forked from hadley/r-pkgs
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathcode.Rmd
577 lines (439 loc) · 32.1 KB
/
code.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
# R code {#sec-r}
```{r, echo = FALSE}
source("common.R")
```
The first principle of making a package is that all R code goes in the `R/` directory.
In this chapter, you'll learn about organising your functions into files, maintaining a consistent style, and recognizing the stricter requirements for functions in a package (versus in a script).
We'll also remind you of the fundamental workflows for test-driving and formally checking an in-development package: `load_all()`, `test()`, and `check()`.
## Organise functions into files {#sec-code-organising}
The only hard rule is that your package must store its function definitions in R scripts, i.e. files with extension `.R`, that live in the `R/` directory[^code-1].
However, a few more conventions can make the source code of your package easier to navigate and relieve you of re-answering "How should I name this?" each time you create a new file.
The Tidyverse Style Guide offers some [general advice about file names](https://style.tidyverse.org/files.html) and also [advice that specifically applies to files in a package](https://style.tidyverse.org/package-files.html).
We expand on this here.
[^code-1]: Unfortunately you can't use subdirectories inside `R/`.
The next best thing is to use a common prefix, e.g., `abc-*.R`, to signal that a group of files are related.
The file name should be meaningful and convey which functions are defined within.
While you're free to arrange functions into files as you wish, the two extremes are bad: don't put all functions into one file and don't put each function into its own separate file.
This advice should inform your general policy, but there are exceptions to every rule.
If a specific function is very large or has lots of documentation, it can make sense to give it its own file, named after the function.
More often, a single `.R` file will contain multiple function definitions: such as a main function and its supporting helpers, a family of related functions, or some combination of the two.
@tbl-putting-functions-in-files presents some examples from the actual source of the [tidyr package](http://tidyr.tidyverse.org/) at version 1.1.2.
There are some departures from the hard-and-fast rules given above, which illustrates that there's a lot of room for judgment here.
```{r include = FALSE}
library(tidyverse)
tidyr_separate <- tibble(
`Organising principle` = "Main function plus helpers",
`Source file` = "[tidyr/R/separate.R](https://github.com/tidyverse/tidyr/blob/v1.1.2/R/separate.R)",
Comments = "Defines the user-facing `separate()` (an S3 generic), a `data.frame` method, and private helpers"
)
tidyr_rectangle <- tibble(
`Organising principle` = "Family of functions",
`Source file` = "[tidyr/R/rectangle.R](https://github.com/tidyverse/tidyr/blob/v1.1.2/R/rectangle.R)",
Comments = "Defines a family of functions for \"rectangling\" nested lists (`hoist()` and the `unnest()` functions), all documented together in a big help topic, plus private helpers"
)
tidyr_uncount <- tibble(
`Organising principle` = "One function",
`Source file` = "[tidyr/R/uncount.R](https://github.com/tidyverse/tidyr/blob/v1.1.2/R/uncount.R)",
Comments = "Defines exactly one function, `uncount()`, that's not particulary large, but doesn't fit naturally into any other `.R` file"
)
dat <- bind_rows(tidyr_uncount, tidyr_separate, tidyr_rectangle)
```
```{r}
#| echo: false
#| label: tbl-putting-functions-in-files
#| tbl-cap: Different ways to organize functions in files.
knitr::kable(dat)
```
::: callout-tip
Another file you often see in the wild is `R/utils.R`.
This is a common place to define small utilities that are used inside multiple package functions.
Since they serve as helpers to multiple functions, placing them in `R/utils.R` makes them easier to re-discover when you return to your package after a long break.
Bob Rudis assembled a collection of such files and did some analysis in the post [Dissecting R Package "Utility Belts"](https://rud.is/b/2018/04/08/dissecting-r-package-utility-belts/).
:::
If it's very hard to predict which file a function lives in, that suggests it's time to separate your functions into more files or reconsider how you are naming your functions and/or files.
::: callout-tip
## RStudio
The organisation of functions within files is less important in RStudio, which offers two ways to jump to the definition of a function:
- Press Ctrl + `.` (the period) to bring up the *Go to File/Function* tool, as shown in @fig-go-to-file-function, then start typing the name.
Keep typing to narrow the list and eventually pick a function (or file) to visit.
This works for both functions and files in your project.
```{r}
#| label: fig-go-to-file-function
#| out-width: ~
#| echo: false
#| fig-cap: >
#| Go to File/Function in RStudio.
#| fig-alt: >
#| Screenshot of the "Go to File/Function" tool in the
#| RStudio IDE. It is a text box with the cursor in it.
knitr::include_graphics("images/file-finder.png", dpi = 220)
```
- With your cursor in a function name or with a function name selected, press F2.
This works for functions defined in your package or in another package.
After navigating to a function with one of these methods, return to where you started by clicking the back arrow at the top-left of the editor (![](images/arrows.png){width="33" height="16"}) or by pressing Ctrl + F9 (Windows & Linux) or Cmd + F9 (macOS).
:::
## Fast feedback via `load_all()` {#sec-code-load-all}
As you add or modify functions defined in files below `R/`, you will naturally want to try them out.
We want to reiterate our strong recommendation to use `devtools::load_all()` to make them available for interactive exploration instead of, for example, `source()`ing files below `R/`.
The main coverage of `load_all()` is in @sec-workflow101-load-all and `load_all()` also shows up as one of the natural development tasks in @sec-whole-game-load-all.
The importance of `load_all()` in the testthat workflow is explained in @sec-testing-design-tension.
Compared to the alternatives, `load_all()` helps you to iterate more quickly and provides an excellent approximation to the namespace regime of an installed package.
## Code style
We recommend following the tidyverse style guide (<https://style.tidyverse.org>), which goes into much more detail than we can here.
Its format also allows it to be a more dynamic document than this book.
Although the style guide explains the "what" and the "why", another important decision is *how* to enforce a specific code style.
For this we recommend the styler package (<https://styler.r-lib.org>); its default behaviour enforces the tidyverse style guide.
There are many ways to apply styler to your code, depending on the context:
- `styler::style_pkg()` restyles an entire R package.
- `styler::style_dir()` restyles all files in a directory.
- `usethis::use_tidy_style()` is wrapper that applies one of the above functions depending on whether the current project is an R package or not.
- `styler::style_file()` restyles a single file.
- `styler::style_text()` restyles a character vector.
::: callout-tip
## RStudio
When styler is installed, the RStudio Addins menu will offer several additional ways to style code:
- the active selection
- the active file
- the active package
:::
::: callout-warning
If you don't use Git or another version control system, applying a function like `styler::style_pkg()` is nerve-wracking and somewhat dangerous, because you lack a way to see exactly what changed and to accept/reject such changes in a granular way.
:::
The styler package can also be integrated with various platforms for hosting source code and doing continuous integration.
For example, the tidyverse packages use a GitHub Action that restyles a package when triggered by a special comment (`/style`) on a pull request.
This allows maintainers to focus on reviewing the substance of the pull request, without having to nitpick small issues of whitespace or indentation[^code-2][^code-3]
.
[^code-2]: See the [Commands workflow](https://github.com/r-lib/actions/tree/master/examples#commands-workflow) in the [GitHub Actions for the R language](https://github.com/r-lib/actions) repository.
[^code-3]: The [Robot Pedantry, Human Empathy](https://mikemcquaid.com/2018/06/05/robot-pedantry-human-empathy/) blog post by Mike McQuaid does an excellent job summarizing the benefit of automating tasks like code re-styling.
## Understand when code is executed {#sec-code-when-executed}
Up until now, you've probably been writing **scripts**, R code saved in a file that you execute interactively, perhaps using an IDE and/or `source()`, or noninteractively via `Rscript`.
There are two main differences between code in scripts and packages:
- In a script, code is run ... when you run it!
The awkwardness of this statement reflects that it's hard to even think about this issue with a script.
However, we must, in order to appreciate that the code in a package is run **when the package is built**.
This has big implications for how you write the code below `R/`: package code should only create objects, the vast majority of which will be functions.
- Functions in your package will be used in situations that you didn't imagine.
This means your functions need to be thoughtful in the way that they interact with the outside world.
We expand on the first point here and the second in the next section.
These topics are also illustrated concretely in @sec-package-within-build-time-run-time.
When you `source()` a script, every line of code is executed and the results are immediately made available.
Things are different with package code, because it is loaded in two steps.
When the binary package is built (often, by CRAN) all the code in `R/` is executed and the results are saved.
When you attach a package with `library()`, these cached results are re-loaded and certain objects (mostly functions) are made available for your use.
The full details on what it means for a package to be in binary form are given in @sec-structure-binary.
We refer to the creation of the binary package as (binary) "build time" and, specifically, we mean when `R CMD INSTALL --build` is run.
(You might think that this is what `R CMD build` does, but that actually makes a bundled package, a.k.a. a "source tarball".) For macOS and Windows users of CRAN packages, build time is whenever CRAN built the binary package for their OS.
For those who install packages from source, build time is essentially when they (built and) installed the package.
Consider the assignment `x <- Sys.time()`.
If you put this in a script, `x` tells you when the script was `source()`d.
But if you put that same code at the top-level in a package, `x` tells you when the package binary was *built*.
In @sec-package-within-build-time-run-time, we show a complete example of this in the context of forming timestamps inside a package.
The main takeaway is this:
> Any R code outside of a function is suspicious and should be carefully reviewed.
We explore a few real-world examples below that show how easy it is to get burned by this "build time vs. load time" issue.
Luckily, once you diagnose this problem, it is generally not difficult to fix.
### Example: A path returned by `system.file()`
The shinybootstrap2 package once had this code below `R/`:
```{r eval = FALSE}
dataTableDependency <- list(
htmlDependency(
"datatables", "1.10.2",
c(file = system.file("www/datatables", package = "shinybootstrap2")),
script = "js/jquery.dataTables.min.js"
),
htmlDependency(
"datatables-bootstrap", "1.10.2",
c(file = system.file("www/datatables", package = "shinybootstrap2")),
stylesheet = c("css/dataTables.bootstrap.css", "css/dataTables.extra.css"),
script = "js/dataTables.bootstrap.js"
)
)
```
So `dataTableDependency` was a list object defined in top-level package code and its value was constructed from paths obtained via `system.file()`.
As described in [a GitHub issue](https://github.com/rstudio/htmltools/issues/22),
> This works fine when the package is built and tested on the same machine.
> However, if the package is built on one machine and then used on another (as is the case with CRAN binary packages), then this will fail -- the dependency will point to the wrong directory on the host.
The heart of the solution is to make sure that `system.file()` is called from a function, at run time.
Indeed, this fix was made here (in commit [138db47](https://github.com/rstudio/shinybootstrap2/commit/138db47e6bef195f14f6a14f4289ca445e9b2efa#diff-fedcc5cc99f3d44a4caf06f8e6e0ae08)) and in a few other packages that had similar code and a related check was added in `htmlDependency()` itself.
This particular problem would now be caught by `R CMD check`, due to changes that came with [staged installation](https://developer.r-project.org/Blog/public/2019/02/14/staged-install/index.html) as of R 3.6.0.
### Example: Available colours
The crayon package has a function, `crayon::show_ansi_colors()`, that displays an ANSI colour table on your screen, basically to show what sort of styling is possible.
In an early version, the function looked something like this:
```{r eval = FALSE}
show_ansi_colors <- function(colors = num_colors()) {
if (colors < 8) {
cat("Colors are not supported")
} else if (colors < 256) {
cat(ansi_colors_8, sep = "")
invisible(ansi_colors_8)
} else {
cat(ansi_colors_256, sep = "")
invisible(ansi_colors_256)
}
}
ansi_colors_8 <- # code to generate a vector covering basic terminal colors
ansi_colors_256 <- # code to generate a vector covering 256 colors
```
where `ansi_colors_8` and `ansi_colors_256` were character vectors exploring a certain set of colours, presumably styled via ANSI escapes.
The problem was those objects were formed and cached when the binary package was built.
Since that often happens on a headless server, this likely happens under conditions where terminal colours might not be enabled or even available.
Users of the installed package could still call `show_ansi_colors()` and `num_colors()` would detect the number of colours supported by their system (256 on most modern computers).
But then an un-coloured object would print to screen (the original GitHub issue is [r-lib/crayon#37](https://github.com/r-lib/crayon/issues/37)).
The solution was to compute the display objects with a function at run time (in commit [e2b368a](https://github.com/r-lib/crayon/commit/e2b368ac27331d82154f85299f18efbc36227caa):
```{r}
show_ansi_colors <- function(colors = num_colors()) {
if (colors < 8) {
cat("Colors are not supported")
} else if (colors < 256) {
cat(ansi_colors_8(), sep = "")
invisible(ansi_colors_8())
} else {
cat(ansi_colors_256(), sep = "")
invisible(ansi_colors_256())
}
}
ansi_colors_8 <- function() {
# code to generate a vector covering basic terminal colors
}
ansi_colors_256 <- function() {
# code to generate a vector covering 256 colors
}
```
Literally, the same code is used, it is simply pushed down into the body of a function taking no arguments (similar to the shinybootstrap2 example).
Each reference to, e.g., the `ansi_colors_8` object is replaced by a call to the `ansi_colors_8()` function.
The main takeaway is that functions that assess or expose the capabilities of your package on a user's system must fully execute on your user's system.
It's fairly easy to accidentally rely on results that were cached at build time, quite possibly on a different machine.
### Example: Aliasing a function
One last example shows that, even if you are careful to only define functions below `R/`, there are still some subtleties to consider.
Imagine that you want the function `foo()` in your package to basically be an alias for the function `blah()` from some other package, e.g. pkgB.
You might be tempted to do this:
```{r eval = FALSE}
foo <- pkgB::blah
```
However, this will cause `foo()` in your package to reflect the definition of `pkgB::blah()` at the version present on the machine where the binary package is built (often CRAN), at that moment in time.
If a bug is discovered in `pkgB::blah()` and subsequently fixed, your package will still use the older, buggy version, until your package is rebuilt (often by CRAN) and your users upgrade, which is completely out of your control.
This alternative approach protects you from this:
```{r}
foo <- function(...) pkgB::blah(...)
```
Now, when your user calls `foo()`, they are effectively calling `pkgB::blah()`, at the version installed on *their* machine at that very moment.
A real example of this affected an older version of knitr, related to how the default "evaluate" hook was being set to `evaluate::evaluate()` (original issue is [yihui/knitr#1441](https://github.com/yihui/knitr/issues/1441), resolved in commit [d6b53e0](https://github.com/yihui/knitr/commit/d6b53e0f15a8afd1de4987a86931ba54f886278d)).
## Respect the R landscape {#sec-code-r-landscape}
Another big difference between a script and a package is that other people are going to use your package, and they're going to use it in situations that you never imagined.
This means you need to pay attention to the R landscape, which includes not just the available functions and objects, but all the global settings.
You have changed the R landscape if you've loaded a package with `library()`, or changed a global option with `options()`, or modified the working directory with `setwd()`.
If the behaviour of *other* functions differs before and after running your function, you've modified the landscape.
@sec-package-within-side-effects has a concrete example of this involving time zones and the locale-specific printing of datetimes.
Changing the landscape is bad because it makes code much harder to understand.
There are some functions that modify global settings that you should never use because there are better alternatives:
- **Don't use `library()` or `require()`**.
These modify the search path, affecting what functions are available from the global environment.
Instead, you should use the `DESCRIPTION` to specify your package's requirements, as described in @sec-description.
This also makes sure those packages are installed when your package is installed.
- **Never use `source()`** to load code from a file.
`source()` modifies the current environment, inserting the results of executing the code.
There is no reason to use `source()` inside your package, i.e. in a file below `R/`.
Sometimes people `source()` files below `R/` during package development, but as we've explained in @sec-workflow101-load-all and @sec-code-load-all, `load_all()` is a much better way to load your current code for exploration.
If you're using `source()` to create a dataset, it is better to use the methods in @sec-data for including data in a package.
Here is a non-exhaustive list of other functions that should be used with caution:
- `options()`
- `par()`
- `setwd()`
- `Sys.setenv()`
- `Sys.setlocale()`
- `set.seed()` (or anything that changes the state of the random number generator)
<!-- TODO: maybe tell one or more of the scary stories in <https://github.com/hadley/r-pkgs/issues/447> -->
If you must use them, make sure to clean up after yourself.
Below we show how to do this using functions from the withr package and in base R.
The flip side of this coin is that you should avoid relying on the user's landscape, which might be different to yours.
For example, functions that rely on sorting strings are dangerous, because sort order depends on the system locale.
Below we see that locales one might actually encounter in practice (C, English, French, etc.) differ in how they sort non-ASCII strings or uppercase versus lowercase letters.
```{r}
x <- c("bernard", "bérénice", "béatrice", "boris")
withr::with_locale(c(LC_COLLATE = "fr_FR"), sort(x))
withr::with_locale(c(LC_COLLATE = "C"), sort(x))
x <- c("a", "A", "B", "b", "A", "b")
withr::with_locale(c(LC_COLLATE = "en_CA"), sort(x))
withr::with_locale(c(LC_COLLATE = "C"), sort(x))
```
If you write your functions as if all users have the same system locale as you, your code might fail.
### Manage state with withr
If you need to modify the R landscape inside a function, then it is important to ensure your change is reversed *on exit* of that function.
This is exactly what `base::on.exit()` is designed to do.
You use `on.exit()` inside a function to register code to run later, that restores the landscape to its original state.
It is important to note that proper tools, such as `on.exit()`, work even if we exit the function abnormally, i.e. due to an error.
This is why it's worth using the official methods described here over any do-it-yourself solution.
We usually manage state using the [withr package](https://withr.r-lib.org), which provides a flexible, `on.exit()`-like toolkit (`on.exit()` itself is covered in the next section).
`withr::defer()` can be used as a drop-in replacement for `on.exit()`.
Why do we like withr so much?
First, it offers many pre-built convenience functions for state changes that come up often.
We also appreciate withr's default stack-like behaviour (LIFO = last in, first out), its usability in interactive sessions, and its `envir` argument (in more advanced usage).
The general pattern is to capture the original state, schedule its eventual restoration "on exit", then make the state change.
Some setters, such as `options()` or `par()`, return the old value when you provide a new value, leading to usage that looks like this:
```{r, eval = FALSE}
f <- function(x, y, z) {
... # width option "as found"
old <- options(width = 20) # width option is 20
defer(options(old)) # width option is 20
... # width option is 20
} # original width option restored
```
Certain state changes, such as modifying session options, come up so often that withr offers pre-made helpers.
@tbl-withr-greatest-hits shows a few of the state change helpers in withr that you are most likely to find useful:
| Do / undo this | withr functions |
|-----------------------------|-------------------------------------|
| Set an R option | `with_options()`, `local_options()` |
| Set an environment variable | `with_envvar()`, `local_envvar()` |
| Change working directory | `with_dir()`, `local_dir()` |
| Set a graphics parameter | `with_par()`, `local_par()` |
: Selected functions from withr. {#tbl-withr-greatest-hits}
You'll notice each helper comes in two forms that are useful in different situations:
- `with_*()` functions are best for executing small snippets of code with a temporarily modified state.
(These functions are inspired by how `base::with()` works.)
```{r eval = FALSE}
f <- function(x, sig_digits) {
# imagine lots of code here
withr::with_options(
list(digits = sig_digits),
print(x)
)
# ... and a lot more code here
}
```
- `local_*()` functions are best for modifying state "from now until the function exits".
```{r eval = FALSE}
g <- function(x, sig_digits) {
withr::local_options(list(digits = sig_digits))
print(x)
# imagine lots of code here
}
```
Developing code interactively with withr is pleasant, because deferred actions can be scheduled even on the global environment.
Those cleanup actions can then be executed with `withr::deferred_run()` or cleared without execution with `withr::deferred_clear()`.
Without this feature, it can be tricky to experiment with code that needs cleanup "on exit", because it behaves so differently when executed in the console versus at arm's length inside a function.
More in-depth coverage is given in the withr vignette [Changing and restoring state](https://withr.r-lib.org/articles/changing-and-restoring-state.html) and withr will also prove useful when we talk about testing in @sec-testing-basics.
### Restore state with `base::on.exit()`
Here is how the general "save, schedule restoration, change" pattern looks when using `base::on.exit()`.
```{r, eval = FALSE}
f <- function(x, y, z) {
...
old <- options(mfrow = c(2, 2), pty = "s")
on.exit(options(old), add = TRUE)
...
}
```
Other state changes aren't available with that sort of setter and you must implement it yourself.
```{r, eval = FALSE}
g <- function(a, b, c) {
...
scratch_file <- tempfile()
on.exit(unlink(scratch_file), add = TRUE)
file.create(scratch_file)
...
}
```
Note that we specify `on.exit(..., add = TRUE)`, because you almost always want this behaviour, i.e. to *add* to the list of deferred cleanup tasks rather than to *replace* them entirely.
This (and the default value of `after`) are related to our preference for `withr::defer()`, when we're willing to take a dependency on withr.
These issues are explored in a [withr vignette](https://withr.r-lib.org/articles/changing-and-restoring-state.html).
### Isolate side effects
Creating plots and printing output to the console are two other ways of affecting the global R environment.
Often you can't avoid these (because they're important!) but it's good practice to isolate them in functions that **only** produce output.
This also makes it easier for other people to repurpose your work for new uses.
For example, if you separate data preparation and plotting into two functions, others can use your data prep work (which is often the hardest part!) to create new visualisations.
### When you **do** need side-effects {#sec-code-onLoad-onAttach}
Occasionally, packages do need side-effects.
This is most common if your package talks to an external system --- you might need to do some initial setup when the package loads.
To do that, you can use two special functions: `.onLoad()` and `.onAttach()`.
These are called when the package is loaded and attached.
You'll learn about the distinction between the two in @sec-dependencies-attach-vs-load.
For now, you should always use `.onLoad()` unless explicitly directed otherwise.
Some common uses of `.onLoad()` and `.onAttach()` are:
- To set custom options for your package with `options()`.
To avoid conflicts with other packages, ensure that you prefix option names with the name of your package.
Also be careful not to override options that the user has already set.
Here's a (highly redacted) version of dplyr's `.onLoad()` function which sets an option that controls progress reporting:
```{r, eval = FALSE}
.onLoad <- function(libname, pkgname) {
op <- options()
op.dplyr <- list(
dplyr.show_progress = TRUE
)
toset <- !(names(op.dplyr) %in% names(op))
if (any(toset)) options(op.dplyr[toset])
invisible()
}
```
This allows functions in dplyr to use `getOption("dplyr.show_progress")` to determine whether to show progress bars, relying on the fact that a sensible default value has already been set.
<!-- TODO: I don't think the option setting example is so great anymore. I lean towards replacement. Other possible examples of `.onLoad()` usage: Register S3 methods. Detect capabilities of the system or of other packages. -->
- To display an informative message when the package is attached.
This might make usage conditions clear or display package capabilities based on current system conditions.
Startup messages are one place where you should use `.onAttach()` instead of `.onLoad()`.
To display startup messages, always use `packageStartupMessage()`, and not `message()`.
(This allows `suppressPackageStartupMessages()` to selectively suppress package startup messages).
```{r, eval = FALSE}
.onAttach <- function(libname, pkgname) {
packageStartupMessage("Welcome to my package")
}
```
As you can see in the examples, `.onLoad()` and `.onAttach()` are called with two arguments: `libname` and `pkgname`.
They're rarely used (they're a holdover from the days when you needed to use `library.dynam()` to load compiled code).
They give the path where the package is installed (the "library"), and the name of the package.
If you use `.onLoad()`, consider using `.onUnload()` to clean up any side effects.
By convention, `.onLoad()` and friends are usually saved in a file called `R/zzz.R`.
(Note that `.First.lib()` and `.Last.lib()` are old versions of `.onLoad()` and `.onUnload()` and should no longer be used.)
One especially hairy thing to do in a function like `.onLoad()` or `.onAttach()` is to change the state of the random number generator.
Once upon a time, ggplot2 used `sample()` when deciding whether to show a startup message, but only in interactive sessions.
This, in turn, created a reproducibility puzzle for users who were using `set.seed()` for their own purposes, prior to attaching ggplot2 with `library(ggplot2)`, and running the code both interactively and noninteractively.
The chosen solution was to wrap the offending startup code inside `withr::with_preserve_seed()`, which leaves the user's random seed as it found it.
## Constant health checks
Here is a typical sequence of calls when using devtools for package development:
1. Edit one or more files below `R/`.
2. `document()` (if you've made any changes that impact help files or NAMESPACE)
3. `load_all()`
4. Run some examples interactively.
5. `test()` (or `test_active_file()`)
6. `check()`
An interesting question is how frequently and rapidly you move through this development cycle.
We often find ourselves running through the above sequence several times in an hour or in a day while adding or modifying a single function.
Those newer to package development might be most comfortable slinging R code and much less comfortable writing and compiling documentation, simulating package build & installation, testing, and running `R CMD check`.
And it is human nature to embrace the familiar and postpone the unfamiliar.
This often leads to a dysfunctional workflow where the full sequence above unfolds infrequently, maybe once per month or every couple of months, very slowly and often with great pain:
1. Edit one or more files below `R/`.
2. Build, install, and use the package. Iterate occasionally with previous step.
3. Write documentation (once the code is "done").
4. Write tests (once the code is "done").
5. Run `R CMD check` right before submitting to CRAN or releasing in some other way.
We've already talked about the value of fast feedback, in the context of `load_all()`.
But this also applies to running `document()`, `test()`, and `check()`.
There are defects you just can't detect from using `load_all()` and running a few interactive examples that are immediately revealed by more formal checks.
Finding and fixing 5 bugs, one at a time, right after you created each one is much easier than troubleshooting all 5 at once (possibly interacting with each other), weeks or months after you last touched the code.
::: callout-warning
## Submitting to CRAN
If you're planning on submitting your package to CRAN, you must use only ASCII characters in your `.R` files.
In practice, this means you are limited to the digits 0 to 9, lowercase letters 'a' to 'z', uppercase letters 'A' to 'Z', and common punctuation.
But sometimes you need to inline a small bit of character data that includes, e.g., a Greek letter (µ), an accented character (ü), or a symbol (30°).
You can use any Unicode character as long as you specify it in the special Unicode escape `"\u1234"` format.
The easiest way to find the correct code point is to use `stringi::stri_escape_unicode()`:
```{r}
x <- "This is a bullet •"
y <- "This is a bullet \u2022"
identical(x, y)
cat(stringi::stri_escape_unicode(x))
```
Sometimes you have the opposite problem.
You don't *intentionally* have any non-ASCII characters in your R code, but automated checks reveal that you do.
```
W checking R files for non-ASCII characters ...
Found the following file with non-ASCII characters:
foo.R
Portable packages must use only ASCII characters in their R code,
except perhaps in comments.
Use \uxxxx escapes for other characters.
```
The most common offenders are "curly" or "smart" single and double quotes that sneak in through copy/paste.
The functions `tools::showNonASCII()` and `tools::showNonASCIIfile(file)` help you find the offending file(s) and line(s).
```{r eval = FALSE}
tools::showNonASCIIfile("R/foo.R")
#> 666: #' If you<e2><80><99>ve copy/pasted quotes, watch out!
```
:::