# Dates and Times {#sec-dates-times}
Working with dates and times can be a little tricky, but the [lubridate](https://lubridate.tidyverse.org/) package is there to help. Its website has a helpful [cheatsheet](https://rawgit.com/rstudio/cheatsheets/main/lubridate.pdf), and you can view a tutorial by typing `vignette("lubridate")` in the console pane. The [Dates and Times](https://r4ds.had.co.nz/dates-and-times.html){target="_blank"} chapter in R for Data Science also gives a helpful overview.
This appendix is a quick intro to some of the most useful functions for making reproducible reports.
```{r setup, message=FALSE}
# packages needed for this appendix
library(tidyverse)
library(lubridate)
```
## Formats
While there is only one correct way to write a date (the ISO 8601 format of "YYYY-MM-DD"), dates can be found in many formats. When you are reading a data file, you might need to specify the date format so it can be read properly. Date format specification uses abbreviations to represent the different ways people can write the year, month, and day (as well as hours, minutes, and seconds). For example, the date `2023-01-03` is represented by the formatting string `"%Y-%m-%d"`. The fastest way to find the list of formatting abbreviations is to look in the help for the function `col_date()`.
```{r, filename = "Run in the console"}
?col_date
```
```{r}
# create a table with some different date formats
date_formats <- tibble(
  best = "2022-01-03",
  ok = "2022 January 3",
  bad = "January 3, 2022",
  terrible = "Mon is 3 22 1"
)
# save it as a CSV file
write_csv(date_formats, "data/date_formats.csv")
# read it in
df <- read_csv("data/date_formats.csv")
```
You can see that only the first column was read as a date, and the rest were read as characters. You can set the date format using the `col_types` argument and two helper functions, `cols()` and `col_date()`.
```{r}
ct <- cols(ok = col_date("%Y %B %d"),
           bad = col_date("%B %d, %Y"),
           # "Mon is 3 22 1": weekday name, day, two-digit year, month
           terrible = col_date("%a is %d %y %m"))

read_csv("data/date_formats.csv",
         col_types = ct)
```
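The same formatting abbreviations also work outside `read_csv()`. As a minimal sketch (the names `d` and `uk_style` are illustrative, not part of the data above), readr's `parse_date()` turns a single string into a date, and base R's `format()` turns a date back into a string:

```{r}
library(readr)

# parse a string using an explicit format specification
d <- parse_date("January 3, 2022", format = "%B %d, %Y")

# write the date back out in a different, numbers-only format
uk_style <- format(d, "%d/%m/%Y")
uk_style
```

A numbers-only output format is used here because month and weekday names produced by `format()` depend on your system locale.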
## Parsing
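The `ymd()` family of functions (`ymd()`, `mdy()`, `dmy()`, and so on) can deal with almost all date formats, regardless of the punctuation used in the format. Choose the function whose name matches the order of the year, month, and day in your text. All of the examples below produce a date in the standard format "2022-01-03".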
```{r ymd, results='hide'}
# year-month-day orders
ymd("22 Jan 3")
ymd("2022 January 3rd")
# month-day-year orders
mdy("January 3, 2022")
mdy("Jan/03/22")
# day-month-year orders
dmy("3JAN22")
dmy("3rd of January in the year 2022")
```
::: {.callout-note .try}
See if you can make a date format that one of the parsers *can't* handle.
:::
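When a parser cannot handle a value, it returns `NA` with a warning rather than stopping with an error. The sketch below uses made-up input strings to show one way of finding the values that failed; setting `quiet = TRUE` suppresses the warning.

```{r}
library(lubridate)

raw <- c("2022-01-03", "not a date")   # made-up example input
parsed <- ymd(raw, quiet = TRUE)       # values that fail to parse become NA

# which of the original strings could not be parsed?
raw[is.na(parsed)]
```

Checking for `NA` values after parsing is a cheap safeguard when you read dates that were typed by hand.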
There are similar functions for date/times, too.
```{r ymd-hms}
ymd_hms("2022 Jan 3, 6:05 and 20s pm")
mdy_h("January 3rd, 2022 at 6pm")
```
The date/time functions can also take a timezone argument. If you don't specify it, it defaults to "UTC".
```{r tz}
ymd_hm("2022-01-03 18:05", tz = "GMT")
```
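To move an existing date/time between timezones, lubridate provides `with_tz()` and `force_tz()`. Here is a short sketch of the difference, using `"America/New_York"` purely as an example timezone:

```{r}
library(lubridate)

t_utc <- ymd_hm("2022-01-03 18:05", tz = "UTC")

# same moment in time, shown on a New York clock
same_instant <- with_tz(t_utc, tzone = "America/New_York")

# same clock reading, but now interpreted as New York time
same_clock <- force_tz(t_utc, tzone = "America/New_York")

same_instant
same_clock
```

Use `with_tz()` when the stored moment is right and you only want to display it elsewhere. Use `force_tz()` when the data were recorded in local time but read in with the wrong timezone.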
## Get Parts
You frequently need to extract parts of a date/time for plotting. The following functions extract specific parts of a date or datetime object. This is a godsend for those of us who never have a clue what week of the year it is today.
```{r}
# get the date and time when this function is run
now <- now(tzone = "GMT")
# get separate parts
time_parts <- list(
  second = second(now),
  minute = minute(now),
  hour = hour(now),
  day = day(now), # day of the month (same as mday())
  wday = wday(now), # day of the week
  yday = yday(now), # day of the year
  week = week(now),
  isoweek = isoweek(now), # ISO 8601 week calendar (Monday start)
  epiweek = epiweek(now), # CDC epidemiological week (Sunday start)
  month = month(now),
  year = year(now),
  tz = tz(now)
)
str(time_parts)
```
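The three week functions follow different conventions, so they can disagree around the new year. The sketch below uses a fixed date (rather than `now()`) so that the result is reproducible:

```{r}
library(lubridate)

new_year <- ymd("2022-01-01")   # a Saturday

week(new_year)      # 1: seven-day blocks counted from January 1st
isoweek(new_year)   # 52: still in the last ISO week of 2021
isoyear(new_year)   # 2021: the year that ISO week belongs to
epiweek(new_year)   # 52: the epidemiological week began on Sunday, December 26th
```

If you group data by week across several years, pair `isoweek()` with `isoyear()` (or `epiweek()` with `epiyear()`) so that the first days of January are not assigned to the wrong year.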
The `month()` and `wday()` functions can return factor labels.
```{r}
jan1 <- ymd(20220101)
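wday(jan1, label = TRUE)                 # abbreviated label (the default)
wday(jan1, label = TRUE, abbr = FALSE)   # full day name
month(jan1, label = TRUE)                # abbreviated label (the default)
month(jan1, label = TRUE, abbr = FALSE)  # full month name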
```
::: {.callout-note .try}
What day of the week were you born?
```{r, webex.hide = TRUE}
birthdate <- ymd(19761118) # put your own birthdate here
wday(birthdate, label = TRUE)
```
:::
## Date Arithmetic
You can add and subtract dates. For example, you can get the date two weeks from today by adding `weeks(2)` to `today()`. You can probably guess how to add and subtract seconds, minutes, days, months, and years.
```{r}
today() + weeks(2)
```
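You can also subtract one date from another. The sketch below uses illustrative dates to show two common cases: the number of days between two dates, and the number of whole years in an interval (for example, someone's age on a given date).

```{r}
library(lubridate)

start <- ymd("2022-01-03")
end   <- ymd("2022-03-01")

# subtracting two dates gives a time difference
as.numeric(end - start, units = "days")   # 57

# whole years between two dates, using an interval
birthdate <- ymd("1976-11-18")
age <- interval(birthdate, start) %/% years(1)
age                                       # 45
```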
::: {.callout-note .try}
What day of the week will your 100th birthday be?
```{r, webex.hide = TRUE}
birthdate <- ymd(19761118) # put your own birthdate here
centennial <- birthdate + years(100)
wday(centennial, label = TRUE, abbr = FALSE)
```
:::
::: {.callout-warning}
What do you think will happen if you subtract one month from March 31st? You get NA, since February doesn't have a 31st day.
```{r}
ymd(20220331) - months(1)
```
Use the special date operators `%m+%` and `%m-%` to add and subtract months without risking an impossible date. If the resulting day doesn't exist, they roll back to the last day of that month.
```{r}
ymd(20220331) %m-% months(1)
```
:::
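These operators also work with a vector of periods, which is handy for building a series of month-end dates. A minimal sketch, starting from an example date at the end of January:

```{r}
library(lubridate)

jan31 <- ymd("2022-01-31")

# month ends for January to April; impossible dates roll back
month_ends <- jan31 %m+% months(0:3)
month_ends
```

The result steps through 2022-01-31, 2022-02-28, 2022-03-31, and 2022-04-30, whereas plain `+` would give `NA` for February and April.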
### First and last of month
For things like billing, you might need to find the first or last days of the current, previous, or next month. The `rollback()` and `rollforward()` functions are easier than trying to parse dates.
```{r}
d <- ymd("2022-01-24")
rollback(d) # last day of the previous month
rollforward(d) # last day of the current month
rollback(d, roll_to_first = TRUE) # first day of the current month
rollforward(d, roll_to_first = TRUE) # first day of the next month
```
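If you prefer, you can get the same dates with lubridate's `floor_date()` and `ceiling_date()`. This is just an alternative sketch, not a replacement for the functions above:

```{r}
library(lubridate)

d <- ymd("2022-01-24")

first_this <- floor_date(d, unit = "month")              # first day of the current month
first_next <- ceiling_date(d, unit = "month")            # first day of the next month
last_this  <- ceiling_date(d, unit = "month") - days(1)  # last day of the current month

c(first_this, last_this, first_next)
```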
### Rounding
You can round dates and times to the nearest unit. This can be useful when you have, for example, time measured to the nearest second, but want to group data by the nearest hour, rather than extract the hour component.
```{r}
ymd_hm("2022-01-24 10:25") %>% round_date(unit = "hour")
ymd_hm("2022-01-24 10:30") %>% round_date(unit = "hour")
ymd_hm("2022-01-24 10:35") %>% round_date(unit = "hour")
```
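For grouping, you will often want `floor_date()` rather than `round_date()`, so that every time within the same hour lands in the same group. The sketch below uses a small made-up table of event times to count events per hour:

```{r}
library(dplyr)
library(lubridate)

# made-up event times for illustration
events <- tibble(
  time = ymd_hms(c("2022-01-24 10:05:00",
                   "2022-01-24 10:55:00",
                   "2022-01-24 11:10:00"))
)

hourly <- events %>%
  mutate(hour_start = floor_date(time, unit = "hour")) %>%
  count(hour_start)

hourly
```

Unlike `hour()`, the rounded value keeps the date, so 10:00 on two different days stays in two different groups.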
## Internationalisation
You may need to work with dates from a different locale than your computer's defaults, such as dates written in French or Russian. Or your computer may have a non-English locale. Set the `locale` argument to the relevant language code.
```{r, error = TRUE}
ymd("2022 January 24", locale = "en_GB")
ymd("2022 Janvier 24", locale = "fr_FR")
wday("2022-01-03", label = TRUE, locale = "ru_RU")
```
Some of the locale functions only work on Unix-based machines, like Macs or machines running Linux.
```{r, error = TRUE}
# check your own locale; doesn't work for Windows
locale()
```
```{r, eval = FALSE}
# check which locales are available on your computer
# doesn't work for Windows
system("locale -a")
```
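Base R's `Sys.getlocale()` is another way to check the locale your R session uses for dates and times. It should work on Windows as well, although the locale names it reports differ between operating systems.

```{r}
# base R; reports the locale used for date and time formatting
time_locale <- Sys.getlocale("LC_TIME")
time_locale
```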
## Example
Let's work through some examples with downloaded tweets from the [class data](data/data.zip).
```{r, message=FALSE}
# read all metrics files in data/tweets/
tweets <- list.files(
  path = "data/tweets",
  pattern = "^tweet_activity_metrics",
  full.names = TRUE
) %>%
  map_df(read_csv) %>%
  select(!starts_with("promoted"))
```
The `time` column is already in date/time (POSIXct) format, but what if we wanted to plot tweets by hour for each day of the week?
```{r weekday-tweets, fig.width = 12/2, fig.height = 2.5/2}
tweets %>%
  mutate(weekday = wday(time, label = TRUE),
         hour = hour(time)) %>%
  ggplot(aes(x = hour, fill = weekday)) +
  geom_bar(size = 1, alpha = 0.5, show.legend = FALSE) +
  facet_grid(~weekday) +
  scale_fill_manual(values = rainbow(7)) +
  scale_x_continuous(breaks = seq(0, 24, 4))
```
A nice side-effect of using lubridate functions to get days of the week or months of the year is that the results are ordered factors, so they display in the correct order in a plot. Let's display the months in Greek (if that locale is available on your system).
```{r month-tweets, error = TRUE}
tweets %>%
  mutate(month = month(time, label = TRUE, abbr = FALSE, locale = "el_GR.UTF-8")) %>%
  ggplot(aes(x = month, fill = month)) +
  geom_bar(show.legend = FALSE) +
  scale_x_discrete(name = NULL, guide = guide_axis(n.dodge=2))
```
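You can check the ordering yourself without the tweet data. This small sketch uses three made-up dates, deliberately out of order, to show that the labels sort by day of the week rather than alphabetically:

```{r}
library(lubridate)

# made-up dates: a Saturday, a Monday, and a Wednesday
x <- ymd(c("2022-01-08", "2022-01-03", "2022-01-05"))
wd <- wday(x, label = TRUE)

is.ordered(wd)   # TRUE
sort(wd)         # Monday, Wednesday, Saturday (labels depend on your locale)
```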