diff --git a/docs/404.html b/docs/404.html
deleted file mode 100644
index d7ab004..0000000
--- a/docs/404.html
+++ /dev/null
@@ -1,124 +0,0 @@
-vignettes/SW00_Introduction_to_sweep.Rmd
-Extending broom to time series forecasting
-The sweep package extends the broom tools (tidy, glance, and augment) for
-performing forecasts and time series analysis in the “tidyverse”. The
-package is geared towards the workflow required to perform forecasts using
-Rob Hyndman’s forecast package, and contains the following elements:
-model tidiers: the sw_tidy, sw_glance, sw_augment, and sw_tidy_decomp
-functions extend tidy, glance, and augment from the broom package
-specifically for models (ets(), Arima(), bats(), etc.) used for forecasting.
-forecast tidier: sw_sweep converts a forecast object to a tibble that can
-be easily manipulated in the “tidyverse”.
-To illustrate, let’s take a basic forecasting workflow starting from
-data collected in a tibble format and then performing a forecast to
-achieve the end result in tibble format.
-We’ll use the tidyquant package to get the US alcohol sales data, which
-come from the FRED database (the origin is the US Bureau of the Census,
-one of the 80+ data sources FRED connects to). The FRED code is
-“S4248SM144NCEN”, and the data set can be found on the FRED website under
-that code.
-alcohol_sales_tbl <- tq_get("S4248SM144NCEN",
- get = "economic.data",
- from = "2007-01-01",
- to = "2016-12-31")
-alcohol_sales_tbl
## # A tibble: 120 × 3
-## symbol date price
-## <chr> <date> <int>
-## 1 S4248SM144NCEN 2007-01-01 6627
-## 2 S4248SM144NCEN 2007-02-01 6743
-## 3 S4248SM144NCEN 2007-03-01 8195
-## 4 S4248SM144NCEN 2007-04-01 7828
-## 5 S4248SM144NCEN 2007-05-01 9570
-## 6 S4248SM144NCEN 2007-06-01 9484
-## 7 S4248SM144NCEN 2007-07-01 8608
-## 8 S4248SM144NCEN 2007-08-01 9543
-## 9 S4248SM144NCEN 2007-09-01 8123
-## 10 S4248SM144NCEN 2007-10-01 9649
-## # ℹ 110 more rows
-We can quickly visualize the data using the ggplot2 package. There appears
-to be some seasonality and an upward trend.
-alcohol_sales_tbl %>%
- ggplot(aes(x = date, y = price)) +
- geom_line(linewidth = 1, color = palette_light()[[1]]) +
- geom_smooth(method = "loess") +
- labs(title = "US Alcohol Sales: Monthly", x = "", y = "Millions") +
- scale_y_continuous(labels = scales::dollar) +
- scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
- theme_tq()
## `geom_smooth()` using formula = 'y ~ x'
-
-The forecasting workflow involves a few basic steps:
-Coerce the data to a ts object class.
-Apply a model (or set of models).
-Forecast the model.
-Use sw_sweep() to tidy the forecast.
-Note that we purposely omit other steps, such as testing the series for
-independence (Box.test(type = "Ljung")) and analyzing autocorrelations
-(Acf, Pacf), for brevity. We recommend that the analyst follow the
-forecasting workflow in “Forecasting: principles and practice”.
-Coerce to a ts object class
-The forecast package uses the ts data structure, which is quite different
-from the tibbles we are currently using. Fortunately, it’s easy to get to
-the correct structure with tk_ts() from the timetk package. The start and
-freq arguments are required for the regularized time series (ts) class,
-and these specify how to treat the time series. For monthly data, the
-frequency should be specified as 12, which results in a nice calendar
-view. The silent = TRUE argument tells the tk_ts() function to skip the
-warning notifying us that the “date” column is being dropped. Non-numeric
-columns must be dropped for the ts class, which is matrix-based and
-therefore a homogeneous data class.
-alcohol_sales_ts <- tk_ts(alcohol_sales_tbl, start = 2007, freq = 12, silent = TRUE)
-alcohol_sales_ts
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
-## 2007 6627 6743 8195 7828 9570 9484 8608 9543 8123 9649 9390 10065
-## 2008 7093 7483 8365 8895 9794 9977 9553 9375 9225 9948 8758 10839
-## 2009 7266 7578 8688 9162 9369 10167 9507 8923 9272 9075 8949 10843
-## 2010 6558 7481 9475 9424 9351 10552 9077 9273 9420 9413 9866 11455
-## 2011 6901 8014 9832 9281 9967 11344 9106 10469 10085 9612 10328 11483
-## 2012 7486 8641 9709 9423 11342 11274 9845 11163 9532 10754 10953 11922
-## 2013 8383 8870 10085 10462 12177 11342 11139 11409 10442 11479 11077 12636
-## 2014 8506 9003 9991 10903 11709 11815 10875 10884 10725 11697 10353 13153
-## 2015 8279 8926 10557 10933 11330 12708 11700 11079 11882 11865 11420 14100
-## 2016 8556 10199 11949 11253 12046 13453 10755 12465 12038 11674 12761 14137
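For readers less familiar with the ts class, the role of start and frequency can be seen with base R alone. This is a minimal sketch that uses stats::ts() rather than tk_ts(), and the twelve values are made up for illustration (they are not the FRED series).

```r
# Hypothetical monthly values for illustration only (not the FRED data).
monthly_values <- c(10, 12, 13, 15, 18, 21, 20, 19, 16, 14, 12, 11)

# start = c(2007, 1) places the first observation in January 2007;
# frequency = 12 declares twelve observations per year (monthly data).
monthly_ts <- ts(monthly_values, start = c(2007, 1), frequency = 12)

frequency(monthly_ts)  # observations per unit of time (here, per year)
start(monthly_ts)      # year and period of the first observation
end(monthly_ts)        # year and period of the last observation
print(monthly_ts)      # printed in a month-by-year calendar layout
```

Changing frequency to 4 would make the same vector print as quarters, which is why the argument has to match the sampling interval of the data.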
-A significant benefit is that the resulting ts
object
-maintains a “timetk index”, which will help with forecasting dates
-later. We can verify this using has_timetk_idx()
from the
-timetk
package.
-has_timetk_idx(alcohol_sales_ts)
## [1] TRUE
-Now that a time series has been coerced, let’s proceed with
-modeling.
-The modeling workflow takes a time series object and applies a model.
-Nothing new here: we’ll simply use the ets()
function from
-the forecast
package to get an Exponential Smoothing ETS
-(Error, Trend, Seasonal) model.
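The fitting call itself is not shown in this extract. The later calls refer to an object named fit_ets, so the step was presumably along the lines of fit_ets <- ets(alcohol_sales_ts); that reading is an assumption. As a self-contained sketch that runs without downloading data, the same function is applied below to USAccDeaths, a monthly series shipped with base R that stands in for the alcohol sales series.

```r
library(forecast)

# USAccDeaths is a built-in monthly ts (1973-1978), used here only as a
# stand-in for alcohol_sales_ts.
fit_example <- ets(USAccDeaths)

class(fit_example)   # includes "ets"
fit_example$method   # the automatically selected model, as an ETS(...) string
```

By default ets() selects the error, trend, and seasonal components automatically, which is why no model specification is passed.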
Where sweep
can help is in the evaluation of a model.
-Expanding on the broom
package there are four
-functions:
-sw_tidy(): Returns a tibble of model parameters
-sw_glance(): Returns the model accuracy measurements
-sw_augment(): Returns the fitted values and residuals of the model
-sw_tidy_decomp(): Returns a tidy decomposition from a model
-The guide below shows which model objects are compatible with the sweep
-tidier functions.
Object | sw_tidy() | sw_glance() | sw_augment() | sw_tidy_decomp() | sw_sweep()
---|---|---|---|---|---
ar | | | | |
arima | X | X | X | |
Arima | X | X | X | |
ets | X | X | X | X |
baggedETS | | | | |
bats | X | X | X | X |
tbats | X | X | X | X |
nnetar | X | X | X | |
stl | | | | X |
HoltWinters | X | X | X | X |
StructTS | X | X | X | X |
tslm | X | X | X | |
decompose | | | | X |
adf.test | X | X | | |
Box.test | X | X | | |
kpss.test | X | X | | |
forecast | | | | | X
Going through the tidiers, we can get useful model
-information.
sw_tidy()
returns the model parameters.
-sw_tidy(fit_ets)
## # A tibble: 17 × 2
-## term estimate
-## <chr> <dbl>
-## 1 alpha 0.159
-## 2 beta 0.0180
-## 3 gamma 0.000107
-## 4 phi 0.970
-## 5 l 8389.
-## 6 b 38.9
-## 7 s0 1.17
-## 8 s1 1.02
-## 9 s2 1.04
-## 10 s3 0.995
-## 11 s4 1.04
-## 12 s5 0.993
-## 13 s6 1.12
-## 14 s7 1.07
-## 15 s8 0.982
-## 16 s9 0.975
-## 17 s10 0.837
-sw_glance()
returns the model quality parameters.
-sw_glance(fit_ets)
## # A tibble: 1 × 12
-## model.desc sigma logLik AIC BIC ME RMSE MAE MPE MAPE MASE
-## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
-## 1 ETS(M,Ad,M) 0.0458 -1012. 2060. 2111. 40.7 431. 357. 0.223 3.54 0.705
-## # ℹ 1 more variable: ACF1 <dbl>
-sw_augment()
returns the actual, fitted and residual
-values.
-augment_fit_ets <- sw_augment(fit_ets)
-augment_fit_ets
## # A tibble: 120 × 4
-## index .actual .fitted .resid
-## <yearmon> <dbl> <dbl> <dbl>
-## 1 Jan 2007 6627 6446. 0.0280
-## 2 Feb 2007 6743 7122. -0.0532
-## 3 Mar 2007 8195 8255. -0.00730
-## 4 Apr 2007 7828 8330. -0.0603
-## 5 May 2007 9570 8986. 0.0650
-## 6 Jun 2007 9484 9541. -0.00597
-## 7 Jul 2007 8608 8500. 0.0127
-## 8 Aug 2007 9543 8932. 0.0684
-## 9 Sep 2007 8123 8694. -0.0657
-## 10 Oct 2007 9649 8977. 0.0749
-## # ℹ 110 more rows
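The .resid values are small compared with .actual and .fitted because the selected model, ETS(M,Ad,M), has a multiplicative error term. For such models the residuals are relative errors, (actual - fitted) / fitted, rather than differences in the original units. The sketch below checks this against the first three printed rows. The inputs are copied from the rounded output above, so the results agree only approximately.

```r
# First three rows of the printed sw_augment() output (rounded values).
actual  <- c(6627, 6743, 8195)
fitted  <- c(6446, 7122, 8255)
printed <- c(0.0280, -0.0532, -0.00730)

# Relative error, as used by multiplicative-error ETS models.
relative_resid <- (actual - fitted) / fitted

# Compare with the printed .resid column; differences are rounding noise.
data.frame(actual, fitted, relative_resid, printed)
```

For an additive-error model the same column would instead hold actual - fitted in the units of the series.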
-We can review the residuals to determine if there are any underlying
-patterns left. Note that the index is class yearmon, which is a
-regularized date format.
-augment_fit_ets %>%
- ggplot(aes(x = index, y = .resid)) +
- geom_hline(yintercept = 0, color = "grey40") +
- geom_point(color = palette_light()[[1]], alpha = 0.5) +
- geom_smooth(method = "loess") +
- scale_x_yearmon(n = 10) +
- labs(title = "US Alcohol Sales: ETS Residuals", x = "") +
- theme_tq()
## `geom_smooth()` using formula = 'y ~ x'
-
-sw_tidy_decomp()
returns the decomposition of the ETS
-model.
-decomp_fit_ets <- sw_tidy_decomp(fit_ets)
-decomp_fit_ets
## # A tibble: 121 × 5
-## index observed level slope season
-## <yearmon> <dbl> <dbl> <dbl> <dbl>
-## 1 Dec 2006 NA 8389. 38.9 1.17
-## 2 Jan 2007 6627 8464. 42.0 0.765
-## 3 Feb 2007 6743 8433. 32.6 0.837
-## 4 Mar 2007 8195 8455. 30.5 0.975
-## 5 Apr 2007 7828 8404. 20.4 0.982
-## 6 May 2007 9570 8510. 29.6 1.07
-## 7 Jun 2007 9484 8531. 27.8 1.12
-## 8 Jul 2007 8608 8575. 29.0 0.993
-## 9 Aug 2007 9543 8697. 38.7 1.04
-## 10 Sep 2007 8123 8643. 27.2 0.995
-## # ℹ 111 more rows
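The decomposition columns tie back to the fitted values. For a damped-trend model with multiplicative seasonality such as ETS(M,Ad,M), the textbook one-step-ahead fitted value is the previous level plus the damped previous slope, multiplied by the seasonal factor for the month being predicted. The sketch below applies that formula to rounded numbers printed above: phi from sw_tidy(), the Dec 2006 state, and the seasonal factor in the Jan 2007 row (which, with gamma close to zero, barely differs from its initial value). Treat the match as approximate.

```r
# Rounded values copied from the printed outputs above.
phi            <- 0.970   # damping parameter from sw_tidy()
level_dec2006  <- 8389    # level in the Dec 2006 row
slope_dec2006  <- 38.9    # slope in the Dec 2006 row
season_january <- 0.765   # seasonal factor in the Jan 2007 row

# One-step-ahead fitted value for Jan 2007 under ETS(M,Ad,M):
# (previous level + phi * previous slope) * seasonal factor
fitted_jan2007 <- (level_dec2006 + phi * slope_dec2006) * season_january
fitted_jan2007  # close to the 6446 reported by sw_augment() for Jan 2007
```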
-We can review the decomposition using ggplot2 as well. The data need to be
-reshaped slightly for the facet visualization. The gather() function from
-the tidyr package reshapes every column except index into a long-format
-data frame with the column names “key” and “value”. The “key” column is
-then converted with mutate() to a factor whose levels follow the order in
-which the keys appear, so “observed” comes first when plotting.
-decomp_fit_ets %>%
- tidyr::gather(key = key, value = value, -index) %>%
- dplyr::mutate(key = factor(key, levels = unique(key))) %>%
- ggplot(aes(x = index, y = value, group = key)) +
- geom_line(color = palette_light()[[2]]) +
- geom_ma(ma_fun = SMA, n = 12, size = 1) +
- facet_wrap(~ key, scales = "free_y") +
- scale_x_yearmon(n = 10) +
- labs(title = "US Alcohol Sales: ETS Decomposition", x = "") +
- theme_tq() +
- theme(axis.text.x = element_text(angle = 45, hjust = 1))
## Warning: Using the `size` aesthetic in this geom was deprecated in ggplot2 3.4.0.
-## ℹ Please use `linewidth` in the `default_aes` field and elsewhere instead.
-## This warning is displayed once every 8 hours.
-## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
-## generated.
-## Warning: Removed 1 row containing missing values (`geom_line()`).
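Factor level order controls the facet order, and base R offers two behaviours worth distinguishing: as.factor() sorts the levels alphabetically, while factor() with levels = unique(...) keeps them in order of first appearance. A small sketch with the four keys produced by the reshape:

```r
# Keys in the order the columns appear in the decomposition tibble.
keys <- c("observed", "level", "slope", "season")

# as.factor() sorts levels alphabetically, so "level" would be plotted first.
alphabetical_levels <- levels(as.factor(keys))
alphabetical_levels

# factor() with explicit levels keeps the order of first appearance,
# so "observed" stays first.
appearance_levels <- levels(factor(keys, levels = unique(keys)))
appearance_levels
```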
-
-Under normal circumstances it would make sense to refine the model at
-this point. However, in the interest of showing capabilities (rather
-than how to forecast) we move on to forecasting the model. For more
-information on how to forecast, please refer to the online book
-“Forecasting: principles and practice”.
-Next we forecast the ETS model using the forecast()
-function. The returned forecast
object isn’t in a “tidy”
-format (i.e. data frame). This is where the sw_sweep()
-function helps.
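The forecasting call is also not shown in this extract. The later code refers to fcast_ets, and the output further down contains 12 forecast rows, which suggests something like fcast_ets <- forecast(fit_ets, h = 12); that reading is an assumption. A self-contained sketch of the same step, using the built-in USAccDeaths series as a stand-in:

```r
library(forecast)

fit_example   <- ets(USAccDeaths)               # stand-in model
fcast_example <- forecast(fit_example, h = 12)  # 12 months ahead

class(fcast_example)         # "forecast": a list-like object, not a data frame
length(fcast_example$mean)   # number of point forecasts
head(fcast_example$lower)    # lower bounds of the prediction intervals
head(fcast_example$upper)    # upper bounds of the prediction intervals
```

The point forecasts and interval bounds live in separate components of the object, which is the structure sw_sweep() flattens into one tibble.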
-We’ll use the sw_sweep() function to coerce the forecast object into a
-“tidy” tibble that can be sent to ggplot2 for visualization. Let’s inspect
-the result.
-sw_sweep(fcast_ets, fitted = TRUE)
## # A tibble: 252 × 7
-## index key price lo.80 lo.95 hi.80 hi.95
-## <yearmon> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
-## 1 Jan 2007 actual 6627 NA NA NA NA
-## 2 Feb 2007 actual 6743 NA NA NA NA
-## 3 Mar 2007 actual 8195 NA NA NA NA
-## 4 Apr 2007 actual 7828 NA NA NA NA
-## 5 May 2007 actual 9570 NA NA NA NA
-## 6 Jun 2007 actual 9484 NA NA NA NA
-## 7 Jul 2007 actual 8608 NA NA NA NA
-## 8 Aug 2007 actual 9543 NA NA NA NA
-## 9 Sep 2007 actual 8123 NA NA NA NA
-## 10 Oct 2007 actual 9649 NA NA NA NA
-## # ℹ 242 more rows
-The tibble returned contains “index”, “key” and “value” (in this case
-“price”) columns in a long or “tidy” format that is ideal for
-visualization with ggplot2. The “index” is in a regularized format (in
-this case yearmon) because the forecast package uses ts objects. We’ll see
-later how to get back to the original irregular format (in this case
-date). The “key” and “price” columns contain three groups of key-value
-pairs:
-actual: the actual values from the original data
-fitted: the model values returned from the ets() function (excluded by default)
-forecast: the predicted values from the forecast() function
-The sw_sweep() function has an argument, fitted, that is FALSE by default,
-meaning that the model “fitted” values are not returned. We can toggle
-this on if desired. The remaining columns are the forecast prediction
-intervals (typically 80 and 95, but this can be changed with the level
-argument, e.g. forecast(level = c(80, 95))). These columns are set up in a
-wide format to enable use of geom_ribbon().
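The effect of the fitted argument can be checked by counting rows per key. This is a sketch on the built-in USAccDeaths series (72 monthly observations) rather than the alcohol sales data, so the counts differ from the 252 rows shown earlier, but the pattern is the same: one block of actual rows, an optional block of fitted rows, and h forecast rows.

```r
library(forecast)
library(sweep)

# Stand-in forecast: 72 observations, 12 months ahead.
fcast_example <- forecast(ets(USAccDeaths), h = 12)

swept_default <- sw_sweep(fcast_example)                 # actual + forecast
swept_fitted  <- sw_sweep(fcast_example, fitted = TRUE)  # adds fitted rows

table(swept_default$key)
table(swept_fitted$key)
names(swept_default)   # index, key, the value column, then interval columns
```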
Let’s visualize the forecast with ggplot2
. We’ll use a
-combination of geom_line()
and geom_ribbon()
.
-The fitted values are toggled off by default to reduce the complexity of
-the plot, but these can be added if desired. Note that because we are
-using a regular time index of the yearmon
class, we need to
-add scale_x_yearmon()
.
-sw_sweep(fcast_ets) %>%
- ggplot(aes(x = index, y = price, color = key)) +
- geom_ribbon(aes(ymin = lo.95, ymax = hi.95),
- fill = "#D5DBFF", color = NA, linewidth = 0) +
- geom_ribbon(aes(ymin = lo.80, ymax = hi.80, fill = key),
- fill = "#596DD5", color = NA, linewidth = 0, alpha = 0.8) +
- geom_line(linewidth = 1) +
- labs(title = "US Alcohol Sales, ETS Model Forecast", x = "", y = "Millions",
- subtitle = "Regular Time Index") +
- scale_y_continuous(labels = scales::label_dollar()) +
- scale_x_yearmon(n = 12, format = "%Y") +
- scale_color_tq() +
- scale_fill_tq() +
- theme_tq()
Because the ts
object was created with the
-tk_ts()
function, it contained a timetk index that was
-carried with it throughout the forecasting workflow. As a result, we can
-use the timetk_idx
argument, which maps the original
-irregular index (dates) and a generated future index to the regularized
-time series (yearmon). This results in the ability to return an index of
-date and datetime, which is not currently possible with the
-forecast
objects. Notice that the index is returned as
-date
class.
## Warning in .check_tzones(e1, e2): 'tzone' attributes are inconsistent
-## # A tibble: 6 × 7
-## index key price lo.80 lo.95 hi.80 hi.95
-## <date> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
-## 1 2007-01-01 actual 6627 NA NA NA NA
-## 2 2007-02-01 actual 6743 NA NA NA NA
-## 3 2007-03-01 actual 8195 NA NA NA NA
-## 4 2007-04-01 actual 7828 NA NA NA NA
-## 5 2007-05-01 actual 9570 NA NA NA NA
-## 6 2007-06-01 actual 9484 NA NA NA NA
-
-## Warning in .check_tzones(e1, e2): 'tzone' attributes are inconsistent
-## # A tibble: 6 × 7
-## index key price lo.80 lo.95 hi.80 hi.95
-## <date> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
-## 1 2017-07-01 forecast 12117. 11309. 10882. 12924. 13351.
-## 2 2017-08-01 forecast 12697. 11828. 11367. 13566. 14027.
-## 3 2017-09-01 forecast 12203. 11343. 10888. 13063. 13518.
-## 4 2017-10-01 forecast 12723. 11800. 11311. 13647. 14136.
-## 5 2017-11-01 forecast 12559. 11619. 11122. 13499. 13996.
-## 6 2017-12-01 forecast 14499. 13380. 12788. 15618. 16211.
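The calls that produced the two tables above are not included in this extract. Judging by the output, they were presumably sw_sweep(fcast_ets, timetk_idx = TRUE) piped into head() and tail(); that is an assumption. The sketch below reproduces the idea end to end on a stand-in tibble built from the USAccDeaths values, with a hypothetical date column, so that the ts object is created by tk_ts() and carries a timetk index.

```r
library(tibble)
library(timetk)
library(forecast)
library(sweep)

# Stand-in tibble: a date column plus the USAccDeaths values (names are
# illustrative, not from the vignette).
deaths_tbl <- tibble(
  date   = seq(as.Date("1973-01-01"), by = "month", length.out = 72),
  deaths = as.numeric(USAccDeaths)
)

# tk_ts() keeps the original dates as a timetk index on the ts object.
deaths_ts <- tk_ts(deaths_tbl, select = deaths, start = 1973, frequency = 12)

fcast_example <- forecast(ets(deaths_ts), h = 12)

# timetk_idx = TRUE asks for the original dates plus generated future dates.
swept_dates <- sw_sweep(fcast_example, timetk_idx = TRUE)

class(swept_dates$index)  # a date class rather than yearmon
head(swept_dates)
tail(swept_dates)
```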
-We can build the same plot with dates on the x-axis now.
-
-sw_sweep(fcast_ets, timetk_idx = TRUE) %>%
- ggplot(aes(x = index, y = price, color = key)) +
- geom_ribbon(aes(ymin = lo.95, ymax = hi.95),
- fill = "#D5DBFF", color = NA, linewidth = 0) +
- geom_ribbon(aes(ymin = lo.80, ymax = hi.80, fill = key),
- fill = "#596DD5", color = NA, linewidth = 0, alpha = 0.8) +
- geom_line(linewidth = 1) +
- labs(title = "US Alcohol Sales, ETS Model Forecast", x = "", y = "Millions",
- subtitle = "Irregular Time Index") +
- scale_y_continuous(labels = scales::dollar) +
- scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
- scale_color_tq() +
- scale_fill_tq() +
- theme_tq()
## Warning in .check_tzones(e1, e2): 'tzone' attributes are inconsistent
-
-In this example, there is not much benefit to returning an irregular
-time series. However, when working with intervals shorter than a month
-(e.g. weekly or daily data), the benefit of returning irregular index
-values becomes more apparent.
-vignettes/SW01_Forecasting_Time_Series_Groups.Rmd
-Extending broom to time series forecasting
One of the most powerful benefits of sweep
is that it
-helps forecasting at scale within the “tidyverse”. There are two common
-situations:
-Applying a model to groups of time series
-Applying multiple models to a time series
In this vignette we’ll review how sweep
can help the
-first situation: Applying a model to groups of time
-series.
We’ll use the bike sales data set, bike_sales
, provided
-with the sweep
package for this tutorial. The
-bike_sales
data set is a fictional daily order
-history that spans 2011 through 2015. It simulates a sales database that
-is typical of a business. The customers are the “bike shops” and the
-products are the “models”.
-bike_sales
## # A tibble: 15,644 × 17
-## order.date order.id order.line quantity price price.ext customer.id
-## <date> <dbl> <int> <dbl> <dbl> <dbl> <dbl>
-## 1 2011-01-07 1 1 1 6070 6070 2
-## 2 2011-01-07 1 2 1 5970 5970 2
-## 3 2011-01-10 2 1 1 2770 2770 10
-## 4 2011-01-10 2 2 1 5970 5970 10
-## 5 2011-01-10 3 1 1 10660 10660 6
-## 6 2011-01-10 3 2 1 3200 3200 6
-## 7 2011-01-10 3 3 1 12790 12790 6
-## 8 2011-01-10 3 4 1 5330 5330 6
-## 9 2011-01-10 3 5 1 1570 1570 6
-## 10 2011-01-11 4 1 1 4800 4800 22
-## # ℹ 15,634 more rows
-## # ℹ 10 more variables: bikeshop.name <chr>, bikeshop.city <chr>,
-## # bikeshop.state <chr>, latitude <dbl>, longitude <dbl>, product.id <dbl>,
-## # model <chr>, category.primary <chr>, category.secondary <chr>, frame <chr>
-We’ll analyse the monthly sales trends for the bicycle manufacturer.
-Let’s transform the data set by aggregating by month.
-
-bike_sales_monthly <- bike_sales %>%
- mutate(month = month(order.date, label = TRUE),
- year = year(order.date)) %>%
- group_by(year, month) %>%
- summarise(total.qty = sum(quantity))
## `summarise()` has grouped output by 'year'. You can override using the
-## `.groups` argument.
-
-bike_sales_monthly
## # A tibble: 60 × 3
-## # Groups: year [5]
-## year month total.qty
-## <dbl> <ord> <dbl>
-## 1 2011 Jan 440
-## 2 2011 Feb 2017
-## 3 2011 Mar 1584
-## 4 2011 Apr 4478
-## 5 2011 May 4112
-## 6 2011 Jun 4251
-## 7 2011 Jul 1550
-## 8 2011 Aug 1470
-## 9 2011 Sep 975
-## 10 2011 Oct 697
-## # ℹ 50 more rows
-We can visualize the data with a month plot using the ggplot2 package.
-bike_sales_monthly %>%
- ggplot(aes(x = month, y = total.qty, group = year)) +
- geom_area(aes(fill = year), position = "stack") +
- labs(title = "Quantity Sold: Month Plot", x = "", y = "Sales",
- subtitle = "March through July tend to be most active") +
- scale_y_continuous() +
- theme_tq()
Suppose Manufacturing wants a more granular forecast because the bike
-components are related to the secondary category. In the next section we
-discuss how sweep
can help to perform a forecast on each
-sub-category.
-First, we need to get the data organized into groups by month of the year.
-We’ll create a new “order.month” date using zoo::as.yearmon(), which
-captures the year and month information from “order.date”, and then pass
-the result to lubridate::as_date() to convert it to date format.
-monthly_qty_by_cat2 <- bike_sales %>%
- mutate(order.month = as_date(as.yearmon(order.date))) %>%
- group_by(category.secondary, order.month) %>%
- summarise(total.qty = sum(quantity))
## `summarise()` has grouped output by 'category.secondary'. You can override
-## using the `.groups` argument.
-
-monthly_qty_by_cat2
## # A tibble: 538 × 3
-## # Groups: category.secondary [9]
-## category.secondary order.month total.qty
-## <chr> <date> <dbl>
-## 1 Cross Country Race 2011-01-01 122
-## 2 Cross Country Race 2011-02-01 489
-## 3 Cross Country Race 2011-03-01 505
-## 4 Cross Country Race 2011-04-01 343
-## 5 Cross Country Race 2011-05-01 263
-## 6 Cross Country Race 2011-06-01 735
-## 7 Cross Country Race 2011-07-01 183
-## 8 Cross Country Race 2011-08-01 66
-## 9 Cross Country Race 2011-09-01 97
-## 10 Cross Country Race 2011-10-01 189
-## # ℹ 528 more rows
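The month-flooring step can be checked in isolation. The sketch below uses three made-up order dates: as.yearmon() discards the day, and as_date() turns the result back into a date on the first of the month, so all orders in the same month share one grouping value.

```r
library(zoo)
library(lubridate)

# Hypothetical order dates for illustration.
order_dates <- as.Date(c("2011-01-07", "2011-01-10", "2011-02-15"))

# yearmon keeps only year and month; as_date() maps it to the 1st of the month.
order_months <- as_date(as.yearmon(order_dates))

data.frame(order_dates, order_months)
```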
-Next, we use the nest() function from the tidyr package to consolidate
-each time series by group. The newly created list-column, “data”, contains
-the “order.month” and “total.qty” columns by group from the previous step.
-The nest() function just bundles the data together, which is very useful
-for iterative functional programming.
-monthly_qty_by_cat2_nest <- monthly_qty_by_cat2 %>%
- group_by(category.secondary) %>%
- nest()
-monthly_qty_by_cat2_nest
## # A tibble: 9 × 2
-## # Groups: category.secondary [9]
-## category.secondary data
-## <chr> <list>
-## 1 Cross Country Race <tibble [60 × 2]>
-## 2 Cyclocross <tibble [60 × 2]>
-## 3 Elite Road <tibble [60 × 2]>
-## 4 Endurance Road <tibble [60 × 2]>
-## 5 Fat Bike <tibble [58 × 2]>
-## 6 Over Mountain <tibble [60 × 2]>
-## 7 Sport <tibble [60 × 2]>
-## 8 Trail <tibble [60 × 2]>
-## 9 Triathalon <tibble [60 × 2]>
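What nest() does is easier to see on a table small enough to read in full. The sketch below uses a made-up two-category table with the same column layout as monthly_qty_by_cat2; the names toy_sales and toy_nested are illustrative only.

```r
library(dplyr)
library(tidyr)

# Made-up sales table: two categories, three months each.
toy_sales <- tibble(
  category    = rep(c("A", "B"), each = 3),
  order.month = rep(as.Date(c("2011-01-01", "2011-02-01", "2011-03-01")),
                    times = 2),
  total.qty   = c(5, 7, 6, 2, 3, 4)
)

# One row per category; the remaining columns are bundled into "data".
toy_nested <- toy_sales %>%
  group_by(category) %>%
  nest()

toy_nested            # one row per category, with a list-column named "data"
toy_nested$data[[1]]  # the 3 x 2 tibble for the first category
```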
-The forecasting workflow involves a few basic steps:
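-Coerce the data to a ts object class.
-Apply a model (or set of models).
-Forecast the models.
-Tidy the forecast.
-Coerce to a ts object class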
-In this step we map the tk_ts()
function into a new
-column “data.ts”. The procedure is performed using the combination of
-dplyr::mutate()
and purrr::map()
, which works
-really well for the data science workflow where analyses are built
-progressively. As a result, this combination will be used in many of the
-subsequent steps in this vignette as we build the analysis.
-The mutate() function adds a column, and the map() function maps the
-contents of a list-column (.x) to a function (.f). In our case,
-.x = data and .f = tk_ts. The arguments select = -order.month,
-start = 2011, and freq = 12 are passed to the ... parameters in map, which
-are passed through to the function. The select statement is used to drop
-the “order.month” column from the final output so we don’t get a bunch of
-warning messages. We specify start = 2011 and freq = 12 to return a
-monthly frequency.
-monthly_qty_by_cat2_ts <- monthly_qty_by_cat2_nest %>%
- mutate(data.ts = map(.x = data,
- .f = tk_ts,
- select = -order.month,
- start = 2011,
- freq = 12))
-monthly_qty_by_cat2_ts
## # A tibble: 9 × 3
-## # Groups: category.secondary [9]
-## category.secondary data data.ts
-## <chr> <list> <list>
-## 1 Cross Country Race <tibble [60 × 2]> <ts [60 × 1]>
-## 2 Cyclocross <tibble [60 × 2]> <ts [60 × 1]>
-## 3 Elite Road <tibble [60 × 2]> <ts [60 × 1]>
-## 4 Endurance Road <tibble [60 × 2]> <ts [60 × 1]>
-## 5 Fat Bike <tibble [58 × 2]> <ts [58 × 1]>
-## 6 Over Mountain <tibble [60 × 2]> <ts [60 × 1]>
-## 7 Sport <tibble [60 × 2]> <ts [60 × 1]>
-## 8 Trail <tibble [60 × 2]> <ts [60 × 1]>
-## 9 Triathalon <tibble [60 × 2]> <ts [60 × 1]>
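The way map() forwards extra arguments through ... is independent of time series and can be seen with a plain numeric example. In this sketch, na.rm = TRUE plays the role that select, start, and freq play in the tk_ts() call above.

```r
library(purrr)

# Two numeric vectors with missing values.
measurements <- list(c(1, NA, 3), c(4, 5, NA))

# na.rm = TRUE is not an argument of map(); it is forwarded to mean()
# for every element of the list.
means <- map(measurements, mean, na.rm = TRUE)
means
```

Without na.rm = TRUE, both results would be NA, which shows that the extra argument reaches the mapped function.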
-Next, we map the Exponential Smoothing ETS (Error, Trend, Seasonal) model
-function, ets, from the forecast package. We use the combination of mutate
-to add a column and map to iteratively apply a function rowwise to a
-list-column. In this instance, the function to map is ets and the
-list-column is “data.ts”. We name the resulting column “fit.ets” to
-indicate that an ETS model was fit to the time series data.
-monthly_qty_by_cat2_fit <- monthly_qty_by_cat2_ts %>%
- mutate(fit.ets = map(data.ts, ets))
-monthly_qty_by_cat2_fit
## # A tibble: 9 × 4
-## # Groups: category.secondary [9]
-## category.secondary data data.ts fit.ets
-## <chr> <list> <list> <list>
-## 1 Cross Country Race <tibble [60 × 2]> <ts [60 × 1]> <ets>
-## 2 Cyclocross <tibble [60 × 2]> <ts [60 × 1]> <ets>
-## 3 Elite Road <tibble [60 × 2]> <ts [60 × 1]> <ets>
-## 4 Endurance Road <tibble [60 × 2]> <ts [60 × 1]> <ets>
-## 5 Fat Bike <tibble [58 × 2]> <ts [58 × 1]> <ets>
-## 6 Over Mountain <tibble [60 × 2]> <ts [60 × 1]> <ets>
-## 7 Sport <tibble [60 × 2]> <ts [60 × 1]> <ets>
-## 8 Trail <tibble [60 × 2]> <ts [60 × 1]> <ets>
-## 9 Triathalon <tibble [60 × 2]> <ts [60 × 1]> <ets>
-At this point, we can do some model inspection with the
-sweep
tidiers.
To get the model parameters for each nested list, we can combine
-sw_tidy
within the mutate
and map
-combo. The only real difference is now we unnest
the
-generated column (named “tidy”). Last, because it’s easier to compare
-the model parameters side by side, we add one additional call to
-spread()
from the tidyr
package.
-monthly_qty_by_cat2_fit %>%
- mutate(tidy = map(fit.ets, sw_tidy)) %>%
- unnest(tidy) %>%
- spread(key = category.secondary, value = estimate)
## # A tibble: 128 × 13
-## data data.ts fit.ets term `Cross Country Race` Cyclocross `Elite Road`
-## <list> <list> <list> <chr> <dbl> <dbl> <dbl>
-## 1 <tibble> <ts[…]> <ets> alpha 0.0398 NA NA
-## 2 <tibble> <ts[…]> <ets> gamma 0.000101 NA NA
-## 3 <tibble> <ts[…]> <ets> l 321. NA NA
-## 4 <tibble> <ts[…]> <ets> s0 0.503 NA NA
-## 5 <tibble> <ts[…]> <ets> s1 1.10 NA NA
-## 6 <tibble> <ts[…]> <ets> s10 0.643 NA NA
-## 7 <tibble> <ts[…]> <ets> s2 0.375 NA NA
-## 8 <tibble> <ts[…]> <ets> s3 1.12 NA NA
-## 9 <tibble> <ts[…]> <ets> s4 0.630 NA NA
-## 10 <tibble> <ts[…]> <ets> s5 2.06 NA NA
-## # ℹ 118 more rows
-## # ℹ 6 more variables: `Endurance Road` <dbl>, `Fat Bike` <dbl>,
-## # `Over Mountain` <dbl>, Sport <dbl>, Trail <dbl>, Triathalon <dbl>
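In the output above most category columns are NA and the row count is far larger than the number of distinct terms. A likely cause is that the list-columns (data, data.ts, fit.ets) are still present when spread() runs, so each row stays tied to its own group. One possible remedy, shown here on a small made-up table rather than the bike sales fits, is to keep only the category, term, and estimate columns before spreading. The fit column below is a hypothetical placeholder for the model list-column.

```r
library(dplyr)
library(tidyr)

# Made-up parameter table with a placeholder list-column.
toy_params <- tibble(
  category = rep(c("A", "B"), each = 2),
  fit      = rep(list("model A", "model B"), each = 2),
  term     = rep(c("alpha", "gamma"), times = 2),
  estimate = c(0.10, 0.01, 0.20, 0.02)
)

wide_params <- toy_params %>%
  select(category, term, estimate) %>%       # drop the list-column first
  spread(key = category, value = estimate)   # one column per category

wide_params   # one row per term, no NA padding
```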
-We can view the model accuracies also by mapping
-sw_glance
within the mutate
and
-map
combo.
## # A tibble: 9 × 16
-## # Groups: category.secondary [9]
-## category.secondary data data.ts fit.ets model.desc sigma logLik AIC
-## <chr> <list> <list> <list> <chr> <dbl> <dbl> <dbl>
-## 1 Cross Country Race <tibble> <ts[…]> <ets> ETS(M,N,M) 1.06 -464. 957.
-## 2 Cyclocross <tibble> <ts[…]> <ets> ETS(M,N,M) 1.12 -409. 848.
-## 3 Elite Road <tibble> <ts[…]> <ets> ETS(M,N,M) 0.895 -471. 972.
-## 4 Endurance Road <tibble> <ts[…]> <ets> ETS(M,N,M) 0.759 -439. 909.
-## 5 Fat Bike <tibble> <ts[…]> <ets> ETS(M,N,M) 2.73 -343. 715.
-## 6 Over Mountain <tibble> <ts[…]> <ets> ETS(M,N,M) 0.910 -423. 877.
-## 7 Sport <tibble> <ts[…]> <ets> ETS(M,N,M) 0.872 -427. 884.
-## 8 Trail <tibble> <ts[…]> <ets> ETS(M,A,M) 0.741 -411. 855.
-## 9 Triathalon <tibble> <ts[…]> <ets> ETS(M,N,M) 1.52 -410. 850.
-## # ℹ 8 more variables: BIC <dbl>, ME <dbl>, RMSE <dbl>, MAE <dbl>, MPE <dbl>,
-## # MAPE <dbl>, MASE <dbl>, ACF1 <dbl>
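The code behind the table above is not included in this extract; it presumably follows the same pattern as the sw_tidy step, with sw_glance mapped over fit.ets and the result unnested. A self-contained sketch of that pattern, using two built-in monthly series in place of the nine bike categories (the names example_fits and example_glance are illustrative):

```r
library(dplyr)
library(purrr)
library(tidyr)
library(forecast)
library(sweep)

# Two built-in monthly series stand in for the nested groups.
example_fits <- tibble(
  series  = c("USAccDeaths", "AirPassengers"),
  data.ts = list(USAccDeaths, AirPassengers)
) %>%
  mutate(fit.ets = map(data.ts, ets))        # one ETS model per row

# sw_glance() returns a one-row tibble per model; unnest() expands it.
example_glance <- example_fits %>%
  mutate(glance = map(fit.ets, sw_glance)) %>%
  unnest(glance)

example_glance %>%
  select(series, model.desc, AIC, RMSE, MAPE)
```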
-The augmented fitted and residual values can be obtained in much the same
-manner. This returns data for all nine groups. Note that we pass
-timetk_idx = TRUE to return the date format times as opposed to the
-regular (yearmon or numeric) time series index.
-augment_fit_ets <- monthly_qty_by_cat2_fit %>%
- mutate(augment = map(fit.ets, sw_augment, timetk_idx = TRUE, rename_index = "date")) %>%
- unnest(augment)
## Warning: There were 9 warnings in `mutate()`.
-## The first warning was:
-## ℹ In argument: `augment = map(fit.ets, sw_augment, timetk_idx = TRUE,
-## rename_index = "date")`.
-## ℹ In group 1: `category.secondary = "Cross Country Race"`.
-## Caused by warning in `.check_tzones()`:
-## ! 'tzone' attributes are inconsistent
-## ℹ Run `dplyr::last_dplyr_warnings()` to see the 8 remaining warnings.
-
-augment_fit_ets
## # A tibble: 538 × 8
-## # Groups: category.secondary [9]
-## category.secondary data data.ts fit.ets date .actual .fitted
-## <chr> <list> <list> <list> <date> <dbl> <dbl>
-## 1 Cross Country Race <tibble> <ts [60 × 1]> <ets> 2011-01-01 122 373.
-## 2 Cross Country Race <tibble> <ts [60 × 1]> <ets> 2011-02-01 489 201.
-## 3 Cross Country Race <tibble> <ts [60 × 1]> <ets> 2011-03-01 505 465.
-## 4 Cross Country Race <tibble> <ts [60 × 1]> <ets> 2011-04-01 343 161.
-## 5 Cross Country Race <tibble> <ts [60 × 1]> <ets> 2011-05-01 263 567.
-## 6 Cross Country Race <tibble> <ts [60 × 1]> <ets> 2011-06-01 735 296.
-## 7 Cross Country Race <tibble> <ts [60 × 1]> <ets> 2011-07-01 183 741.
-## 8 Cross Country Race <tibble> <ts [60 × 1]> <ets> 2011-08-01 66 220.
-## 9 Cross Country Race <tibble> <ts [60 × 1]> <ets> 2011-09-01 97 381.
-## 10 Cross Country Race <tibble> <ts [60 × 1]> <ets> 2011-10-01 189 123.
-## # ℹ 528 more rows
-## # ℹ 1 more variable: .resid <dbl>
-We can plot the residuals for the nine categories like so.
-Unfortunately we do see some very high residuals (especially with “Fat
-Bike”). This is often the case with real-world data.
-
-augment_fit_ets %>%
- ggplot(aes(x = date, y = .resid, group = category.secondary)) +
- geom_hline(yintercept = 0, color = "grey40") +
- geom_line(color = palette_light()[[2]]) +
- geom_smooth(method = "loess") +
- labs(title = "Bike Quantity Sold By Secondary Category",
- subtitle = "ETS Model Residuals", x = "") +
- theme_tq() +
- facet_wrap(~ category.secondary, scales = "free_y", ncol = 3) +
- scale_x_date(date_labels = "%Y")
## `geom_smooth()` using formula = 'y ~ x'
-
-We can create decompositions using the same procedure with
-sw_tidy_decomp()
and the mutate
and
-map
combo.
-monthly_qty_by_cat2_fit %>%
- mutate(decomp = map(fit.ets, sw_tidy_decomp, timetk_idx = TRUE, rename_index = "date")) %>%
- unnest(decomp)
## Warning: There were 9 warnings in `mutate()`.
-## The first warning was:
-## ℹ In argument: `decomp = map(fit.ets, sw_tidy_decomp, timetk_idx = TRUE,
-## rename_index = "date")`.
-## ℹ In group 1: `category.secondary = "Cross Country Race"`.
-## Caused by warning in `.check_tzones()`:
-## ! 'tzone' attributes are inconsistent
-## ℹ Run `dplyr::last_dplyr_warnings()` to see the 8 remaining warnings.
-## # A tibble: 538 × 9
-## # Groups: category.secondary [9]
-## category.secondary data data.ts fit.ets date observed level season
-## <chr> <list> <list> <list> <date> <dbl> <dbl> <dbl>
-## 1 Cross Country Race <tibble> <ts[…]> <ets> 2011-01-01 122 313. 1.16
-## 2 Cross Country Race <tibble> <ts[…]> <ets> 2011-02-01 489 331. 0.643
-## 3 Cross Country Race <tibble> <ts[…]> <ets> 2011-03-01 505 332. 1.41
-## 4 Cross Country Race <tibble> <ts[…]> <ets> 2011-04-01 343 347. 0.487
-## 5 Cross Country Race <tibble> <ts[…]> <ets> 2011-05-01 263 339. 1.64
-## 6 Cross Country Race <tibble> <ts[…]> <ets> 2011-06-01 735 359. 0.873
-## 7 Cross Country Race <tibble> <ts[…]> <ets> 2011-07-01 183 348. 2.06
-## 8 Cross Country Race <tibble> <ts[…]> <ets> 2011-08-01 66 339. 0.630
-## 9 Cross Country Race <tibble> <ts[…]> <ets> 2011-09-01 97 329. 1.12
-## 10 Cross Country Race <tibble> <ts[…]> <ets> 2011-10-01 189 336. 0.375
-## # ℹ 528 more rows
-## # ℹ 1 more variable: slope <dbl>
-We can also forecast the multiple models using a very similar approach
-with the forecast function. We want a 12-month forecast, so we add the
-argument h = 12 (refer to ?forecast for all of the parameters you can add;
-there are quite a few).
-monthly_qty_by_cat2_fcast <- monthly_qty_by_cat2_fit %>%
- mutate(fcast.ets = map(fit.ets, forecast, h = 12))
-monthly_qty_by_cat2_fcast
## # A tibble: 9 × 5
-## # Groups: category.secondary [9]
-## category.secondary data data.ts fit.ets fcast.ets
-## <chr> <list> <list> <list> <list>
-## 1 Cross Country Race <tibble [60 × 2]> <ts [60 × 1]> <ets> <forecast>
-## 2 Cyclocross <tibble [60 × 2]> <ts [60 × 1]> <ets> <forecast>
-## 3 Elite Road <tibble [60 × 2]> <ts [60 × 1]> <ets> <forecast>
-## 4 Endurance Road <tibble [60 × 2]> <ts [60 × 1]> <ets> <forecast>
-## 5 Fat Bike <tibble [58 × 2]> <ts [58 × 1]> <ets> <forecast>
-## 6 Over Mountain <tibble [60 × 2]> <ts [60 × 1]> <ets> <forecast>
-## 7 Sport <tibble [60 × 2]> <ts [60 × 1]> <ets> <forecast>
-## 8 Trail <tibble [60 × 2]> <ts [60 × 1]> <ets> <forecast>
-## 9 Triathalon <tibble [60 × 2]> <ts [60 × 1]> <ets> <forecast>
-Next, we can apply sw_sweep to get the forecast in a nice “tidy” data
-frame. We use the argument fitted = FALSE to leave the fitted values out
-of the result (set it to TRUE if fitted values are desired). We set
-timetk_idx = TRUE to use dates instead of numeric values for the index.
-We’ll use unnest() to expand the swept list-column and return an unnested
-data frame.
-monthly_qty_by_cat2_fcast_tidy <- monthly_qty_by_cat2_fcast %>%
- mutate(sweep = map(fcast.ets, sw_sweep, fitted = FALSE, timetk_idx = TRUE)) %>%
- unnest(sweep)
## Warning: There were 9 warnings in `mutate()`.
-## The first warning was:
-## ℹ In argument: `sweep = map(fcast.ets, sw_sweep, fitted = FALSE, timetk_idx =
-## TRUE)`.
-## ℹ In group 1: `category.secondary = "Cross Country Race"`.
-## Caused by warning in `.check_tzones()`:
-## ! 'tzone' attributes are inconsistent
-## ℹ Run `dplyr::last_dplyr_warnings()` to see the 8 remaining warnings.
-
-monthly_qty_by_cat2_fcast_tidy
## # A tibble: 646 × 12
-## # Groups: category.secondary [9]
-## category.secondary data data.ts fit.ets fcast.ets index key
-## <chr> <list> <list> <list> <list> <date> <chr>
-## 1 Cross Country Race <tibble> <ts [60 × 1]> <ets> <forecast> 2011-01-01 actu…
-## 2 Cross Country Race <tibble> <ts [60 × 1]> <ets> <forecast> 2011-02-01 actu…
-## 3 Cross Country Race <tibble> <ts [60 × 1]> <ets> <forecast> 2011-03-01 actu…
-## 4 Cross Country Race <tibble> <ts [60 × 1]> <ets> <forecast> 2011-04-01 actu…
-## 5 Cross Country Race <tibble> <ts [60 × 1]> <ets> <forecast> 2011-05-01 actu…
-## 6 Cross Country Race <tibble> <ts [60 × 1]> <ets> <forecast> 2011-06-01 actu…
-## 7 Cross Country Race <tibble> <ts [60 × 1]> <ets> <forecast> 2011-07-01 actu…
-## 8 Cross Country Race <tibble> <ts [60 × 1]> <ets> <forecast> 2011-08-01 actu…
-## 9 Cross Country Race <tibble> <ts [60 × 1]> <ets> <forecast> 2011-09-01 actu…
-## 10 Cross Country Race <tibble> <ts [60 × 1]> <ets> <forecast> 2011-10-01 actu…
-## # ℹ 636 more rows
-## # ℹ 5 more variables: total.qty <dbl>, lo.80 <dbl>, lo.95 <dbl>, hi.80 <dbl>,
-## # hi.95 <dbl>
-Visualization is just one final step.
-
-monthly_qty_by_cat2_fcast_tidy %>%
- ggplot(aes(x = index, y = total.qty, color = key, group = category.secondary)) +
- geom_ribbon(aes(ymin = lo.95, ymax = hi.95),
- fill = "#D5DBFF", color = NA, linewidth = 0) +
- geom_ribbon(aes(ymin = lo.80, ymax = hi.80, fill = key),
- fill = "#596DD5", color = NA, linewidth = 0, alpha = 0.8) +
- geom_line() +
- labs(title = "Bike Quantity Sold By Secondary Category",
- subtitle = "ETS Model Forecasts",
- x = "", y = "Units") +
- scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
- scale_color_tq() +
- scale_fill_tq() +
- facet_wrap(~ category.secondary, scales = "free_y", ncol = 3) +
- theme_tq() +
- theme(axis.text.x = element_text(angle = 45, hjust = 1))
vignettes/SW02_Forecasting_Multiple_Models.Rmd
- SW02_Forecasting_Multiple_Models.Rmd
--Extending
-broom
to time series forecasting
One of the most powerful benefits of sweep
is that it
-helps forecasting at scale within the “tidyverse”. There are two common
-situations:
In this vignette we’ll review how sweep
can help the
-second situation: Applying multiple models to a
-time series.
To start, let’s get some data from the FRED data base using
-tidyquant
. We’ll use tq_get()
to retrieve the
-Gasoline Prices from 1990 through the end of 2016.
-gas_prices_monthly_raw <- tq_get(
- x = "GASREGCOVM",
- get = "economic.data",
- from = "1990-01-01",
- to = "2016-12-31")
-gas_prices_monthly_raw
## # A tibble: 316 × 3
-## symbol date price
-## <chr> <date> <dbl>
-## 1 GASREGCOVM 1990-09-01 1.26
-## 2 GASREGCOVM 1990-10-01 1.34
-## 3 GASREGCOVM 1990-11-01 1.32
-## 4 GASREGCOVM 1990-12-01 NA
-## 5 GASREGCOVM 1991-01-01 NA
-## 6 GASREGCOVM 1991-02-01 1.09
-## 7 GASREGCOVM 1991-03-01 1.04
-## 8 GASREGCOVM 1991-04-01 1.08
-## 9 GASREGCOVM 1991-05-01 1.13
-## 10 GASREGCOVM 1991-06-01 1.13
-## # ℹ 306 more rows
-Upon a brief inspection, the data contains 2 NA
values
-that will need to be dealt with.
-summary(gas_prices_monthly_raw$price)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
-## 0.900 1.138 1.615 1.974 2.697 4.002 2
-We can use the fill()
from the tidyr
-package to help deal with these data. We first fill down and then fill
-up, so each missing value takes the previous month’s price and any
-leading gap takes the next available price.
-gas_prices_monthly <- gas_prices_monthly_raw %>%
- fill(price, .direction = "down") %>%
- fill(price, .direction = "up")
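To make the two-pass fill concrete, here is a minimal sketch on a toy tibble. The values are made up for illustration and are not taken from the FRED series; the leading `NA` is there to show why the second, upward pass is needed.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical toy data: a leading NA plus a gap in the middle
toy_prices <- tibble(price = c(NA, 1.26, NA, NA, 1.09))

toy_filled <- toy_prices %>%
    fill(price, .direction = "down") %>%  # gaps take the previous value
    fill(price, .direction = "up")        # the leading NA takes the next value

toy_filled$price
```

After the downward pass only the first value is still missing, because nothing precedes it; the upward pass fills it from the value that follows.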
We can now visualize the data.
-
-gas_prices_monthly %>%
- ggplot(aes(x = date, y = price)) +
- geom_line(color = palette_light()[[1]]) +
- labs(title = "Gasoline Prices, Monthly", x = "", y = "USD") +
- scale_y_continuous(labels = scales::dollar) +
- theme_tq()
Monthly periodicity might be a bit granular for model fitting. We can
-easily switch periodicity to quarterly using tq_transmute()
-from the tidyquant
package along with the periodicity
-aggregation function to.period
from the xts
-package. We’ll convert the date to yearqtr
class which is
-regularized.
-gas_prices_quarterly <- gas_prices_monthly %>%
- tq_transmute(mutate_fun = to.period, period = "quarters")
-gas_prices_quarterly
## # A tibble: 106 × 2
-## date price
-## <date> <dbl>
-## 1 1990-09-01 1.26
-## 2 1990-12-01 1.32
-## 3 1991-03-01 1.04
-## 4 1991-06-01 1.13
-## 5 1991-09-01 1.11
-## 6 1991-12-01 1.08
-## 7 1992-03-01 1.01
-## 8 1992-06-01 1.14
-## 9 1992-09-01 1.12
-## 10 1992-12-01 1.08
-## # ℹ 96 more rows
-Another quick visualization to show the reduction in granularity.
-
-gas_prices_quarterly %>%
- ggplot(aes(x = date, y = price)) +
- geom_line(color = palette_light()[[1]], linewidth = 1) +
- labs(title = "Gasoline Prices, Quarterly", x = "", y = "USD") +
- scale_y_continuous(labels = scales::label_dollar()) +
- scale_x_date(date_breaks = "5 years", date_labels = "%Y") +
- theme_tq()
In this section we will use three models to forecast gasoline prices: auto.arima(), ets(), and bats().
-Before we jump into modeling, let’s take a look at the multiple model process from R for Data Science, Chapter 25 Many Models. We first create a data frame from a named list. The example below has two columns: “f” the functions as text, and “params” a nested list of parameters we will pass to the respective function in column “f”.
-
-df <- tibble(
- f = c("runif", "rpois", "rnorm"),
- params = list(
- list(n = 10),
- list(n = 5, lambda = 10),
- list(n = 10, mean = -3, sd = 10)
- )
-)
-df
## # A tibble: 3 × 2
-## f params
-## <chr> <list>
-## 1 runif <named list [1]>
-## 2 rpois <named list [2]>
-## 3 rnorm <named list [3]>
-We can also view the contents of the df$params
column to
-understand the underlying structure. Notice that there are three primary
-levels and then secondary levels containing the name-value pairs of
-parameters. This format is important.
-df$params
## [[1]]
-## [[1]]$n
-## [1] 10
-##
-##
-## [[2]]
-## [[2]]$n
-## [1] 5
-##
-## [[2]]$lambda
-## [1] 10
-##
-##
-## [[3]]
-## [[3]]$n
-## [1] 10
-##
-## [[3]]$mean
-## [1] -3
-##
-## [[3]]$sd
-## [1] 10
-Next we apply the functions to the parameters using a special
-function, invoke_map()
. The parameter lists in the “params”
-column are passed to the function in the “f” column. The output is in a
-nested list-column named “out”.
-# FIXME invoke_map is deprecated
-df_out <- df %>%
- mutate(out = invoke_map(f, params))
## Warning: There was 1 warning in `mutate()`.
-## ℹ In argument: `out = invoke_map(f, params)`.
-## Caused by warning:
-## ! `invoke_map()` was deprecated in purrr 1.0.0.
-## ℹ Please use map() + exec() instead.
-
-df_out
## # A tibble: 3 × 3
-## f params out
-## <chr> <list> <list>
-## 1 runif <named list [1]> <dbl [10]>
-## 2 rpois <named list [2]> <int [5]>
-## 3 rnorm <named list [3]> <dbl [10]>
-And here are the contents of “out”, which is the result of mapping a list of functions to a list of parameters. Pretty powerful!
-
-df_out$out
## [[1]]
-## [1] 0.12529973 0.21685173 0.08249597 0.76834989 0.63376226 0.30774705
-## [7] 0.21109112 0.32589788 0.08359866 0.23515630
-##
-## [[2]]
-## [1] 10 11 11 12 6
-##
-## [[3]]
-## [1] -11.06074584 -3.36132264 1.37013110 -9.02433907 -0.47015515
-## [6] 0.04641386 -14.17989613 1.76384292 -11.52289943 -0.63068203
-Take a minute to understand the conceptual process of the
-invoke_map
function and specifically the parameter setup.
-Once you are comfortable, we can move on to model implementation.
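Since the warning above notes that `invoke_map()` was deprecated in purrr 1.0.0, here is a hedged sketch of the replacement it suggests, combining `map2()` with `exec()`. The helper name `call_with` is our own and is not part of purrr or sweep. It is defined outside `mutate()` so that the `!!!` splice is handled by `exec()` rather than by `mutate()`.

```r
library(dplyr)
library(purrr)
library(tibble)

df <- tibble(
    f = c("runif", "rpois", "rnorm"),
    params = list(
        list(n = 10),
        list(n = 5, lambda = 10),
        list(n = 10, mean = -3, sd = 10)
    )
)

# Hypothetical helper: call the function named `fn` with the list `args`
call_with <- function(fn, args) rlang::exec(fn, !!!args)

df_out <- df %>%
    mutate(out = map2(f, params, call_with))

df_out
```

The same pattern should apply to any tibble with a function-name column and a parameter list-column, provided the named functions can be found on the search path.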
We’ll need to take the following steps in an actual forecast model implementation:
-This is easier than it sounds. Let’s start by coercing the univariate
-time series with tk_ts()
.
-gas_prices_quarterly_ts <- gas_prices_quarterly %>%
- tk_ts(select = -date, start = c(1990, 3), freq = 4)
-gas_prices_quarterly_ts
## Qtr1 Qtr2 Qtr3 Qtr4
-## 1990 1.258 1.324
-## 1991 1.040 1.128 1.109 1.076
-## 1992 1.013 1.145 1.122 1.078
-## 1993 1.052 1.097 1.050 1.014
-## 1994 1.008 1.078 1.144 1.060
-## 1995 1.059 1.186 1.108 1.066
-## 1996 1.129 1.243 1.200 1.233
-## 1997 1.197 1.189 1.216 1.119
-## 1998 1.014 1.048 0.994 0.923
-## 1999 0.961 1.095 1.239 1.261
-## 2000 1.498 1.612 1.525 1.418
-## 2001 1.384 1.548 1.506 1.072
-## 2002 1.221 1.341 1.363 1.348
-## 2003 1.636 1.452 1.616 1.448
-## 2004 1.689 1.910 1.841 1.800
-## 2005 2.063 2.123 2.862 2.174
-## 2006 2.413 2.808 2.501 2.284
-## 2007 2.503 3.024 2.817 2.984
-## 2008 3.215 3.989 3.709 1.669
-## 2009 1.937 2.597 2.480 2.568
-## 2010 2.742 2.684 2.678 2.951
-## 2011 3.509 3.628 3.573 3.220
-## 2012 3.774 3.465 3.801 3.256
-## 2013 3.648 3.576 3.474 3.209
-## 2014 3.474 3.626 3.354 2.488
-## 2015 2.352 2.700 2.275 1.946
-## 2016 1.895 2.303 2.161 2.192
-Next, create a nested list using the function names as the first-level keys (this is important as you’ll see in the next step). Pass the model parameters as name-value pairs in the second level.
-
-models_list <- list(
- auto.arima = list(
- y = gas_prices_quarterly_ts
- ),
- ets = list(
- y = gas_prices_quarterly_ts,
- damped = TRUE
- ),
- bats = list(
- y = gas_prices_quarterly_ts
- )
-)
Now, convert to a data frame using the function,
-enframe()
that turns lists into tibbles. Set the arguments
-name = "f"
and value = "params"
. In doing so
-we get a bonus: the model names are now conveniently located in
-column “f”.
-models_tbl <- tibble::enframe(models_list, name = "f", value = "params")
-models_tbl
## # A tibble: 3 × 2
-## f params
-## <chr> <list>
-## 1 auto.arima <named list [1]>
-## 2 ets <named list [2]>
-## 3 bats <named list [1]>
-We are ready to invoke the map. Combine mutate()
with
-invoke_map()
as follows. Bada bing, bada boom, we now have
-models fitted using the parameters we defined previously.
-models_tbl_fit <- models_tbl %>%
- mutate(fit = purrr::invoke_map(f, params))
-models_tbl_fit
## # A tibble: 3 × 3
-## f params fit
-## <chr> <list> <list>
-## 1 auto.arima <named list [1]> <fr_ARIMA>
-## 2 ets <named list [2]> <ets>
-## 3 bats <named list [1]> <bats>
-It’s a good point to review and understand the model output. We can
-review the model parameters, accuracy measurements, and the residuals
-using sw_tidy()
, sw_glance()
, and
-sw_augment()
.
The tidying function returns the model parameters and estimates. We
-use the combination of mutate
and map
to
-iteratively apply the sw_tidy()
function as a new column
-named “tidy”. Then we unnest and spread to review the terms by model
-function.
-models_tbl_fit %>%
- mutate(tidy = map(fit, sw_tidy)) %>%
- unnest(tidy) %>%
- spread(key = f, value = estimate)
## # A tibble: 20 × 6
-## params fit term auto.arima bats ets
-## <list> <list> <chr> <dbl> <dbl> <dbl>
-## 1 <named list [1]> <fr_ARIMA> ar1 0.834 NA NA
-## 2 <named list [1]> <fr_ARIMA> ma1 -0.964 NA NA
-## 3 <named list [1]> <fr_ARIMA> sar1 0.939 NA NA
-## 4 <named list [1]> <fr_ARIMA> sma1 -0.776 NA NA
-## 5 <named list [1]> <bats> alpha NA 0.588 NA
-## 6 <named list [1]> <bats> ar.coefficients NA NA NA
-## 7 <named list [1]> <bats> beta NA NA NA
-## 8 <named list [1]> <bats> damping.parameter NA NA NA
-## 9 <named list [1]> <bats> gamma.values NA -0.0262 NA
-## 10 <named list [1]> <bats> lambda NA 0.0000605 NA
-## 11 <named list [1]> <bats> ma.coefficients NA 0.256 NA
-## 12 <named list [2]> <ets> alpha NA NA 0.831
-## 13 <named list [2]> <ets> b NA NA -0.0524
-## 14 <named list [2]> <ets> beta NA NA 0.000100
-## 15 <named list [2]> <ets> gamma NA NA 0.0521
-## 16 <named list [2]> <ets> l NA NA 1.29
-## 17 <named list [2]> <ets> phi NA NA 0.837
-## 18 <named list [2]> <ets> s0 NA NA 0.0469
-## 19 <named list [2]> <ets> s1 NA NA -0.0209
-## 20 <named list [2]> <ets> s2 NA NA -0.0407
-Glance is one of the most powerful tools because it yields the model accuracies, enabling direct comparisons between the fit of each model. We use the same process used for tidy, except there’s no need to spread to perform the comparison. We can see that the ARIMA model has the lowest AIC by far.
- -## Warning: The `.drop` argument of `unnest()` is deprecated as of tidyr 1.0.0.
-## ℹ All list-columns are now preserved.
-## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
-## generated.
-## # A tibble: 3 × 15
-## f params fit model.desc sigma logLik AIC BIC ME RMSE
-## <chr> <list> <list> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
-## 1 auto… <named list> <fr_ARIMA> ARIMA(1,1… 0.298 -20.6 51.2 64.4 0.0180 0.291
-## 2 ets <named list> <ets> ETS(M,Ad,… 0.118 -76.6 173. 200. 0.0149 0.292
-## 3 bats <named list> <bats> BATS(0, {… 0.116 159. 179. 184. 0.0193 0.259
-## # ℹ 5 more variables: MAE <dbl>, MPE <dbl>, MAPE <dbl>, MASE <dbl>, ACF1 <dbl>
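The code chunk for the glance step did not survive in this copy, so here is a self-contained sketch of the same map-then-unnest pattern. It is an illustration only, not the original chunk: it fits two models to the built-in `USAccDeaths` series rather than the gasoline data, and the object names `fits_tbl` and `glance_tbl` are assumptions.

```r
library(forecast)
library(sweep)
library(dplyr)
library(purrr)
library(tidyr)
library(tibble)

# Hypothetical stand-in for the fitted-models tibble
fits_tbl <- tibble(
    f   = c("ets", "auto.arima"),
    fit = list(ets(USAccDeaths), auto.arima(USAccDeaths))
)

# One row of accuracy statistics per model
glance_tbl <- fits_tbl %>%
    mutate(glance = map(fit, sw_glance)) %>%
    unnest(glance) %>%
    select(f, model.desc, AIC, BIC, RMSE, MAPE)

glance_tbl
```

Error measures such as RMSE and MAPE can be compared directly across the rows. Information criteria are computed differently for different model classes, so a cross-class AIC comparison is best treated with some caution.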
-We can augment the models to get the residuals following the same
-procedure. We can pipe (%>%
) the results right into
-ggplot()
for plotting. Notice the ARIMA model has the
-largest residuals especially as the model index increases whereas the
-bats model has relatively low residuals.
-models_tbl_fit %>%
- mutate(augment = map(fit, sw_augment, rename_index = "date")) %>%
- unnest(augment) %>%
- ggplot(aes(x = date, y = .resid, group = f)) +
- geom_line(color = palette_light()[[2]]) +
- geom_point(color = palette_light()[[1]]) +
- geom_smooth(method = "loess") +
- facet_wrap(~ f, nrow = 3) +
- labs(title = "Residuals Plot") +
- theme_tq()
## `geom_smooth()` using formula = 'y ~ x'
-
-Creating the forecast for the models is accomplished by mapping the
-forecast
function. The next six quarters are forecasted
-with the argument h = 6
.
-models_tbl_fcast <- models_tbl_fit %>%
- mutate(fcast = map(fit, forecast, h = 6))
-models_tbl_fcast
## # A tibble: 3 × 4
-## f params fit fcast
-## <chr> <list> <list> <list>
-## 1 auto.arima <named list [1]> <fr_ARIMA> <forecast>
-## 2 ets <named list [2]> <ets> <forecast>
-## 3 bats <named list [1]> <bats> <forecast>
-Next, we map sw_sweep
, which coerces the forecast into
-the “tidy” tibble format. We set fitted = FALSE
to remove
-the model fitted values from the output. We set
-timetk_idx = TRUE
to use dates instead of numeric values
-for the index.
-models_tbl_fcast_tidy <- models_tbl_fcast %>%
- mutate(sweep = map(fcast, sw_sweep, fitted = FALSE, timetk_idx = TRUE, rename_index = "date"))
## Warning: There were 3 warnings in `mutate()`.
-## The first warning was:
-## ℹ In argument: `sweep = map(fcast, sw_sweep, fitted = FALSE, timetk_idx = TRUE,
-## rename_index = "date")`.
-## Caused by warning in `.check_tzones()`:
-## ! 'tzone' attributes are inconsistent
-## ℹ Run `dplyr::last_dplyr_warnings()` to see the 2 remaining warnings.
-
-models_tbl_fcast_tidy
## # A tibble: 3 × 5
-## f params fit fcast sweep
-## <chr> <list> <list> <list> <list>
-## 1 auto.arima <named list [1]> <fr_ARIMA> <forecast> <tibble [112 × 7]>
-## 2 ets <named list [2]> <ets> <forecast> <tibble [112 × 7]>
-## 3 bats <named list [1]> <bats> <forecast> <tibble [112 × 7]>
-We can unnest the “sweep” column to get the results of all three models.
- -## # A tibble: 336 × 11
-## f params fit fcast date key price lo.80 lo.95
-## <chr> <list> <list> <list> <date> <chr> <dbl> <dbl> <dbl>
-## 1 auto.a… <named list> <fr_ARIMA> <forecast> 1990-09-01 actu… 1.26 NA NA
-## 2 auto.a… <named list> <fr_ARIMA> <forecast> 1990-12-01 actu… 1.32 NA NA
-## 3 auto.a… <named list> <fr_ARIMA> <forecast> 1991-03-01 actu… 1.04 NA NA
-## 4 auto.a… <named list> <fr_ARIMA> <forecast> 1991-06-01 actu… 1.13 NA NA
-## 5 auto.a… <named list> <fr_ARIMA> <forecast> 1991-09-01 actu… 1.11 NA NA
-## 6 auto.a… <named list> <fr_ARIMA> <forecast> 1991-12-01 actu… 1.08 NA NA
-## 7 auto.a… <named list> <fr_ARIMA> <forecast> 1992-03-01 actu… 1.01 NA NA
-## 8 auto.a… <named list> <fr_ARIMA> <forecast> 1992-06-01 actu… 1.14 NA NA
-## 9 auto.a… <named list> <fr_ARIMA> <forecast> 1992-09-01 actu… 1.12 NA NA
-## 10 auto.a… <named list> <fr_ARIMA> <forecast> 1992-12-01 actu… 1.08 NA NA
-## # ℹ 326 more rows
-## # ℹ 2 more variables: hi.80 <dbl>, hi.95 <dbl>
-Finally, we can plot the forecasts by unnesting the “sweep” column
-and piping to ggplot()
.
-models_tbl_fcast_tidy %>%
- unnest(sweep) %>%
- ggplot(aes(x = date, y = price, color = key, group = f)) +
- geom_ribbon(aes(ymin = lo.95, ymax = hi.95),
- fill = "#D5DBFF", color = NA, linewidth = 0) +
- geom_ribbon(aes(ymin = lo.80, ymax = hi.80, fill = key),
- fill = "#596DD5", color = NA, linewidth = 0, alpha = 0.8) +
- geom_line(linewidth = 1) +
- facet_wrap(~f, nrow = 3) +
- labs(title = "Gasoline Price Forecasts",
- subtitle = "Forecasting multiple models with sweep: ARIMA, BATS, ETS",
- x = "", y = "Price") +
- scale_y_continuous(labels = scales::label_dollar()) +
- scale_x_date(date_breaks = "5 years", date_labels = "%Y") +
- theme_tq() +
- scale_color_tq()
-- - -Extending
-broom
to time series forecasting
The sweep
package extends the broom
tools (tidy, glance, and augment) for performing forecasts and time series analysis in the “tidyverse”. The package is geared towards “tidying” the forecast workflow used with Rob Hyndman’s forecast
package.
tidyverse
tools in R for Data Sciencebroom
for model analysis (ARIMA, ETS, BATS, etc)forecast
objects for easy plotting and “tidy” data manipulationtimetk
to enable dates and datetimes (irregular time series) in the tidied forecast output
-The package contains the following elements:
-model tidiers: sw_tidy
, sw_glance
, sw_augment
, sw_tidy_decomp
functions extend tidy
, glance
, and augment
from the broom
package specifically for models (ets()
, Arima()
, bats()
, etc) used for forecasting.
forecast tidier: sw_sweep
converts a forecast
object to a tibble that can be easily manipulated in the “tidyverse”.
sweep
enables converting a forecast
object to tibble
. The result is the ability to use dplyr
, tidyr
, and ggplot
natively to manipulate, analyze and visualize forecasts.
Often forecasts are required on grouped data to analyse trends in sub-categories. The good news is scaling from one time series to many is easy with the various sw_
functions in combination with dplyr
and purrr
.
A common goal in forecasting is to compare different forecast models against each other. sweep
helps in this area as well.
If you are familiar with broom
, you know how useful it is for retrieving “tidy” format model components. sweep
extends this benefit to the forecast
package workflow with the following functions:
-sw_tidy: Returns model coefficients (single column)
-sw_glance: Returns accuracy statistics (single row)
-sw_augment: Returns residuals
-sw_tidy_decomp: Returns seasonal decompositions
-sw_sweep: Returns tidy forecast outputs.
-The compatibility chart is listed below.
| Object | sw_tidy() | sw_glance() | sw_augment() | sw_tidy_decomp() | sw_sweep() |
|---|---|---|---|---|---|
| ar | | | | | |
| arima | X | X | X | | |
| Arima | X | X | X | | |
| ets | X | X | X | X | |
| baggedETS | | | | | |
| bats | X | X | X | X | |
| tbats | X | X | X | X | |
| nnetar | X | X | X | | |
| stl | | | | X | |
| HoltWinters | X | X | X | X | |
| StructTS | X | X | X | X | |
| tslm | X | X | X | | |
| decompose | | | | X | |
| adf.test | X | X | | | |
| Box.test | X | X | | | |
| kpss.test | X | X | | | |
| forecast | | | | | X |

Function Compatibility
-Here’s how to get started.
-Development version with latest features:
-
-# install.packages("remotes")
-remotes::install_github("business-science/sweep")
NEWS.md
- Remove internal usage of dplyr::select_()
. (@olivroy, #22)
CRAN release: 2023-07-06
-sweep
back on CRAN following inadvertent timetk
archival.CRAN release: 2020-07-10
-broom
v0.7.0stlm()
modelsCRAN release: 2017-07-26
-timetk
from timekit
.sw_tidy
fails when auto.arima()
returns no terms (coefficients).Adds a sequential index column to a data frame
-Refer to forecast:::arima.string.
-forecast
arima.R
Refer to forecast:::makeText.
-forecast
bats.R
R/sales_data.R
- bike_sales.Rd
A dataset containing the fictional bicycle orders spanning 2011 through 2015.
-Hypothetically, the bike_sales
data are similar to sales data maintained
-in a business' sales data base. The unit price and model names come from
-data provided by the bicycle manufacturer, Cannondale (2016).
-The customers (bicycle shops) including name, location, etc and
-the orders including quantity purchased and order dates are fictional.
-The data is intended for implementing business analytics techniques
-(e.g. forecast, clustering, etc) to identify underlying trends.
A data frame with 15644 rows and 17 variables:
Date the order was placed
A unique order identification number
The sequential identification number for products on an order
Number of units purchased
The unit price of the bicycle
The extended price = price x quantity
A unique customer identification number
The customer name
The city where the bike shop is located
The state where the bike shop is located
The geographic latitude of the customer location
The geographic longitude of the customer location
A unique product identification number
The model name of the bicycle
The main bicycle category, either "Mountain" or "Road"
One of nine more specific bicycle categories
The bicycle frame material, either "Carbon" or "Aluminum"
The 2016 bicycle model names and prices originated from https://www.cannondale.com/en-us
-add_index()
- arima_string()
- bats_string()
- bike_sales
- sw_augment()
- sw_augment(<default>)
- sw_augment_columns()
- sw_glance()
- sw_glance(<default>)
- sw_sweep()
- sw_tidy()
- sw_tidy(<default>)
- sw_tidy_decomp()
- tbats_string()
- sw_tidy(<HoltWinters>)
sw_glance(<HoltWinters>)
sw_augment(<HoltWinters>)
sw_tidy_decomp(<HoltWinters>)
- sw_tidy(<StructTS>)
sw_glance(<StructTS>)
sw_augment(<StructTS>)
- sw_tidy(<Arima>)
sw_glance(<Arima>)
sw_augment(<Arima>)
sw_tidy(<stlm>)
- sw_tidy(<bats>)
sw_glance(<bats>)
sw_augment(<bats>)
sw_tidy_decomp(<bats>)
- sw_tidy_decomp(<decomposed.ts>)
- sw_tidy(<ets>)
sw_glance(<ets>)
sw_augment(<ets>)
sw_tidy_decomp(<ets>)
- sw_tidy(<nnetar>)
sw_glance(<nnetar>)
sw_augment(<nnetar>)
- sw_tidy(<stl>)
sw_tidy_decomp(<stl>)
sw_tidy_decomp(<stlm>)
sw_glance(<stlm>)
sw_augment(<stlm>)
- validate_index()
These objects are imported from other packages. Follow the links below to see their documentation.
-By default, sw_augment()
uses broom::augment()
to convert its output.
# S3 method for default
-sw_augment(x, ...)
an object to be tidied
extra arguments passed to broom::augment()
A tibble generated by broom::augment()
Given an R statistical model or other non-tidy object, add columns to the original dataset such as predictions, residuals and cluster assignments.
-model or other R object to convert to data frame
other arguments passed to methods
sw_augment()
is a wrapper for broom::augment()
. The benefit of sw_augment
-is that it has methods for various time-series model classes such as
-HoltWinters
, ets
, Arima
, etc.
For non-time series, sw_augment()
defaults to broom::augment()
.
-The only difference is that the return is a tibble.
Note that by convention the first argument is almost always data
,
-which specifies the original data object. This is not part of the S3
-signature, partly because it prevents rowwise_df_tidiers from
-taking a column name as the first argument.
Augments data
-By default, sw_glance()
uses broom::glance()
to convert its output.
# S3 method for default
-sw_glance(x, ...)
an object to be tidied
extra arguments passed to broom::glance()
A tibble generated by broom::glance()
R/sw_glance.R
- sw_glance.Rd
Construct a single row summary "glance" of a model, fit, or other object
-model or other R object to convert to single-row data frame
other arguments passed to methods
sw_glance()
is a wrapper for broom::glance()
. The benefit of sw_glance
-is that it has methods for various time-series model classes such as
-HoltWinters
, ets
, Arima
, etc.
-sw_glance
methods always return either a one-row tibble or NULL
.
-The single row includes summary statistics relevant to the model accuracy,
-which can be used to assess model fit and quality.
For non-time series, sw_glance()
defaults to broom::glance()
.
-The only difference is that the return is a tibble.
Tidy forecast objects
-A time-series forecast of class forecast
.
Whether or not to return the fitted values (model values) in the results. FALSE by default.
If timetk index (non-regularized index) is present, uses it to develop forecast. Otherwise uses default index.
Enables the index column to be renamed.
Additional arguments passed to tk_make_future_timeseries()
sw_sweep
is designed
-to coerce forecast
objects from the forecast
package
-into tibble
objects in a "tidy" format (long).
-The returned object contains both the actual values
-and the forecasted values including the point forecast and upper and lower
-confidence intervals.
The timetk_idx
argument is used to modify the return format of the index.
If timetk_idx = FALSE
, a regularized time index is always constructed.
-This may be in the format of numeric values (e.g. 2010.000) or the
-higher order yearmon
and yearqtr
classes from the zoo
package.
-A higher order class is returned when possible.
If timetk_idx = TRUE
and a timetk index is present, an irregular time index
-will be returned that combines the original time series (i.e. date or datetime)
-along with a computed future time series created using tk_make_future_timeseries()
-from the timetk
package. The ...
can be used to pass additional arguments
-to tk_make_future_timeseries()
such as inspect_weekdays
, skip_values
, etc
-that can be useful in tuning the future time series sequence.
The index column name can be changed using the rename_index
argument.
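As a hedged illustration of the `timetk_idx` and `rename_index` arguments described above, the sketch below builds a small monthly series from simulated data, coerces it with `tk_ts()` so the date index is retained, and sweeps the resulting forecast both ways. The tibble `sales_tbl` and its values are invented for the example.

```r
library(forecast)
library(timetk)
library(sweep)
library(tibble)

set.seed(123)

# Hypothetical monthly data: 36 months starting January 2015
sales_tbl <- tibble(
    date  = seq(as.Date("2015-01-01"), by = "month", length.out = 36),
    value = 100 + (1:36) + 10 * sin(2 * pi * (1:36) / 12) + rnorm(36, sd = 5)
)

# tk_ts() keeps the original dates as a timetk index on the ts object
sales_ts <- tk_ts(sales_tbl, select = value, start = c(2015, 1), frequency = 12)

fcast <- forecast(ets(sales_ts), h = 6)

# Regularized index (the default)
swept_regular <- sw_sweep(fcast)

# Date index recovered from the timetk index, with the column renamed
swept_dates <- sw_sweep(fcast, timetk_idx = TRUE, rename_index = "date")

tail(swept_dates)
```

With `timetk_idx = TRUE` the six forecast rows should carry future dates generated by `tk_make_future_timeseries()`, which is convenient when plotting with `scale_x_date()`.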
library(forecast)
-#> Registered S3 method overwritten by 'quantmod':
-#> method from
-#> as.zoo.data.frame zoo
-library(dplyr)
-#>
-#> Attaching package: ‘dplyr’
-#> The following objects are masked from ‘package:stats’:
-#>
-#> filter, lag
-#> The following objects are masked from ‘package:base’:
-#>
-#> intersect, setdiff, setequal, union
-
-# ETS forecasts
-USAccDeaths %>%
- ets() %>%
- forecast(level = c(80, 95, 99)) %>%
- sw_sweep()
-#> # A tibble: 96 × 9
-#> index key value lo.80 lo.95 lo.99 hi.80 hi.95 hi.99
-#> <yearmon> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
-#> 1 Jan 1973 actual 9007 NA NA NA NA NA NA
-#> 2 Feb 1973 actual 8106 NA NA NA NA NA NA
-#> 3 Mar 1973 actual 8928 NA NA NA NA NA NA
-#> 4 Apr 1973 actual 9137 NA NA NA NA NA NA
-#> 5 May 1973 actual 10017 NA NA NA NA NA NA
-#> 6 Jun 1973 actual 10826 NA NA NA NA NA NA
-#> 7 Jul 1973 actual 11317 NA NA NA NA NA NA
-#> 8 Aug 1973 actual 10744 NA NA NA NA NA NA
-#> 9 Sep 1973 actual 9713 NA NA NA NA NA NA
-#> 10 Oct 1973 actual 9938 NA NA NA NA NA NA
-#> # ℹ 86 more rows
-
-
-
By default, sw_tidy()
uses broom::tidy()
to convert its output.
# S3 method for default
-sw_tidy(x, ...)
an object to be tidied
extra arguments passed to broom::tidy()
A tibble generated by broom::tidy()
Tidy the result of a time-series model into a summary tibble
-An object to be converted into a tibble ("tidy" data.frame)
extra arguments
sw_tidy()
is a wrapper for broom::tidy()
. The main benefit of sw_tidy()
-is that it has methods for various time-series model classes such as
-HoltWinters
, ets
, Arima
, etc.
-sw_tidy()
methods always return a "tidy" tibble with model coefficients / parameters.
For non-time series, sw_tidy()
defaults to broom::tidy()
.
-The only difference is that the return is a tibble.
-The output of sw_tidy()
is always a tibble with disposable row names. It is
-therefore suited for further manipulation by packages like dplyr and
-ggplot2.
R/sw_tidy_decomp.R
- sw_tidy_decomp.Rd
Coerces decomposed time-series objects to tibble format.
-A time-series object of class stl
, ets
, decomposed.ts
, HoltWinters
,
-bats
or tbats
.
When TRUE
, uses a timetk index (irregular, typically date or datetime) if present.
Enables the index column to be renamed.
Not used.
sw_tidy_decomp
is designed
-to coerce time-series objects with decompositions to tibble
objects.
A regularized time index is always constructed. If no time index is
-detected, a sequential index is returned as a default.
-The index column name can be changed using the rename_index
argument.
library(dplyr)
-library(forecast)
-library(sweep)
-
-# Decompose ETS model
-USAccDeaths %>%
- ets() %>%
- sw_tidy_decomp()
-#> # A tibble: 73 × 4
-#> index observed level season
-#> <yearmon> <dbl> <dbl> <dbl>
-#> 1 Dec 1972 NA 9248. -51.3
-#> 2 Jan 1973 9007 9544. -738.
-#> 3 Feb 1973 8106 9603. -1538.
-#> 4 Mar 1973 8928 9642. -740.
-#> 5 Apr 1973 9137 9633. -490.
-#> 6 May 1973 10017 9679. 307.
-#> 7 Jun 1973 10826 9911. 757.
-#> 8 Jul 1973 11317 9746. 1683.
-#> 9 Aug 1973 10744 9762. 971.
-#> 10 Sep 1973 9713 9805. -122.
-#> # ℹ 63 more rows
-
-# Decompose STL object
-USAccDeaths %>%
- stl(s.window = 'periodic') %>%
- sw_tidy_decomp()
-#> # A tibble: 72 × 6
-#> index observed season trend remainder seasadj
-#> <yearmon> <dbl> <dbl> <dbl> <dbl> <dbl>
-#> 1 Jan 1973 9007 -820. 9935. -108. 9827.
-#> 2 Feb 1973 8106 -1559. 9881. -216. 9665.
-#> 3 Mar 1973 8928 -760. 9827. -139. 9688.
-#> 4 Apr 1973 9137 -530. 9766. -98.2 9667.
-#> 5 May 1973 10017 335. 9704. -22.0 9682.
-#> 6 Jun 1973 10826 815. 9637. 374. 10011.
-#> 7 Jul 1973 11317 1682. 9569. 65.9 9635.
-#> 8 Aug 1973 10744 982. 9500. 262. 9762.
-#> 9 Sep 1973 9713 -62.8 9431. 345. 9776.
-#> 10 Oct 1973 9938 232. 9343. 363. 9706.
-#> # ℹ 62 more rows
-
-
-
R/sweep-package.R
- sweep_package.Rd
The sweep
package "tidies" up the
-modeling workflow of the forecast
package.
The model and forecast objects are not covered by
-the broom
package. It includes the sw_tidy()
, sw_glance()
,
-and sw_augment()
functions that work in a similar capacity as broom
functions.
-In addition, it provides sw_tidy_decomp()
to tidy decompositions, and
-sw_sweep()
to coerce forecast
objects to "tibbles" for easy visualization with ggplot2
-and manipulation with dplyr
.
To learn more about sweep
, start with the vignettes:
-browseVignettes(package = "sweep")
Refer to forecast:::makeTextTBATS.
-forecast
bats.R
R/tidiers_HoltWinters.R
- tidiers_HoltWinters.Rd
These methods tidy HoltWinters
models of univariate time
-series.
# S3 method for HoltWinters
-sw_tidy(x, ...)
-
-# S3 method for HoltWinters
-sw_glance(x, ...)
-
-# S3 method for HoltWinters
-sw_augment(x, data = NULL, rename_index = "index", timetk_idx = FALSE, ...)
-
-# S3 method for HoltWinters
-sw_tidy_decomp(x, timetk_idx = FALSE, rename_index = "index", ...)
An object of class "HoltWinters"
Additional parameters (not used)
Used with sw_augment
only.
-NULL
by default which simply returns augmented columns only.
-User can supply the original data, which returns the data + augmented columns.
Used with sw_augment
only.
-A string representing the name of the index generated.
Used with sw_augment
and sw_tidy_decomp
.
-When TRUE
, uses a timetk index (irregular, typically date or datetime) if present.
sw_tidy()
returns one row for each model parameter,
-with two columns:
term
: The various parameters (alpha, beta, gamma, and coefficients)
estimate
: The estimated parameter value
sw_glance()
returns one row with the following columns:
model.desc
: A description of the model
sigma
: The square root of the estimated residual variance
logLik
: The data's log-likelihood under the model
AIC
: The Akaike Information Criterion
BIC
: The Bayesian Information Criterion (NA
for bats / tbats)
ME
: Mean error
RMSE
: Root mean squared error
MAE
: Mean absolute error
MPE
: Mean percentage error
MAPE
: Mean absolute percentage error
MASE
: Mean absolute scaled error
ACF1
: Autocorrelation of errors at lag 1
sw_augment()
returns a tibble with the following time series attributes:
index
: The index extracted from the model when available; otherwise
-a sequential index is created for plotting purposes
.actual
: The original time series
.fitted
: The fitted values from the model
.resid
: The residual values from the model
sw_tidy_decomp()
returns a tibble with the following time series attributes:
index
: The index extracted from the model when available; otherwise
-a sequential index is created for plotting purposes
observed
: The original time series
season
: The seasonal component
trend
: The trend component
remainder
: observed - (season + trend)
seasadj
: observed - season (or trend + remainder)
library(dplyr)
-library(forecast)
-
-fit_hw <- USAccDeaths %>%
- stats::HoltWinters()
-
-sw_tidy(fit_hw)
-#> # A tibble: 17 × 2
-#> term estimate
-#> <chr> <dbl>
-#> 1 alpha 0.738
-#> 2 beta 0.0223
-#> 3 gamma 1
-#> 4 a 8799.
-#> 5 b -22.7
-#> 6 s1 -802.
-#> 7 s2 -1740.
-#> 8 s3 -960.
-#> 9 s4 -594.
-#> 10 s5 259.
-#> 11 s6 674.
-#> 12 s7 1771.
-#> 13 s8 1031.
-#> 14 s9 211.
-#> 15 s10 549.
-#> 16 s11 128.
-#> 17 s12 441.
-sw_glance(fit_hw)
-#> # A tibble: 1 × 12
-#> model.desc sigma logLik AIC BIC ME RMSE MAE MPE MAPE MASE ACF1
-#> <chr> <dbl> <lgl> <lgl> <lgl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
-#> 1 HoltWinte… 2939. NA NA NA 61.4 379. 274. 0.727 3.21 0.626 0.0569
-sw_augment(fit_hw)
-#> # A tibble: 72 × 4
-#> index .actual .fitted .resid
-#> <yearmon> <dbl> <dbl> <dbl>
-#> 1 Jan 1973 9007 NA NA
-#> 2 Feb 1973 8106 NA NA
-#> 3 Mar 1973 8928 NA NA
-#> 4 Apr 1973 9137 NA NA
-#> 5 May 1973 10017 NA NA
-#> 6 Jun 1973 10826 NA NA
-#> 7 Jul 1973 11317 NA NA
-#> 8 Aug 1973 10744 NA NA
-#> 9 Sep 1973 9713 NA NA
-#> 10 Oct 1973 9938 NA NA
-#> # ℹ 62 more rows
-sw_tidy_decomp(fit_hw)
-#> # A tibble: 72 × 6
-#> index observed season trend remainder seasadj
-#> <yearmon> <dbl> <dbl> <dbl> <dbl> <dbl>
-#> 1 Jan 1973 9007 NA NA NA NA
-#> 2 Feb 1973 8106 NA NA NA NA
-#> 3 Mar 1973 8928 NA NA NA NA
-#> 4 Apr 1973 9137 NA NA NA NA
-#> 5 May 1973 10017 NA NA NA NA
-#> 6 Jun 1973 10826 NA NA NA NA
-#> 7 Jul 1973 11317 NA NA NA NA
-#> 8 Aug 1973 10744 NA NA NA NA
-#> 9 Sep 1973 9713 NA NA NA NA
-#> 10 Oct 1973 9938 NA NA NA NA
-#> # ℹ 62 more rows
-
-
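The leading `NA` values in the `sw_augment()` output above are consistent with how `stats::HoltWinters()` stores its fits: for a seasonal model the fitted values only begin after the first full season. The sketch below, in base R only, pads the fitted values so they line up with the observations. The padding step is an assumption made for illustration and is not claimed to be the package's own implementation.

```r
fit_hw <- stats::HoltWinters(USAccDeaths)

n_obs <- length(USAccDeaths)   # monthly observations in the series
n_fit <- nrow(fit_hw$fitted)   # fitted rows; these start after the first season

# Pad the one-step-ahead fits ("xhat") with NA so they line up with the data.
fitted_padded <- c(rep(NA_real_, n_obs - n_fit),
                   as.numeric(fit_hw$fitted[, "xhat"]))
resid_padded  <- as.numeric(USAccDeaths) - fitted_padded

head(data.frame(actual = as.numeric(USAccDeaths),
                fitted = fitted_padded,
                resid  = resid_padded), 14)
```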
R/tidiers_StructTS.R
- tidiers_StructTS.Rd
These methods tidy the coefficients of StructTS models of univariate time series.
-# S3 method for StructTS
-sw_tidy(x, ...)
-
-# S3 method for StructTS
-sw_glance(x, ...)
-
-# S3 method for StructTS
-sw_augment(x, data = NULL, timetk_idx = FALSE, rename_index = "index", ...)
An object of class "StructTS"
Additional parameters (not used)
Used with sw_augment
only.
-NULL
by default which simply returns augmented columns only.
-User can supply the original data, which returns the data + augmented columns.
Used with sw_augment
only.
-Uses an irregular timetk index if present.
Used with sw_augment
only.
-A string representing the name of the index generated.
sw_tidy()
returns one row for each model parameter,
-with two columns:
term
: The model parameters
estimate
: The estimated parameter value
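To make the "one row per parameter" shape concrete, here is a rough base R sketch that builds a similar two-column table directly from a `stats::StructTS()` fit. The name `tidy_like` is hypothetical, and the result is a plain data frame rather than the tibble that `sw_tidy()` returns.

```r
# With no type given, a non-seasonal series gets a local linear trend model.
fit <- stats::StructTS(WWWusage)

# fit$coef is a named vector of the estimated component variances.
est <- fit$coef

tidy_like <- data.frame(term = names(est),
                        estimate = unname(est),
                        stringsAsFactors = FALSE)
tidy_like
```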
sw_glance()
returns one row with the columns
model.desc
: A description of the model
sigma
: The square root of the estimated residual variance
logLik
: The data's log-likelihood under the model
AIC
: The Akaike Information Criterion
BIC
: The Bayesian Information Criterion
ME
: Mean error
RMSE
: Root mean squared error
MAE
: Mean absolute error
MPE
: Mean percentage error
MAPE
: Mean absolute percentage error
MASE
: Mean absolute scaled error
ACF1
: Autocorrelation of errors at lag 1
sw_augment()
returns a tibble with the following time series attributes:
index
: An index is either attempted to be extracted from the model or
-a sequential index is created for plotting purposes
.actual
: The original time series
.fitted
: The fitted values from the model
.resid
: The residual values from the model
library(dplyr)
-library(forecast)
-
-fit_StructTS <- WWWusage %>%
- StructTS()
-
-sw_tidy(fit_StructTS)
-#> # A tibble: 3 × 2
-#> term estimate
-#> <chr> <dbl>
-#> 1 level 0
-#> 2 slope 13.0
-#> 3 epsilon 0
-sw_glance(fit_StructTS)
-#> # A tibble: 1 × 12
-#> model.desc sigma logLik AIC BIC ME RMSE MAE MPE MAPE MASE
-#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
-#> 1 Local linear s… 0.995 -277. 559. 564. -0.0200 3.59 2.96 0.140 2.32 0.654
-#> # ℹ 1 more variable: ACF1 <dbl>
-sw_augment(fit_StructTS)
-#> # A tibble: 100 × 4
-#> index .actual .fitted .resid
-#> <int> <dbl> <dbl> <dbl>
-#> 1 1 88 88 0
-#> 2 2 84 88.0 -4.00
-#> 3 3 85 80 5
-#> 4 4 85 86 -1
-#> 5 5 84 85 -1
-#> 6 6 85 83 2
-#> 7 7 83 86 -3
-#> 8 8 85 81 4
-#> 9 9 88 87 1
-#> 10 10 89 91 -2
-#> # ℹ 90 more rows
-
-
R/tidiers_arima.R
, R/tidiers_stl.R
- tidiers_arima.Rd
These methods tidy the coefficients of ARIMA models of univariate time series.
-# S3 method for Arima
-sw_tidy(x, ...)
-
-# S3 method for Arima
-sw_glance(x, ...)
-
-# S3 method for Arima
-sw_augment(x, data = NULL, rename_index = "index", timetk_idx = FALSE, ...)
-
-# S3 method for stlm
-sw_tidy(x, ...)
An object of class "Arima"
Additional parameters (not used)
Used with sw_augment
only.
-NULL
by default which simply returns augmented columns only.
-User can supply the original data, which returns the data + augmented columns.
Used with sw_augment
only.
-A string representing the name of the index generated.
Used with sw_augment
only.
-Uses an irregular timetk index if present.
sw_tidy()
returns one row for each coefficient in the model,
-with two columns:
term
: The term in the nonlinear model being estimated and tested
estimate
: The estimated coefficient
sw_glance()
returns one row with the columns
model.desc
: A description of the model, including the
-three integer components (p, d, q): the AR order,
-the degree of differencing, and the MA order.
sigma
: The square root of the estimated residual variance
logLik
: The data's log-likelihood under the model
AIC
: The Akaike Information Criterion
BIC
: The Bayesian Information Criterion
ME
: Mean error
RMSE
: Root mean squared error
MAE
: Mean absolute error
MPE
: Mean percentage error
MAPE
: Mean absolute percentage error
MASE
: Mean absolute scaled error
ACF1
: Autocorrelation of errors at lag 1
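The `logLik`, `AIC`, and `BIC` columns are linked by the usual formulas. The sketch below checks them on a base R `stats::arima()` fit. It assumes an ARIMA(1,1,1) order for illustration, and a `forecast` fit may report slightly different values (for example for `sigma`), so treat it as a demonstration of the relationship only.

```r
fit <- stats::arima(WWWusage, order = c(1, 1, 1))

ll <- as.numeric(stats::logLik(fit))   # maximized log-likelihood
k  <- attr(stats::logLik(fit), "df")   # estimated parameters, including the variance
n  <- fit$nobs                         # observations used after differencing

aic_manual <- -2 * ll + 2 * k
bic_manual <- -2 * ll + k * log(n)

c(AIC = aic_manual, BIC = bic_manual)
```

Both criteria penalize the log-likelihood for the number of parameters; BIC applies the heavier penalty once `log(n)` exceeds 2.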
sw_augment()
returns a tibble with the following time series attributes:
index
: An index is either attempted to be extracted from the model or
-a sequential index is created for plotting purposes
.actual
: The original time series
.fitted
: The fitted values from the model
.resid
: The residual values from the model
sw_tidy()
returns the underlying ETS or ARIMA model's sw_tidy()
results: one row for each coefficient in the model,
-with two columns:
term
: The term in the nonlinear model being estimated and tested
estimate
: The estimated coefficient
library(dplyr)
-library(forecast)
-
-fit_arima <- WWWusage %>%
- auto.arima()
-
-sw_tidy(fit_arima)
-#> # A tibble: 2 × 2
-#> term estimate
-#> <chr> <dbl>
-#> 1 ar1 0.650
-#> 2 ma1 0.526
-sw_glance(fit_arima)
-#> # A tibble: 1 × 12
-#> model.desc sigma logLik AIC BIC ME RMSE MAE MPE MAPE MASE
-#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
-#> 1 ARIMA(1,1,1) 3.16 -254. 514. 522. 0.304 3.11 2.41 0.281 1.92 0.532
-#> # ℹ 1 more variable: ACF1 <dbl>
-sw_augment(fit_arima)
-#> # A tibble: 100 × 4
-#> index .actual .fitted .resid
-#> <int> <dbl> <dbl> <dbl>
-#> 1 1 88 87.9 0.0880
-#> 2 2 84 86.2 -2.17
-#> 3 3 85 81.1 3.86
-#> 4 4 85 87.5 -2.45
-#> 5 5 84 83.7 0.259
-#> 6 6 85 83.5 1.51
-#> 7 7 83 86.4 -3.44
-#> 8 8 85 79.9 5.11
-#> 9 9 88 89.0 -0.985
-#> 10 10 89 89.4 -0.433
-#> # ℹ 90 more rows
-
-
-
R/tidiers_bats.R
- tidiers_bats.Rd
Tidying methods for BATS and TBATS modeling of time series
-# S3 method for bats
-sw_tidy(x, ...)
-
-# S3 method for bats
-sw_glance(x, ...)
-
-# S3 method for bats
-sw_augment(x, data = NULL, rename_index = "index", timetk_idx = FALSE, ...)
-
-# S3 method for bats
-sw_tidy_decomp(x, timetk_idx = FALSE, rename_index = "index", ...)
An object of class "bats" or "tbats"
Additional parameters (not used)
Used with sw_augment
only.
-NULL
by default which simply returns augmented columns only.
-User can supply the original data, which returns the data + augmented columns.
Used with sw_augment
only.
-A string representing the name of the index generated.
Used with sw_augment
and sw_tidy_decomp
.
-When TRUE
, uses a timetk index (irregular, typically date or datetime) if present.
sw_tidy()
returns one row for each model parameter,
-with two columns:
term
: The various parameters (lambda, alpha, gamma, etc)
estimate
: The estimated parameter value
sw_glance()
returns one row with the columns
model.desc
: A description of the model
sigma
: The square root of the estimated residual variance
logLik
: The data's log-likelihood under the model
AIC
: The Akaike Information Criterion
BIC
: The Bayesian Information Criterion (NA
for bats / tbats)
ME
: Mean error
RMSE
: Root mean squared error
MAE
: Mean absolute error
MPE
: Mean percentage error
MAPE
: Mean absolute percentage error
MASE
: Mean absolute scaled error
ACF1
: Autocorrelation of errors at lag 1
sw_augment()
returns a tibble with the following time series attributes:
index
: An index is either attempted to be extracted from the model or
-a sequential index is created for plotting purposes
.actual
: The original time series
.fitted
: The fitted values from the model
.resid
: The residual values from the model
sw_tidy_decomp()
returns a tibble with the following time series attributes:
index
: An index is either attempted to be extracted from the model or
-a sequential index is created for plotting purposes
observed
: The original time series
level
: The level component
slope
: The slope component (Not always present)
season
: The seasonal component (Not always present)
library(dplyr)
-library(forecast)
-
-fit_bats <- WWWusage %>%
- bats()
-
-sw_tidy(fit_bats)
-#> # A tibble: 7 × 2
-#> term estimate
-#> <chr> <dbl>
-#> 1 lambda 1.00
-#> 2 alpha 1.52
-#> 3 beta NA
-#> 4 damping.parameter NA
-#> 5 gamma.values NA
-#> 6 ar.coefficients NA
-#> 7 ma.coefficients -0.666
-sw_glance(fit_bats)
-#> # A tibble: 1 × 12
-#> model.desc sigma logLik AIC BIC ME RMSE MAE MPE MAPE MASE ACF1
-#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
-#> 1 BATS(1, {… 3.47 709. 727. 732. 0.217 3.47 2.62 0.261 2.13 0.579 0.0445
-sw_augment(fit_bats)
-#> # A tibble: 100 × 4
-#> index .actual .fitted .resid
-#> <int> <dbl> <dbl> <dbl>
-#> 1 1 88 101. -12.7
-#> 2 2 84 75.6 8.38
-#> 3 3 85 85.5 -0.453
-#> 4 4 85 84.1 0.867
-#> 5 5 84 84.8 -0.818
-#> 6 6 85 82.7 2.30
-#> 7 7 83 86.3 -3.29
-#> 8 8 85 80.2 4.78
-#> 9 9 88 88.1 -0.125
-#> 10 10 89 89.3 -0.284
-#> # ℹ 90 more rows
-
-
R/tidiers_decomposed_ts.R
- tidiers_decomposed_ts.Rd
Tidying methods for decomposed time series
-# S3 method for decomposed.ts
-sw_tidy_decomp(x, timetk_idx = FALSE, rename_index = "index", ...)
An object of class "decomposed.ts"
Used with sw_augment
and sw_tidy_decomp
.
-When TRUE
, uses a timetk index (irregular, typically date or datetime) if present.
Used with sw_augment
and sw_tidy_decomp
.
-A string representing the name of the index generated.
Not used.
sw_tidy_decomp()
returns a tibble with the following time series attributes:
index
: An index is either attempted to be extracted from the model or
-a sequential index is created for plotting purposes
observed
: The original time series
season
: The seasonal component
trend
: The trend component
random
: The error component
seasadj
: observed - season
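The relationships between these columns can be checked in base R with `stats::decompose()`. This is a sketch for the default additive decomposition; the variable names are chosen here for illustration.

```r
dec <- stats::decompose(USAccDeaths)   # classical additive decomposition

observed <- as.numeric(dec$x)
season   <- as.numeric(dec$seasonal)
trend    <- as.numeric(dec$trend)
random   <- as.numeric(dec$random)

seasadj <- observed - season           # seasonally adjusted series

# For the additive model, the error component is what remains after
# removing season and trend (NA where the moving-average trend is undefined).
ok <- !is.na(trend)
all.equal(random[ok], (observed - season - trend)[ok])
```

The trend comes from a centered moving average, so it is `NA` for the first and last six months of this monthly series, which is why `trend` and `random` start with `NA` in the output while `seasadj` does not.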
library(dplyr)
-library(forecast)
-
-fit_decomposed <- USAccDeaths %>%
- decompose()
-
-sw_tidy_decomp(fit_decomposed)
-#> # A tibble: 72 × 6
-#> index observed season trend random seasadj
-#> <yearmon> <dbl> <dbl> <dbl> <dbl> <dbl>
-#> 1 Jan 1973 9007 -806. NA NA 9813.
-#> 2 Feb 1973 8106 -1523. NA NA 9629.
-#> 3 Mar 1973 8928 -741. NA NA 9669.
-#> 4 Apr 1973 9137 -515. NA NA 9652.
-#> 5 May 1973 10017 340. NA NA 9677.
-#> 6 Jun 1973 10826 745. NA NA 10081.
-#> 7 Jul 1973 11317 1679. 9599. 38.2 9638.
-#> 8 Aug 1973 10744 986. 9500. 258. 9758.
-#> 9 Sep 1973 9713 -109. 9416. 406. 9822.
-#> 10 Oct 1973 9938 264. 9349. 325. 9674.
-#> # ℹ 62 more rows
-
-
R/tidiers_ets.R
- tidiers_ets.Rd
Tidying methods for ETS (Error, Trend, Seasonal) exponential smoothing modeling of time series
-# S3 method for ets
-sw_tidy(x, ...)
-
-# S3 method for ets
-sw_glance(x, ...)
-
-# S3 method for ets
-sw_augment(x, data = NULL, timetk_idx = FALSE, rename_index = "index", ...)
-
-# S3 method for ets
-sw_tidy_decomp(x, timetk_idx = FALSE, rename_index = "index", ...)
An object of class "ets"
Not used.
Used with sw_augment
only.
-NULL
by default which simply returns augmented columns only.
-User can supply the original data, which returns the data + augmented columns.
Used with sw_augment
and sw_tidy_decomp
.
-When TRUE
, uses a timetk index (irregular, typically date or datetime) if present.
Used with sw_augment
and sw_tidy_decomp
.
-A string representing the name of the index generated.
sw_tidy()
returns one row for each model parameter,
-with two columns:
term
: The smoothing parameters (alpha, gamma) and the initial states
-(l, s0 through s10)
estimate
: The estimated parameter value
sw_glance()
returns one row with the columns
model.desc
: A description of the model (for example, ETS(A,Ad,N))
sigma
: The square root of the estimated residual variance
logLik
: The data's log-likelihood under the model
AIC
: The Akaike Information Criterion
BIC
: The Bayesian Information Criterion
ME
: Mean error
RMSE
: Root mean squared error
MAE
: Mean absolute error
MPE
: Mean percentage error
MAPE
: Mean absolute percentage error
MASE
: Mean absolute scaled error
ACF1
: Autocorrelation of errors at lag 1
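MASE and ACF1 are the two less familiar measures in this list. The following base R sketch uses hypothetical toy values and one common non-seasonal definition of MASE (scaling by the in-sample error of a naive previous-value forecast). For seasonal data the scaling is typically based on a seasonal naive forecast instead, so this is an assumption rather than a statement of what the package does.

```r
# Hypothetical toy values for illustration only.
actual <- c(10, 12, 11, 14, 13, 15)
fitted <- c(10.5, 11, 12, 13, 13.5, 14)
resid  <- actual - fitted

# MASE: MAE scaled by the in-sample MAE of a naive (previous-value) forecast.
mase <- mean(abs(resid)) / mean(abs(diff(actual)))

# ACF1: lag-1 autocorrelation of the residuals.
acf1 <- stats::acf(resid, plot = FALSE)$acf[2]

# The same quantity written out by hand.
r <- resid - mean(resid)
acf1_manual <- sum(r[-1] * r[-length(r)]) / sum(r^2)

c(MASE = mase, ACF1 = acf1, ACF1_manual = acf1_manual)
```

A MASE below 1 means the model's in-sample errors are smaller on average than those of the naive forecast.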
sw_augment()
returns a tibble with the following time series attributes:
index
: An index is either attempted to be extracted from the model or
-a sequential index is created for plotting purposes
.actual
: The original time series
.fitted
: The fitted values from the model
.resid
: The residual values from the model
sw_tidy_decomp()
returns a tibble with the following time series attributes:
index
: An index is either attempted to be extracted from the model or
-a sequential index is created for plotting purposes
observed
: The original time series
level
: The level component
slope
: The slope component (Not always present)
season
: The seasonal component (Not always present)
library(dplyr)
-library(forecast)
-
-fit_ets <- WWWusage %>%
- ets()
-
-sw_tidy(fit_ets)
-#> # A tibble: 5 × 2
-#> term estimate
-#> <chr> <dbl>
-#> 1 alpha 1.00
-#> 2 beta 0.997
-#> 3 phi 0.815
-#> 4 l 90.4
-#> 5 b -0.0173
-sw_glance(fit_ets)
-#> # A tibble: 1 × 12
-#> model.desc sigma logLik AIC BIC ME RMSE MAE MPE MAPE MASE ACF1
-#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
-#> 1 ETS(A,Ad,N) 3.50 -353. 718. 733. 0.224 3.41 2.76 0.263 2.16 0.610 0.231
-sw_augment(fit_ets)
-#> # A tibble: 100 × 4
-#> index .actual .fitted .resid
-#> <int> <dbl> <dbl> <dbl>
-#> 1 1 88 90.3 -2.34
-#> 2 2 84 86.1 -2.09
-#> 3 3 85 80.7 4.25
-#> 4 4 85 85.8 -0.803
-#> 5 5 84 85.0 -1.00
-#> 6 6 85 83.2 1.81
-#> 7 7 83 85.8 -2.81
-#> 8 8 85 81.4 3.62
-#> 9 9 88 86.6 1.38
-#> 10 10 89 90.4 -1.44
-#> # ℹ 90 more rows
-sw_tidy_decomp(fit_ets)
-#> # A tibble: 101 × 4
-#> index observed level slope
-#> <dbl> <dbl> <dbl> <dbl>
-#> 1 0 NA 90.4 -0.0173
-#> 2 1 88 88.0 -2.34
-#> 3 2 84 84.0 -3.99
-#> 4 3 85 85.0 0.986
-#> 5 4 85 85.0 0.00312
-#> 6 5 84 84.0 -0.997
-#> 7 6 85 85.0 0.994
-#> 8 7 83 83.0 -1.99
-#> 9 8 85 85.0 1.99
-#> 10 9 88 88.0 3.00
-#> # ℹ 91 more rows
-
-
R/tidiers_nnetar.R
- tidiers_nnetar.Rd
These methods tidy the coefficients of NNETAR models of univariate time series.
-# S3 method for nnetar
-sw_tidy(x, ...)
-
-# S3 method for nnetar
-sw_glance(x, ...)
-
-# S3 method for nnetar
-sw_augment(x, data = NULL, timetk_idx = FALSE, rename_index = "index", ...)
An object of class "nnetar"
Additional parameters (not used)
Used with sw_augment
only.
-NULL
by default which simply returns augmented columns only.
-User can supply the original data, which returns the data + augmented columns.
Used with sw_augment
only.
-Uses an irregular timetk index if present.
Used with sw_augment
only.
-A string representing the name of the index generated.
sw_tidy()
returns one row for each model parameter,
-with two columns:
term
: The model parameters (m, p, P, size)
estimate
: The estimated parameter value
sw_glance()
returns one row with the columns
model.desc
: A description of the model (for example, NNAR(8,4))
sigma
: The square root of the estimated residual variance
logLik
: The data's log-likelihood under the model (NA
)
AIC
: The Akaike Information Criterion (NA
)
BIC
: The Bayesian Information Criterion (NA
)
ME
: Mean error
RMSE
: Root mean squared error
MAE
: Mean absolute error
MPE
: Mean percentage error
MAPE
: Mean absolute percentage error
MASE
: Mean absolute scaled error
ACF1
: Autocorrelation of errors at lag 1
sw_augment()
returns a tibble with the following time series attributes:
index
: An index is either attempted to be extracted from the model or
-a sequential index is created for plotting purposes
.actual
: The original time series
.fitted
: The fitted values from the model
.resid
: The residual values from the model
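The index behaviour described above (use the model's own time index when there is one, otherwise count 1, 2, 3, ...) can be sketched in base R. `get_index()` is a hypothetical helper written for this illustration and is not part of sweep.

```r
# Hypothetical helper, not part of sweep: use the time index of a ts object
# when there is one, otherwise fall back to a sequential index.
get_index <- function(x) {
  if (stats::is.ts(x)) as.numeric(stats::time(x)) else seq_along(x)
}

head(get_index(lynx))    # annual series: the index is the year
get_index(c(5, 6, 7))    # plain vector: sequential index
```

For the `lynx` series this yields years starting at 1821, which matches the `index` column in the example output for this model.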
library(dplyr)
-library(forecast)
-
-fit_nnetar <- lynx %>%
- nnetar()
-
-sw_tidy(fit_nnetar)
-#> # A tibble: 4 × 2
-#> term estimate
-#> <chr> <dbl>
-#> 1 m 1
-#> 2 p 8
-#> 3 P 0
-#> 4 size 4
-sw_glance(fit_nnetar)
-#> # A tibble: 1 × 12
-#> model.desc sigma logLik AIC BIC ME RMSE MAE MPE MAPE MASE
-#> <chr> <dbl> <lgl> <lgl> <lgl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
-#> 1 NNAR(8,4) 297. NA NA NA -0.0400 297. 218. -39.6 54.0 0.262
-#> # ℹ 1 more variable: ACF1 <dbl>
-sw_augment(fit_nnetar)
-#> # A tibble: 114 × 4
-#> index .actual .fitted .resid
-#> <dbl> <dbl> <dbl> <dbl>
-#> 1 1821 269 NA NA
-#> 2 1822 321 NA NA
-#> 3 1823 585 NA NA
-#> 4 1824 871 NA NA
-#> 5 1825 1475 NA NA
-#> 6 1826 2821 NA NA
-#> 7 1827 3928 NA NA
-#> 8 1828 5943 NA NA
-#> 9 1829 4950 4798. 152.
-#> 10 1830 2577 2613. -35.7
-#> # ℹ 104 more rows
-
-
R/tidiers_robets.R
- tidiers_robets.Rd
Tidying methods for robets (Robust Error, Trend, Seasonal) exponential smoothing modeling of time series
-# S3 method for robets -sw_tidy(x, ...) - -# S3 method for robets -sw_glance(x, ...) - -# S3 method for robets -sw_augment(x, data = NULL, timetk_idx = FALSE, rename_index = "index", ...) - -# S3 method for robets -sw_tidy_decomp(x, timetk_idx = FALSE, rename_index = "index", ...)- -
x | -An object of class "robets" |
-
---|---|
... | -Not used. |
-
data | -Used with |
-
timetk_idx | -Used with |
-
rename_index | -Used with |
-
sw_tidy()
returns one row for each model parameter,
-with two columns:
term
: The smoothing parameters (alpha, gamma) and the initial states
-(l, s0 through s10)
estimate
: The estimated parameter value
sw_glance()
returns one row with the columns
model.desc
: A description of the model (for example, ROBETS(A,Ad,N))
sigma
: The square root of the estimated residual variance
logLik
: The data's log-likelihood under the model
AIC
: The Akaike Information Criterion
BIC
: The Bayesian Information Criterion
ME
: Mean error
RMSE
: Root mean squared error
MAE
: Mean absolute error
MPE
: Mean percentage error
MAPE
: Mean absolute percentage error
MASE
: Mean absolute scaled error
ACF1
: Autocorrelation of errors at lag 1
sw_augment()
returns a tibble with the following time series attributes:
index
: An index is either attempted to be extracted from the model or
-a sequential index is created for plotting purposes
.actual
: The original time series
.fitted
: The fitted values from the model
.resid
: The residual values from the model
sw_tidy_decomp()
returns a tibble with the following time series attributes:
index
: An index is either attempted to be extracted from the model or
-a sequential index is created for plotting purposes
observed
: The original time series
level
: The level component
slope
: The slope component (Not always present)
season
: The seasonal component (Not always present)
-library(dplyr) -library(robets) -library(sweep) - -fit_robets <- WWWusage %>% - robets() - -sw_tidy(fit_robets)#> # A tibble: 7 x 2 -#> term estimate -#> <chr> <dbl> -#> 1 alpha 0.997 -#> 2 beta 0.980 -#> 3 phi 0.800 -#> 4 sigma0 1.48 -#> 5 initstate.l 85 -#> 6 initstate.b 0 -#> 7 k 3sw_glance(fit_robets)#> # A tibble: 1 x 2 -#> model.desc sigma -#> <chr> <dbl> -#> 1 ROBETS(A,Ad,N) 3.47sw_augment(fit_robets)#> # A tibble: 100 x 4 -#> index .actual .fitted .resid -#> <int> <dbl> <dbl> <dbl> -#> 1 1 88 85 3 -#> 2 2 84 90.3 -6.34 -#> 3 3 85 82.1 2.90 -#> 4 4 85 85.2 -0.205 -#> 5 5 84 85.0 -1.01 -#> 6 6 85 83.2 1.78 -#> 7 7 83 85.8 -2.76 -#> 8 8 85 81.5 3.54 -#> 9 9 88 86.5 1.47 -#> 10 10 89 90.4 -1.38 -#> # … with 90 more rowssw_tidy_decomp(fit_robets)#> # A tibble: 101 x 4 -#> index observed level slope -#> <dbl> <dbl> <dbl> <dbl> -#> 1 0 NA 1.48 0 -#> 2 1 88 1.65 2.94 -#> 3 2 84 1.90 -3.22 -#> 4 3 85 2.03 0.267 -#> 5 4 85 1.93 0.0130 -#> 6 5 84 1.86 -0.980 -#> 7 6 85 1.88 0.961 -#> 8 7 83 2.00 -1.94 -#> 9 8 85 2.19 1.92 -#> 10 9 88 2.14 2.98 -#> # … with 91 more rows-
R/tidiers_stl.R
- tidiers_stl.Rd
Tidying methods for STL (Seasonal and Trend decomposition using Loess) of time series
-# S3 method for stl
-sw_tidy(x, ...)
-
-# S3 method for stl
-sw_tidy_decomp(x, timetk_idx = FALSE, rename_index = "index", ...)
-
-# S3 method for stlm
-sw_tidy_decomp(x, timetk_idx = FALSE, rename_index = "index", ...)
-
-# S3 method for stlm
-sw_glance(x, ...)
-
-# S3 method for stlm
-sw_augment(x, data = NULL, rename_index = "index", timetk_idx = FALSE, ...)
An object of class "stl"
Not used.
Used with sw_tidy_decomp
.
-When TRUE
, uses a timetk index (irregular, typically date or datetime) if present.
Used with sw_tidy_decomp
.
-A string representing the name of the index generated.
Used with sw_augment
only.
sw_tidy()
wraps sw_tidy_decomp()
sw_tidy_decomp()
returns a tibble with the following time series attributes:
index
: An index is either attempted to be extracted from the model or
-a sequential index is created for plotting purposes
observed
: The original time series
season
: The seasonal component
trend
: The trend component
remainder
: observed - (season + trend)
seasadj
: observed - season (or trend + remainder)
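The two identities in this list can be verified in base R from the components that `stats::stl()` returns. This is a sketch with variable names chosen for illustration, not the package's own code.

```r
fit_stl <- stats::stl(USAccDeaths, s.window = "periodic")
parts   <- fit_stl$time.series   # columns: seasonal, trend, remainder

observed  <- as.numeric(USAccDeaths)
season    <- as.numeric(parts[, "seasonal"])
trend     <- as.numeric(parts[, "trend"])
remainder <- as.numeric(parts[, "remainder"])

# Seasonally adjusted series: remove only the seasonal component.
seasadj <- observed - season

all.equal(observed, season + trend + remainder)   # components sum to the data
all.equal(seasadj, trend + remainder)             # equivalent form of seasadj
```

Unlike the classical moving-average decomposition, the loess-based trend is defined at every time point, so none of these columns start with `NA`.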
sw_glance()
returns the underlying ETS or ARIMA model's sw_glance()
results: one row with the columns
model.desc
: A description of the model, including the
-three integer components (p, d, q): the AR order,
-the degree of differencing, and the MA order.
sigma
: The square root of the estimated residual variance
logLik
: The data's log-likelihood under the model
AIC
: The Akaike Information Criterion
BIC
: The Bayesian Information Criterion
ME
: Mean error
RMSE
: Root mean squared error
MAE
: Mean absolute error
MPE
: Mean percentage error
MAPE
: Mean absolute percentage error
MASE
: Mean absolute scaled error
ACF1
: Autocorrelation of errors at lag 1
sw_augment()
returns a tibble with the following time series attributes:
index
: An index is either attempted to be extracted from the model or
-a sequential index is created for plotting purposes
.actual
: The original time series
.fitted
: The fitted values from the model
.resid
: The residual values from the model
library(dplyr)
-library(forecast)
-library(sweep)
-
-fit_stl <- USAccDeaths %>%
- stl(s.window = "periodic")
-
-sw_tidy_decomp(fit_stl)
-#> # A tibble: 72 × 6
-#> index observed season trend remainder seasadj
-#> <yearmon> <dbl> <dbl> <dbl> <dbl> <dbl>
-#> 1 Jan 1973 9007 -820. 9935. -108. 9827.
-#> 2 Feb 1973 8106 -1559. 9881. -216. 9665.
-#> 3 Mar 1973 8928 -760. 9827. -139. 9688.
-#> 4 Apr 1973 9137 -530. 9766. -98.2 9667.
-#> 5 May 1973 10017 335. 9704. -22.0 9682.
-#> 6 Jun 1973 10826 815. 9637. 374. 10011.
-#> 7 Jul 1973 11317 1682. 9569. 65.9 9635.
-#> 8 Aug 1973 10744 982. 9500. 262. 9762.
-#> 9 Sep 1973 9713 -62.8 9431. 345. 9776.
-#> 10 Oct 1973 9938 232. 9343. 363. 9706.
-#> # ℹ 62 more rows
-
-
R/utils-broom.R
- validate_index.Rd
Validates that the data frame has a column whose name matches the value of rename_index
-