diff --git a/DESCRIPTION b/DESCRIPTION index b1ce379bf..839292fe4 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,7 +1,7 @@ Type: Package Package: performance Title: Assessment of Regression Models Performance -Version: 0.12.4.7 +Version: 0.12.4.16 Authors@R: c(person(given = "Daniel", family = "Lüdecke", @@ -74,7 +74,7 @@ Depends: R (>= 3.6) Imports: bayestestR (>= 0.15.0), - insight (>= 0.20.5), + insight (>= 1.0.0), datawizard (>= 0.13.0), stats, utils @@ -160,4 +160,4 @@ Config/Needs/website: r-lib/pkgdown, easystats/easystatstemplate Config/rcmdcheck/ignore-inconsequential-notes: true -Remotes: easystats/insight +Remotes: easystats/datawizard, easystats/see, easystats/insight diff --git a/NAMESPACE b/NAMESPACE index d35d82547..84d39b78c 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -25,7 +25,8 @@ S3method(as.data.frame,performance_score) S3method(as.data.frame,r2_bayes) S3method(as.data.frame,r2_loo) S3method(as.data.frame,r2_nakagawa) -S3method(as.numeric,check_outliers) +S3method(as.double,check_outliers) +S3method(as.double,performance_roc) S3method(check_autocorrelation,default) S3method(check_collinearity,BFBayesFactor) S3method(check_collinearity,MixMod) @@ -473,6 +474,7 @@ S3method(r2_mcfadden,clm) S3method(r2_mcfadden,clm2) S3method(r2_mcfadden,cpglm) S3method(r2_mcfadden,glm) +S3method(r2_mcfadden,glmmTMB) S3method(r2_mcfadden,glmx) S3method(r2_mcfadden,logitmfx) S3method(r2_mcfadden,logitor) diff --git a/NEWS.md b/NEWS.md index 1bddce2ce..85cb6eba5 100644 --- a/NEWS.md +++ b/NEWS.md @@ -2,6 +2,9 @@ ## Breaking changes +* `check_outliers()` with `method = "optics"` now returns a further refined + cluster selection, by passing the `optics_xi` argument to `dbscan::extractXi()`. + * Deprecated arguments and alias-function-names have been removed. * Argument names in `check_model()` that refer to plot-aesthetics (like @@ -13,6 +16,22 @@ * Increased accuracy for `check_convergence()` for *glmmTMB* models. +* `r2()` and `r2_mcfadden()` now support beta-binomial (non-mixed) models from + package *glmmTMB*. + +* An `as.numeric()` resp. `as.double()` method for objects of class + `performance_roc` was added. + +* Improved documentation for `performance_roc()`. + +## Bug fixes + +* `check_outliers()` did not warn that no numeric variables were found when only + the response variable was numeric, but all relevant predictors were not. + +* `check_collinearity()` did not work for glmmTMB models when zero-inflation + component was set to `~0`. + # performance 0.12.4 ## Changes diff --git a/R/check_autocorrelation.R b/R/check_autocorrelation.R index 159f53843..d94f788f1 100644 --- a/R/check_autocorrelation.R +++ b/R/check_autocorrelation.R @@ -49,7 +49,6 @@ check_autocorrelation.default <- function(x, nsim = 1000, ...) { } - # methods ------------------------------ #' @export @@ -81,7 +80,6 @@ print.check_autocorrelation <- function(x, ...) { } - # utilities ------------------------------- .durbWats <- function(.residuals) { diff --git a/R/check_clusterstructure.R b/R/check_clusterstructure.R index 9d4bab2d6..fd275cf5d 100644 --- a/R/check_clusterstructure.R +++ b/R/check_clusterstructure.R @@ -75,7 +75,6 @@ check_clusterstructure <- function(x, } - #' @export plot.check_clusterstructure <- function(x, ...) { # Can be reimplemented with ggplot in see @@ -88,7 +87,6 @@ plot.check_clusterstructure <- function(x, ...) { } - #' @keywords internal .clusterstructure_dm <- function(x, distance = "euclidean", method = "ward.D2") { d <- stats::dist(x, method = distance) @@ -97,7 +95,6 @@ plot.check_clusterstructure <- function(x, ...) { } - #' @keywords internal .clusterstructure_hopkins <- function(x, distance = "euclidean") { # This is based on the hopkins() function from the clustertend package diff --git a/R/check_collinearity.R b/R/check_collinearity.R index a7e1299c1..6d34c6afa 100644 --- a/R/check_collinearity.R +++ b/R/check_collinearity.R @@ -73,7 +73,12 @@ #' This portion of multicollinearity among the component terms of an #' interaction is also called "inessential ill-conditioning", which leads to #' inflated VIF values that are typically seen for models with interaction -#' terms _(Francoeur 2013)_. +#' terms _(Francoeur 2013)_. Centering interaction terms can resolve this +#' issue _(Kim and Jung 2024)_. +#' +#' @section Multicollinearity and Polynomial Terms: +#' Polynomial transformations are considered a single term and thus VIFs are +#' not calculated between them. #' #' @section Concurvity for Smooth Terms in Generalized Additive Models: #' `check_concurvity()` is a wrapper around `mgcv::concurvity()`, and can be @@ -91,26 +96,30 @@ #' @references #' #' - Francoeur, R. B. (2013). Could Sequential Residual Centering Resolve -#' Low Sensitivity in Moderated Regression? Simulations and Cancer Symptom -#' Clusters. Open Journal of Statistics, 03(06), 24-44. +#' Low Sensitivity in Moderated Regression? Simulations and Cancer Symptom +#' Clusters. Open Journal of Statistics, 03(06), 24-44. +#' +#' - James, G., Witten, D., Hastie, T., and Tibshirani, R. (eds.). (2013). An +#' introduction to statistical learning: with applications in R. New York: +#' Springer. #' -#' - James, G., Witten, D., Hastie, T., and Tibshirani, R. (eds.). (2013). -#' An introduction to statistical learning: with applications in R. New York: -#' Springer. +#' - Kim, Y., & Jung, G. (2024). Understanding linear interaction analysis with +#' causal graphs. British Journal of Mathematical and Statistical Psychology, +#' 00, 1–14. #' #' - Marcoulides, K. M., and Raykov, T. (2019). Evaluation of Variance -#' Inflation Factors in Regression Models Using Latent Variable Modeling -#' Methods. Educational and Psychological Measurement, 79(5), 874–882. +#' Inflation Factors in Regression Models Using Latent Variable Modeling +#' Methods. Educational and Psychological Measurement, 79(5), 874–882. #' #' - McElreath, R. (2020). Statistical rethinking: A Bayesian course with -#' examples in R and Stan. 2nd edition. Chapman and Hall/CRC. +#' examples in R and Stan. 2nd edition. Chapman and Hall/CRC. #' #' - Vanhove, J. (2019). Collinearity isn't a disease that needs curing. -#' [webpage](https://janhove.github.io/posts/2019-09-11-collinearity/) +#' [webpage](https://janhove.github.io/posts/2019-09-11-collinearity/) #' #' - Zuur AF, Ieno EN, Elphick CS. A protocol for data exploration to avoid -#' common statistical problems: Data exploration. Methods in Ecology and -#' Evolution (2010) 1:3–14. +#' common statistical problems: Data exploration. Methods in Ecology and +#' Evolution (2010) 1:3–14. #' #' @family functions to check model assumptions and and assess model quality #' @@ -139,7 +148,6 @@ check_collinearity <- function(x, ...) { multicollinearity <- check_collinearity - # default ------------------------------ #' @rdname check_collinearity @@ -150,7 +158,6 @@ check_collinearity.default <- function(x, ci = 0.95, verbose = TRUE, ...) { } - # methods ------------------------------------------- #' @export @@ -193,7 +200,7 @@ plot.check_collinearity <- function(x, ...) { x <- insight::format_table(x) x <- datawizard::data_rename( x, - pattern = "SE_factor", + select = "SE_factor", replacement = "Increased SE", verbose = FALSE ) @@ -218,7 +225,6 @@ plot.check_collinearity <- function(x, ...) { } - # other classes ---------------------------------- #' @export @@ -294,7 +300,6 @@ check_collinearity.betaor <- check_collinearity.logitor check_collinearity.betamfx <- check_collinearity.logitor - # zi-models ------------------------------------- #' @rdname check_collinearity @@ -353,7 +358,6 @@ check_collinearity.zerocount <- function(x, } - # utilities --------------------------------- .check_collinearity_zi_model <- function(x, component, ci = 0.95, verbose = TRUE) { @@ -403,9 +407,8 @@ check_collinearity.zerocount <- function(x, } - .check_collinearity <- function(x, component, ci = 0.95, verbose = TRUE) { - v <- insight::get_varcov(x, component = component, verbose = FALSE) + v <- .safe(insight::get_varcov(x, component = component, verbose = FALSE)) # sanity check if (is.null(v)) { @@ -514,7 +517,7 @@ check_collinearity.zerocount <- function(x, if (!is.null(insight::find_interactions(x)) && any(result > 10) && isTRUE(verbose)) { insight::format_alert( "Model has interaction terms. VIFs might be inflated.", - "You may check multicollinearity among predictors of a model without interaction terms." + "Try to center the variables used for the interaction, or check multicollinearity among predictors of a model without interaction terms." # nolint ) } @@ -577,7 +580,6 @@ check_collinearity.zerocount <- function(x, } - .term_assignments <- function(x, component, verbose = TRUE) { tryCatch( { @@ -613,7 +615,6 @@ check_collinearity.zerocount <- function(x, } - .find_term_assignment <- function(x, component, verbose = TRUE) { pred <- insight::find_predictors(x)[[component]] @@ -648,7 +649,6 @@ check_collinearity.zerocount <- function(x, } - .zi_term_assignment <- function(x, component = "zero_inflated", verbose = TRUE) { tryCatch( { diff --git a/R/check_concurvity.R b/R/check_concurvity.R index a72ba680d..d508bc8b1 100644 --- a/R/check_concurvity.R +++ b/R/check_concurvity.R @@ -25,7 +25,6 @@ check_concurvity.gam <- function(x, ...) { } - # methods --------------------------------- #' @export diff --git a/R/check_distribution.R b/R/check_distribution.R index d743b3ac1..9ab4d24de 100644 --- a/R/check_distribution.R +++ b/R/check_distribution.R @@ -61,7 +61,6 @@ check_distribution <- function(model) { } - # default ----------------------------- #' @export @@ -106,7 +105,6 @@ check_distribution.default <- function(model) { } - # methods -------------------------- #' @export @@ -160,7 +158,6 @@ plot.check_distribution_numeric <- function(x, ...) { } - # other classes ------------------- #' @export @@ -184,7 +181,6 @@ check_distribution.numeric <- function(model) { } - # utilities ----------------------------- .extract_features <- function(x, type = NULL) { @@ -243,7 +239,6 @@ check_distribution.numeric <- function(model) { } - .is_integer <- function(x) { tryCatch( ifelse(is.infinite(x), FALSE, x %% 1 == 0), diff --git a/R/check_factorstructure.R b/R/check_factorstructure.R index 21f4584c1..c7c323cbd 100644 --- a/R/check_factorstructure.R +++ b/R/check_factorstructure.R @@ -114,9 +114,6 @@ check_factorstructure <- function(x, n = NULL, ...) { } - - - #' @rdname check_factorstructure #' @export check_kmo <- function(x, n = NULL, ...) { @@ -169,9 +166,6 @@ check_kmo <- function(x, n = NULL, ...) { } - - - #' @rdname check_factorstructure #' @export check_sphericity_bartlett <- function(x, n = NULL, ...) { @@ -213,7 +207,6 @@ check_sphericity_bartlett <- function(x, n = NULL, ...) { } - # Helpers ----------------------------------------------------------------- #' @keywords internal diff --git a/R/check_heterogeneity_bias.R b/R/check_heterogeneity_bias.R index 2414873cc..ea7f1423b 100644 --- a/R/check_heterogeneity_bias.R +++ b/R/check_heterogeneity_bias.R @@ -102,8 +102,6 @@ check_heterogeneity_bias <- function(x, select = NULL, by = NULL, nested = FALSE } - - #' @export print.check_heterogeneity_bias <- function(x, ...) { cat("Possible heterogeneity bias due to following predictors: ") diff --git a/R/check_heteroscedasticity.R b/R/check_heteroscedasticity.R index 3fbefec9b..ad97d8871 100644 --- a/R/check_heteroscedasticity.R +++ b/R/check_heteroscedasticity.R @@ -79,7 +79,6 @@ check_heteroscedasticity.default <- function(x, ...) { } - # methods ----------------------- #' @export @@ -112,7 +111,6 @@ plot.check_heteroscedasticity <- function(x, ...) { } - # utilities ----------------------- .sigma <- function(x) { diff --git a/R/check_homogeneity.R b/R/check_homogeneity.R index d6b486810..43215da62 100644 --- a/R/check_homogeneity.R +++ b/R/check_homogeneity.R @@ -36,7 +36,6 @@ check_homogeneity <- function(x, method = c("bartlett", "fligner", "levene", "au } - # default ------------------------- #' @export @@ -105,7 +104,6 @@ check_homogeneity.default <- function(x, method = c("bartlett", "fligner", "leve } - # methods ----------------------- #' @export @@ -130,7 +128,6 @@ plot.check_homogeneity <- function(x, ...) { } - # other classes ----------------------------- #' @rdname check_homogeneity diff --git a/R/check_htest.R b/R/check_htest.R index f5d085839..4dadf3a7a 100644 --- a/R/check_htest.R +++ b/R/check_htest.R @@ -153,7 +153,6 @@ check_symmetry.htest <- function(x, ...) { # } - # Print ------------------------------------------------------------------- #' @export diff --git a/R/check_itemscale.R b/R/check_itemscale.R index 092ee0a3d..a7973ecfc 100644 --- a/R/check_itemscale.R +++ b/R/check_itemscale.R @@ -137,7 +137,6 @@ check_itemscale <- function(x, factor_index = NULL) { } - # methods ------------------------------------- #' @export diff --git a/R/check_model.R b/R/check_model.R index 11562a351..b096b0f25 100644 --- a/R/check_model.R +++ b/R/check_model.R @@ -183,7 +183,6 @@ check_model <- function(x, ...) { } - # default ---------------------------- #' @rdname check_model @@ -313,7 +312,6 @@ plot.check_model <- function(x, ...) { } - # other classes --------------------------- ## TODO for now, convert to freq, see https://github.com/easystats/performance/issues/354 @@ -454,7 +452,6 @@ check_model.performance_simres <- function(x, check_model.DHARMa <- check_model.performance_simres - # compile plots for checks of linear models ------------------------ .check_assumptions_linear <- function(model, model_info, check = "all", residual_type = "normal", verbose = TRUE, ...) { @@ -515,7 +512,6 @@ check_model.DHARMa <- check_model.performance_simres } - # compile plots for checks of generalized linear models ------------------------ .check_assumptions_glm <- function(model, model_info, check = "all", residual_type = "simulated", verbose = TRUE, ...) { @@ -576,7 +572,6 @@ check_model.DHARMa <- check_model.performance_simres } - # compile plots for checks of Bayesian models ------------------------ .check_assumptions_stan <- function(model, ...) { diff --git a/R/check_model_diagnostics.R b/R/check_model_diagnostics.R index 5f035b4a0..971a72dde 100644 --- a/R/check_model_diagnostics.R +++ b/R/check_model_diagnostics.R @@ -10,16 +10,9 @@ dat$group[dat$VIF >= 5 & dat$VIF < 10] <- "moderate" dat$group[dat$VIF >= 10] <- "high" - dat <- datawizard::data_rename( - dat, - c("Term", "VIF", "SE_factor", "Component"), - c("x", "y", "se", "facet"), - verbose = FALSE - ) - dat <- datawizard::data_select( dat, - c("x", "y", "facet", "group"), + select = c(x = "Term", y = "VIF", facet = "Component", group = "group"), verbose = FALSE ) @@ -32,7 +25,6 @@ } - # prepare data for QQ plot ---------------------------------- .model_diagnostic_qq <- function(model, model_info = NULL, verbose = TRUE) { @@ -95,7 +87,6 @@ } - # prepare data for random effects QQ plot ---------------------------------- .model_diagnostic_ranef_qq <- function(model, level = 0.95, model_info = NULL, verbose = TRUE) { @@ -158,7 +149,6 @@ } - # prepare data for normality of residuals plot ---------------------------------- .model_diagnostic_normality <- function(model, verbose = TRUE) { @@ -178,7 +168,6 @@ } - # prepare data for influential obs plot ---------------------------------- .model_diagnostic_outlier <- function(model, threshold = NULL) { @@ -217,7 +206,6 @@ } - # prepare data for non-constant variance plot ---------------------------------- .model_diagnostic_ncv <- function(model, verbose = TRUE) { @@ -245,7 +233,6 @@ } - # prepare data for homogeneity of variance plot ---------------------------------- .model_diagnostic_homogeneity <- function(model, verbose = TRUE) { @@ -290,7 +277,6 @@ } - # prepare data for homogeneity of variance plot ---------------------------------- .new_diag_overdispersion <- function(model, ...) { @@ -376,7 +362,6 @@ } - .model_diagnostic_overdispersion <- function(model, ...) { ## TODO: remove this code later -it's just to test the ".new_diag_overdispersion" ## function. Set options(performance_new_overdispersion = TRUE) to use the new diff --git a/R/check_multimodal.R b/R/check_multimodal.R index b48a5831e..9d302c931 100644 --- a/R/check_multimodal.R +++ b/R/check_multimodal.R @@ -91,7 +91,6 @@ check_multimodal.data.frame <- function(x, ...) { } - #' @export check_multimodal.numeric <- function(x, ...) { insight::check_if_installed("multimode") diff --git a/R/check_normality.R b/R/check_normality.R index ffd4ba824..9dfcf9cbf 100644 --- a/R/check_normality.R +++ b/R/check_normality.R @@ -50,7 +50,6 @@ check_normality <- function(x, ...) { } - # default ------------------------- #' @export @@ -184,7 +183,6 @@ print.check_normality <- function(x, ...) { } - # other classes -------------------- # mixed models --------------------- @@ -276,7 +274,6 @@ check_normality.afex_aov <- function(x, ...) { check_normality.BFBayesFactor <- check_normality.afex_aov - # helper --------------------- .check_normality <- function(x, model, type = "residuals") { diff --git a/R/check_outliers.R b/R/check_outliers.R index 7729f9931..91a9e491c 100644 --- a/R/check_outliers.R +++ b/R/check_outliers.R @@ -198,7 +198,8 @@ #' extreme values), this algorithm functions in a different manner and won't #' always detect outliers. Note that `method = "optics"` requires the #' **dbscan** package to be installed, and that it takes some time to compute -#' the results. +#' the results. Additionally, the `optics_xi` (default to 0.05) is passed to +#' the [dbscan::extractXi()] function to further refine the cluster selection. #' #' - **Local Outlier Factor**: #' Based on a K nearest neighbors algorithm, LOF compares the local density of @@ -242,6 +243,7 @@ #' mcd = stats::qchisq(p = 1 - 0.001, df = ncol(x)), #' ics = 0.001, #' optics = 2 * ncol(x), +#' optics_xi = 0.05, #' lof = 0.001 #' ) #' ``` @@ -378,42 +380,23 @@ check_outliers.default <- function(x, # Check args if (all(method == "all")) { method <- c( - "zscore_robust", - "iqr", - "ci", - "cook", - "pareto", - "mahalanobis", - "mahalanobis_robust", - "mcd", - "ics", - "optics", - "lof" + "zscore_robust", "iqr", "ci", "cook", "pareto", "mahalanobis", + "mahalanobis_robust", "mcd", "ics", "optics", "lof" ) } method <- match.arg( method, c( - "zscore", - "zscore_robust", - "iqr", - "ci", - "hdi", - "eti", - "bci", - "cook", - "pareto", - "mahalanobis", - "mahalanobis_robust", - "mcd", - "ics", - "optics", - "lof" + "zscore", "zscore_robust", "iqr", "ci", "hdi", "eti", "bci", "cook", + "pareto", "mahalanobis", "mahalanobis_robust", "mcd", "ics", "optics", "lof" ), several.ok = TRUE ) + # Get model information + m_info <- insight::model_info(x) + # Get data my_data <- insight::get_data(x, verbose = FALSE) @@ -427,8 +410,17 @@ check_outliers.default <- function(x, ) } - # Remove non-numerics - my_data <- datawizard::data_select(my_data, select = is.numeric, verbose = FALSE) + # Remove non-numerics, but in case of binomial, only check predictors + if (m_info$is_binomial) { + model_predictors <- unique(insight::find_predictors(x, flatten = TRUE)) + } else { + model_predictors <- colnames(my_data) + } + my_data <- datawizard::data_select( + my_data[model_predictors], + select = is.numeric, + verbose = FALSE + ) # check if any data left if (is.null(my_data) || ncol(my_data) == 0) { @@ -468,7 +460,7 @@ check_outliers.default <- function(x, } # Cook - if ("cook" %in% method && !insight::model_info(x)$is_bayesian && !inherits(x, "bife")) { + if ("cook" %in% method && !m_info$is_bayesian && !inherits(x, "bife")) { data_cook <- .check_outliers_cook( x, threshold = thresholds$cook @@ -508,7 +500,7 @@ check_outliers.default <- function(x, } # Pareto - if ("pareto" %in% method && insight::model_info(x)$is_bayesian) { + if ("pareto" %in% method && m_info$is_bayesian) { data_pareto <- .check_outliers_pareto( x, threshold = thresholds$pareto @@ -598,7 +590,6 @@ check_outliers.default <- function(x, } - # Methods ----------------------------------------------------------------- #' @export @@ -607,7 +598,7 @@ as.data.frame.check_outliers <- function(x, ...) { } #' @export -as.numeric.check_outliers <- function(x, ...) { +as.double.check_outliers <- function(x, ...) { attributes(x)$data$Outlier } @@ -832,7 +823,6 @@ print.check_outliers_simres <- function(x, digits = 2, ...) { } - # other classes ------------------------- #' @rdname check_outliers @@ -851,7 +841,6 @@ check_outliers.numeric <- function(x, } - #' @rdname check_outliers #' @export check_outliers.data.frame <- function(x, @@ -891,6 +880,13 @@ check_outliers.data.frame <- function(x, } else if (is.numeric(threshold)) { thresholds <- .check_outliers_thresholds(x) thresholds <- lapply(thresholds, function(x) threshold) + # need to fix this manually - "optics" automatically includes method + # "optics_xi", which is allowed to range between 0 and 1 - since values + # for "optics" can be > 1, it might overwrite "optics_xi" with an invalid + # value... + if (thresholds$optics_xi > 1) { + thresholds$optics_xi <- 0.05 + } } else { insight::format_error( paste( @@ -900,7 +896,13 @@ check_outliers.data.frame <- function(x, ) } - thresholds <- thresholds[names(thresholds) %in% method] + # Keep only relevant threshold + valid <- method + if ("optics" %in% valid) { + valid <- c(valid, "optics_xi") + method <- c(method, "optics_xi") + } + thresholds <- thresholds[names(thresholds) %in% valid] out.meta <- .check_outliers.data.frame_method(x, method, thresholds, ID, ID.names, ...) out <- out.meta$out @@ -1217,7 +1219,8 @@ check_outliers.data.frame <- function(x, out <- c(out, .check_outliers_optics( x, threshold = thresholds$optics, - ID.names = ID.names + ID.names = ID.names, + xi = thresholds$optics_xi )) count.table <- datawizard::data_filter( @@ -1371,7 +1374,6 @@ check_outliers.BFBayesFactor <- function(x, } - #' @export check_outliers.gls <- function(x, method = "pareto", @@ -1508,7 +1510,6 @@ check_outliers.performance_simres <- function(x, type = "default", iterations = check_outliers.DHARMa <- check_outliers.performance_simres - # Thresholds -------------------------------------------------------------- .check_outliers_thresholds <- function(x) { @@ -1516,43 +1517,27 @@ check_outliers.DHARMa <- check_outliers.performance_simres } .check_outliers_thresholds_nowarn <- function(x) { - zscore <- stats::qnorm(p = 1 - 0.001 / 2) - zscore_robust <- stats::qnorm(p = 1 - 0.001 / 2) - iqr <- 1.7 - ci <- 1 - 0.001 - eti <- 1 - 0.001 - hdi <- 1 - 0.001 - bci <- 1 - 0.001 - cook <- stats::qf(0.5, ncol(x), nrow(x) - ncol(x)) - pareto <- 0.7 - mahalanobis_value <- stats::qchisq(p = 1 - 0.001, df = ncol(x)) - mahalanobis_robust <- stats::qchisq(p = 1 - 0.001, df = ncol(x)) - mcd <- stats::qchisq(p = 1 - 0.001, df = ncol(x)) - ics <- 0.001 - optics <- 2 * ncol(x) - lof <- 0.001 - list( - zscore = zscore, - zscore_robust = zscore_robust, - iqr = iqr, - ci = ci, - hdi = hdi, - eti = eti, - bci = bci, - cook = cook, - pareto = pareto, - mahalanobis = mahalanobis_value, - mahalanobis_robust = mahalanobis_robust, - mcd = mcd, - ics = ics, - optics = optics, - lof = lof + zscore = stats::qnorm(p = 1 - 0.001 / 2), + zscore_robust = stats::qnorm(p = 1 - 0.001 / 2), + iqr = 1.7, + ci = 1 - 0.001, + hdi = 1 - 0.001, + eti = 1 - 0.001, + bci = 1 - 0.001, + cook = stats::qf(0.5, ncol(x), nrow(x) - ncol(x)), + pareto = 0.7, + mahalanobis = stats::qchisq(p = 1 - 0.001, df = ncol(x)), + mahalanobis_robust = stats::qchisq(p = 1 - 0.001, df = ncol(x)), + mcd = stats::qchisq(p = 1 - 0.001, df = ncol(x)), + ics = 0.001, + optics = 2 * ncol(x), + optics_xi = 0.05, + lof = 0.001 ) } - # utilities -------------------- .check_outliers_zscore <- function(x, @@ -1611,7 +1596,6 @@ check_outliers.DHARMa <- check_outliers.performance_simres } - .check_outliers_iqr <- function(x, threshold = 1.7, method = "tukey", @@ -1663,7 +1647,6 @@ check_outliers.DHARMa <- check_outliers.performance_simres } - .check_outliers_ci <- function(x, threshold = 1 - 0.001, method = "ci", @@ -1714,7 +1697,6 @@ check_outliers.DHARMa <- check_outliers.performance_simres } - .check_outliers_cook <- function(x, threshold = NULL, ID.names = NULL) { @@ -1733,7 +1715,6 @@ check_outliers.DHARMa <- check_outliers.performance_simres } - .check_outliers_pareto <- function(x, threshold = 0.7) { insight::check_if_installed("loo") @@ -1753,7 +1734,6 @@ check_outliers.DHARMa <- check_outliers.performance_simres } - .check_outliers_mahalanobis <- function(x, threshold = stats::qchisq( p = 1 - 0.001, df = ncol(x) @@ -1783,7 +1763,6 @@ check_outliers.DHARMa <- check_outliers.performance_simres } - # Bigutils not yet fully available on CRAN .check_outliers_mahalanobis_robust <- function(x, threshold = stats::qchisq( @@ -1814,7 +1793,6 @@ check_outliers.DHARMa <- check_outliers.performance_simres } - .check_outliers_mcd <- function(x, threshold = stats::qchisq(p = 1 - 0.001, df = ncol(x)), percentage_central = 0.75, @@ -1858,7 +1836,6 @@ check_outliers.DHARMa <- check_outliers.performance_simres } - .check_outliers_ics <- function(x, threshold = 0.001, ID.names = NULL, @@ -1936,10 +1913,10 @@ check_outliers.DHARMa <- check_outliers.performance_simres } - .check_outliers_optics <- function(x, threshold = NULL, - ID.names = NULL) { + ID.names = NULL, + xi = 0.05) { out <- data.frame(Row = seq_len(nrow(x))) if (!is.null(ID.names)) { @@ -1950,7 +1927,7 @@ check_outliers.DHARMa <- check_outliers.performance_simres # Compute rez <- dbscan::optics(x, minPts = threshold) - rez <- dbscan::extractXi(rez, xi = 0.05) # TODO: find automatic way of setting xi + rez <- dbscan::extractXi(rez, xi = xi) # TODO: find automatic way of setting xi out$Distance_OPTICS <- rez$coredist @@ -2001,7 +1978,6 @@ check_outliers.DHARMa <- check_outliers.performance_simres # } - .check_outliers_lof <- function(x, threshold = 0.001, ID.names = NULL) { @@ -2035,7 +2011,6 @@ check_outliers.DHARMa <- check_outliers.performance_simres } - # Non-supported model classes --------------------------------------- #' @export diff --git a/R/check_overdispersion.R b/R/check_overdispersion.R index fc8e25e32..557fa0c9a 100644 --- a/R/check_overdispersion.R +++ b/R/check_overdispersion.R @@ -72,7 +72,6 @@ check_overdispersion <- function(x, ...) { } - # default ----------------------- #' @export @@ -84,7 +83,6 @@ check_overdispersion.default <- function(x, ...) { } - # Methods ----------------------------- #' @export @@ -150,7 +148,6 @@ print.check_overdisp <- function(x, digits = 3, ...) { } - # Overdispersion for classical models ----------------------------- #' @export diff --git a/R/check_predictions.R b/R/check_predictions.R index d5b382fd6..0780e35ec 100644 --- a/R/check_predictions.R +++ b/R/check_predictions.R @@ -263,7 +263,6 @@ check_predictions.lme <- function(object, ...) { } - # pp-check functions ------------------------------------- pp_check.lm <- function(object, @@ -420,7 +419,6 @@ pp_check.glmmTMB <- # styler: on - #' @rawNamespace #' S3method(bayesplot::pp_check, lm) #' S3method(bayesplot::pp_check, glm) diff --git a/R/check_singularity.R b/R/check_singularity.R index 86de4fcce..04965dc80 100644 --- a/R/check_singularity.R +++ b/R/check_singularity.R @@ -128,7 +128,6 @@ check_singularity <- function(x, tolerance = 1e-5, ...) { } - #' @export check_singularity.merMod <- function(x, tolerance = 1e-5, ...) { insight::check_if_installed("lme4") @@ -166,7 +165,6 @@ check_singularity.glmmTMB <- function(x, tolerance = 1e-5, ...) { check_singularity.glmmadmb <- check_singularity.glmmTMB - #' @export check_singularity.clmm <- function(x, tolerance = 1e-5, ...) { insight::check_if_installed("ordinal") @@ -176,7 +174,6 @@ check_singularity.clmm <- function(x, tolerance = 1e-5, ...) { } - #' @export check_singularity.cpglmm <- function(x, tolerance = 1e-5, ...) { insight::check_if_installed("cplm") @@ -185,14 +182,12 @@ check_singularity.cpglmm <- function(x, tolerance = 1e-5, ...) { } - #' @export check_singularity.MixMod <- function(x, tolerance = 1e-5, ...) { any(sapply(diag(x$D), function(.x) any(abs(.x) < tolerance))) } - #' @export check_singularity.lme <- function(x, tolerance = 1e-5, ...) { insight::check_if_installed("nlme") @@ -201,7 +196,6 @@ check_singularity.lme <- function(x, tolerance = 1e-5, ...) { } - #' @export check_singularity.default <- function(x, ...) { FALSE diff --git a/R/check_sphericity.R b/R/check_sphericity.R index 6df73d032..ed412c5f4 100644 --- a/R/check_sphericity.R +++ b/R/check_sphericity.R @@ -24,7 +24,6 @@ check_sphericity <- function(x, ...) { } - # default -------------------------- #' @export @@ -33,7 +32,6 @@ check_sphericity.default <- function(x, ...) { } - # methods ------------------------------ #' @export @@ -57,7 +55,6 @@ print.check_sphericity <- function(x, ...) { } - # other classes ------------------ #' @export diff --git a/R/compare_performance.R b/R/compare_performance.R index 2318d5e63..4ea6bc885 100644 --- a/R/compare_performance.R +++ b/R/compare_performance.R @@ -203,7 +203,6 @@ compare_performance <- function(..., metrics = "all", rank = FALSE, estimator = } - # methods ---------------------------- #' @export @@ -237,7 +236,6 @@ plot.compare_performance <- function(x, ...) { } - # utilities ------------------------------ .rank_performance_indices <- function(x, verbose) { diff --git a/R/cronbachs_alpha.R b/R/cronbachs_alpha.R index 735641ccc..64f583b1c 100644 --- a/R/cronbachs_alpha.R +++ b/R/cronbachs_alpha.R @@ -31,7 +31,6 @@ cronbachs_alpha <- function(x, ...) { } - #' @export cronbachs_alpha.data.frame <- function(x, verbose = TRUE, ...) { # remove missings @@ -50,14 +49,12 @@ cronbachs_alpha.data.frame <- function(x, verbose = TRUE, ...) { } - #' @export cronbachs_alpha.matrix <- function(x, verbose = TRUE, ...) { cronbachs_alpha(as.data.frame(x), verbose = verbose, ...) } - #' @export cronbachs_alpha.parameters_pca <- function(x, verbose = TRUE, ...) { # fetch data used for the PCA diff --git a/R/icc.R b/R/icc.R index 420893f1e..937e47568 100644 --- a/R/icc.R +++ b/R/icc.R @@ -327,7 +327,6 @@ icc <- function(model, } - #' @rdname icc #' @export variance_decomposition <- function(model, @@ -399,7 +398,6 @@ variance_decomposition <- function(model, } - # methods ------------------------------ #' @export @@ -544,7 +542,6 @@ print.icc_decomposed <- function(x, digits = 2, ...) { } - # helper ----------------- .compute_random_vars <- function(model, diff --git a/R/item_difficulty.R b/R/item_difficulty.R index ee8c652aa..756669860 100644 --- a/R/item_difficulty.R +++ b/R/item_difficulty.R @@ -82,7 +82,6 @@ item_difficulty <- function(x, maximum_value = NULL) { } - # methods -------------------------------------- #' @export diff --git a/R/looic.R b/R/looic.R index 4ded6ccd7..3114be360 100644 --- a/R/looic.R +++ b/R/looic.R @@ -69,7 +69,6 @@ looic <- function(model, verbose = TRUE) { } - # methods -------------------------- #' @export diff --git a/R/model_performance.R b/R/model_performance.R index 57edafdcb..a0bc1e4c4 100644 --- a/R/model_performance.R +++ b/R/model_performance.R @@ -41,7 +41,6 @@ model_performance <- function(model, ...) { performance <- model_performance - # methods -------------------------------- #' @export diff --git a/R/model_performance.bayesian.R b/R/model_performance.bayesian.R index db923fc26..1a2d6d509 100644 --- a/R/model_performance.bayesian.R +++ b/R/model_performance.bayesian.R @@ -281,7 +281,6 @@ model_performance.BFBayesFactor <- function(model, } - # helper ------------------- diff --git a/R/model_performance.lavaan.R b/R/model_performance.lavaan.R index 050dbd473..a349df9e9 100644 --- a/R/model_performance.lavaan.R +++ b/R/model_performance.lavaan.R @@ -149,7 +149,6 @@ model_performance.lavaan <- function(model, metrics = "all", verbose = TRUE, ... } - #' @export model_performance.blavaan <- function(model, metrics = "all", verbose = TRUE, ...) { insight::check_if_installed(c("lavaan", "blavaan")) diff --git a/R/model_performance.lm.R b/R/model_performance.lm.R index 0199b4ce4..592bc44bb 100644 --- a/R/model_performance.lm.R +++ b/R/model_performance.lm.R @@ -283,8 +283,6 @@ model_performance.nestedLogit <- function(model, metrics = "all", verbose = TRUE } - - # mfx models ------------------------------- #' @export @@ -320,9 +318,6 @@ model_performance.betamfx <- model_performance.logitor model_performance.model_fit <- model_performance.logitor - - - # other models ------------------------------- diff --git a/R/model_performance.mixed.R b/R/model_performance.mixed.R index a96786618..ba72a5a01 100644 --- a/R/model_performance.mixed.R +++ b/R/model_performance.mixed.R @@ -138,7 +138,6 @@ model_performance.mixed <- model_performance.merMod model_performance.glmmTMB <- model_performance.merMod - #' @export model_performance.mixor <- function(model, metrics = "all", diff --git a/R/model_performance_default.R b/R/model_performance_default.R index 51f8c3afd..2da3edab8 100644 --- a/R/model_performance_default.R +++ b/R/model_performance_default.R @@ -29,7 +29,6 @@ model_performance.default <- function(model, metrics = "all", verbose = TRUE, .. } - .check_bad_metrics <- function(metrics, all_metrics, verbose = TRUE) { bad_metrics <- which(!metrics %in% all_metrics) if (length(bad_metrics)) { diff --git a/R/performance_accuracy.R b/R/performance_accuracy.R index d0adb5554..f8ce5e6e5 100644 --- a/R/performance_accuracy.R +++ b/R/performance_accuracy.R @@ -234,7 +234,6 @@ print.performance_accuracy <- function(x, ...) { } - # utilities ------------------------ .crossv_kfold <- function(model_data, k = 5) { diff --git a/R/performance_aicc.R b/R/performance_aicc.R index 74041ac1b..443bf62c9 100644 --- a/R/performance_aicc.R +++ b/R/performance_aicc.R @@ -140,7 +140,6 @@ performance_aic.vgam <- function(x, ...) { performance_aic.vglm <- performance_aic.vgam - # Survey models -------------------------------------- #' @export @@ -153,7 +152,6 @@ performance_aic.svyglm <- function(x, ...) { performance_aic.svycoxph <- performance_aic.svyglm - # mfx models -------------------------------------- #' @export diff --git a/R/performance_hosmer.R b/R/performance_hosmer.R index 31c014db1..93780b527 100644 --- a/R/performance_hosmer.R +++ b/R/performance_hosmer.R @@ -63,7 +63,6 @@ performance_hosmer <- function(model, n_bins = 10) { } - # methods --------------------------------- #' @export diff --git a/R/performance_logloss.R b/R/performance_logloss.R index 3bbd14916..a8042d942 100644 --- a/R/performance_logloss.R +++ b/R/performance_logloss.R @@ -59,7 +59,6 @@ performance_logloss.brmsfit <- function(model, verbose = TRUE, ...) { } - # mfx models ------------------------------- #' @export diff --git a/R/performance_mae.R b/R/performance_mae.R index 0c7c4c978..2be53adfd 100644 --- a/R/performance_mae.R +++ b/R/performance_mae.R @@ -22,7 +22,6 @@ performance_mae <- function(model, ...) { mae <- performance_mae - #' @export performance_mae.default <- function(model, verbose = TRUE, ...) { .is_model_valid(model) @@ -34,8 +33,6 @@ performance_mae.default <- function(model, verbose = TRUE, ...) { } - - # mfx models ------------------------------- #' @export diff --git a/R/performance_mse.R b/R/performance_mse.R index 2dc5c2125..9d3c9d239 100644 --- a/R/performance_mse.R +++ b/R/performance_mse.R @@ -29,7 +29,6 @@ performance_mse <- function(model, ...) { mse <- performance_mse - #' @export performance_mse.default <- function(model, verbose = TRUE, ...) { res <- .safe(insight::get_residuals(model, verbose = verbose, type = "response", ...)) @@ -62,8 +61,6 @@ performance_mse.default <- function(model, verbose = TRUE, ...) { } - - # mfx models ------------------------------- #' @export diff --git a/R/performance_pcp.R b/R/performance_pcp.R index edcf4c724..a078cd7e2 100644 --- a/R/performance_pcp.R +++ b/R/performance_pcp.R @@ -84,7 +84,6 @@ performance_pcp <- function(model, } - # methods ---------------------------------- #' @export @@ -126,7 +125,6 @@ as.data.frame.performance_pcp <- function(x, row.names = NULL, ...) { } - # utilities -------------------------------------- .performance_pcp <- function(model, m0, ci, method, verbose = TRUE) { diff --git a/R/performance_roc.R b/R/performance_roc.R index 3b02d7b61..4de8be3e3 100644 --- a/R/performance_roc.R +++ b/R/performance_roc.R @@ -1,8 +1,21 @@ #' @title Simple ROC curve #' @name performance_roc #' -#' @description This function calculates a simple ROC curves of x/y coordinates -#' based on response and predictions of a binomial model. +#' @description +#' This function calculates a simple ROC curves of x/y coordinates based on +#' response and predictions of a binomial model. +#' +#' It returns the area under the curve (AUC) as a percentage, which corresponds +#' to the probability that a randomly chosen observation of "condition 1" is +#' correctly classified by the model as having a higher probability of being +#' "condition 1" than a randomly chosen "condition 2" observation. +#' +#' Applying `as.data.frame()` to the output returns a data frame containing the +#' following: +#' - `Sensitivity` (that actually corresponds to `1 - Specificity`): It is the +#' False Positive Rate. +#' - `Sensitivity`: It is the True Positive Rate, which is the proportion of +#' correctly classified "condition 1" observations. #' #' @param x A numeric vector, representing the outcome (0/1), or a model with #' binomial outcome. @@ -33,6 +46,7 @@ #' #' model <- glm(y ~ Sepal.Length + Sepal.Width, data = train_data, family = "binomial") #' as.data.frame(performance_roc(model, new_data = test_data)) +#' as.numeric(performance_roc(model)) #' #' roc <- performance_roc(model, new_data = test_data) #' area_under_curve(roc$Specificity, roc$Sensitivity) @@ -76,7 +90,6 @@ performance_roc <- function(x, ..., predictions, new_data) { } - # methods ----------------------------- #' @export @@ -109,6 +122,21 @@ print.performance_roc <- function(x, ...) { } +#' @export +as.double.performance_roc <- function(x, ...) { + if (length(unique(x$Model)) == 1) { + auc <- bayestestR::area_under_curve(x$Specificity, x$Sensitivity) + } else { + dat <- split(x, f = x$Model) + + auc <- numeric(length(dat)) + for (i in seq_along(dat)) { + auc[i] <- bayestestR::area_under_curve(dat[[i]]$Specificity, dat[[i]]$Sensitivity) + } + } + auc +} + # utilities --------------------------- @@ -130,7 +158,6 @@ print.performance_roc <- function(x, ...) { } - .performance_roc_model <- function(x, new_data, model_name = "Model 1") { predictions <- stats::predict(x, newdata = new_data, type = "response") if (is.null(new_data)) new_data <- insight::get_data(x, verbose = FALSE) @@ -148,7 +175,6 @@ print.performance_roc <- function(x, ...) { } - .performance_roc_models <- function(x, names) { l <- lapply(seq_along(x), function(i) { if (.valid_roc_models(x[[i]])) { @@ -161,12 +187,11 @@ print.performance_roc <- function(x, ...) { } - # add supported glm models here .valid_roc_models <- function(x) { if (inherits(x, "model_fit")) { x <- x$fit } - inherits(x, c("glm", "glmerMod", "logitor", "logitmfx", "probitmfx")) + inherits(x, c("glm", "glmerMod", "logitor", "logitmfx", "probitmfx", "glmmTMB")) } diff --git a/R/performance_score.R b/R/performance_score.R index 5ca3fdc84..ad9d1ce40 100644 --- a/R/performance_score.R +++ b/R/performance_score.R @@ -146,7 +146,6 @@ performance_score <- function(model, verbose = TRUE, ...) { } - # methods ----------------------------------- #' @export @@ -184,7 +183,6 @@ print.performance_score <- function(x, ...) { } - # utilities --------------------------------- .dispersion_parameter <- function(model, minfo) { @@ -203,7 +201,6 @@ print.performance_score <- function(x, ...) { } - .predict_score_y <- function(model) { pred <- NULL pred_zi <- NULL diff --git a/R/print-methods.R b/R/print-methods.R index 4b4a81aa7..299c006d4 100644 --- a/R/print-methods.R +++ b/R/print-methods.R @@ -52,7 +52,6 @@ print.r2_generic <- function(x, digits = 3, ...) { } - #' @export print.r2_pseudo <- function(x, digits = 3, ...) { model_type <- attr(x, "model_type") @@ -64,7 +63,6 @@ print.r2_pseudo <- function(x, digits = 3, ...) { } - #' @export print.r2_mlm <- function(x, digits = 3, ...) { model_type <- attr(x, "model_type") @@ -106,7 +104,6 @@ print.r2_mlm <- function(x, digits = 3, ...) { } - #' @export print.r2_bayes <- function(x, digits = 3, ...) { insight::print_color("# Bayesian R2 with Compatibility Interval\n\n", "blue") @@ -138,7 +135,6 @@ print.r2_bayes <- function(x, digits = 3, ...) { } - #' @export print.r2_loo <- function(x, digits = 3, ...) { insight::print_color("# LOO-adjusted R2 with Compatibility Interval\n\n", "blue") @@ -170,7 +166,6 @@ print.r2_loo <- function(x, digits = 3, ...) { } - #' @export print.r2_nakagawa_by_group <- function(x, digits = 3, ...) { insight::print_color("# Explained Variance by Level\n\n", "blue") diff --git a/R/r2.R b/R/r2.R index d62a0c477..a1be81087 100644 --- a/R/r2.R +++ b/R/r2.R @@ -327,7 +327,6 @@ r2.nestedLogit <- function(model, ci = NULL, verbose = TRUE, ...) { } - # mfx models --------------------- @@ -364,8 +363,6 @@ r2.betaor <- r2.logitmfx r2.model_fit <- r2.logitmfx - - # Cox & Snell R2 --------------------- @@ -384,8 +381,6 @@ r2.crch <- r2.BBreg r2.bayesx <- r2.BBreg - - # Nagelkerke R2 ---------------------- @@ -441,9 +436,6 @@ r2.mblogit <- function(model, ...) { } - - - # McFadden ---------------------- @@ -458,8 +450,6 @@ r2.multinom <- function(model, ...) { r2.mlogit <- r2.multinom - - # Zeroinflated R2 ------------------ @@ -516,6 +506,7 @@ r2.glmmTMB <- function(model, ci = NULL, tolerance = 1e-5, verbose = TRUE, ...) } # calculate r2 for non-mixed glmmTMB models here ------------------------- info <- insight::model_info(model, verbose = FALSE) + matrix_response <- grepl("cbind", insight::find_response(model), fixed = TRUE) if (info$is_linear) { # for linear models, use the manual calculation @@ -526,6 +517,17 @@ r2.glmmTMB <- function(model, ci = NULL, tolerance = 1e-5, verbose = TRUE, ...) attr(out, "model_type") <- "Logistic" names(out$R2_Tjur) <- "Tjur's R2" class(out) <- c("r2_pseudo", class(out)) + } else if (info$is_betabinomial) { + # currently, beta-binomial models without proportion response are not supported + if (matrix_response) { + if (verbose) { + insight::format_warning("Can't calculate accurate R2 for beta-binomial models with matrix-response formulation.") + } + out <- NULL + } else { + # betabinomial default to mcfadden, see pscl:::pR2Work + out <- r2_mcfadden(model) + } } else if (info$is_binomial && !info$is_bernoulli) { # currently, non-bernoulli binomial models are not supported if (verbose) { @@ -586,7 +588,6 @@ r2.wbm <- function(model, tolerance = 1e-5, ...) { } - #' @export r2.sem <- function(model, ...) { r2_conditional <- model$r2c @@ -605,7 +606,6 @@ r2.sem <- function(model, ...) { } - # Bayes R2 ------------------------ @@ -664,7 +664,6 @@ r2.rma <- function(model, ...) { } - #' @export r2.feis <- function(model, ...) { out <- list( @@ -677,7 +676,6 @@ r2.feis <- function(model, ...) { } - #' @export r2.fixest <- function(model, ...) { insight::check_if_installed("fixest") @@ -714,7 +712,6 @@ r2.fixest_multi <- function(model, ...) { } - #' @export r2.felm <- function(model, ...) { model_summary <- summary(model) @@ -728,8 +725,6 @@ r2.felm <- function(model, ...) { } - - #' @export r2.iv_robust <- function(model, ...) { out <- list( @@ -742,7 +737,6 @@ r2.iv_robust <- function(model, ...) { } - #' @export r2.ivreg <- function(model, ...) { model_summary <- summary(model) @@ -756,7 +750,6 @@ r2.ivreg <- function(model, ...) { } - #' @export r2.bigglm <- function(model, ...) { out <- list(R2_CoxSnell = summary(model)$rsq) @@ -766,7 +759,6 @@ r2.bigglm <- function(model, ...) { } - #' @export r2.biglm <- function(model, ...) { df.int <- as.numeric(insight::has_intercept(model)) @@ -788,7 +780,6 @@ r2.biglm <- function(model, ...) { } - #' @export r2.lmrob <- function(model, ...) { model_summary <- summary(model) @@ -805,7 +796,6 @@ r2.lmrob <- function(model, ...) { r2.complmrob <- r2.lmrob - #' @export r2.mmclogit <- function(model, ...) { list(R2 = NA) @@ -822,7 +812,6 @@ r2.Arima <- function(model, ...) { } - #' @export r2.plm <- function(model, ...) { model_summary <- summary(model) @@ -836,7 +825,6 @@ r2.plm <- function(model, ...) { } - #' @export r2.selection <- function(model, ...) { model_summary <- summary(model) @@ -853,7 +841,6 @@ r2.selection <- function(model, ...) { } - #' @export r2.svyglm <- function(model, ...) { rsq <- (model$null.deviance - model$deviance) / model$null.deviance @@ -869,7 +856,6 @@ r2.svyglm <- function(model, ...) { } - #' @export r2.vglm <- function(model, ...) { out <- list(R2_McKelvey = r2_mckelvey(model)) @@ -882,7 +868,6 @@ r2.vglm <- function(model, ...) { r2.vgam <- r2.vglm - #' @export r2.DirichletRegModel <- function(model, ...) { out <- list(R2_Nagelkerke = r2_nagelkerke(model)) @@ -892,9 +877,6 @@ r2.DirichletRegModel <- function(model, ...) { } - - - # helper ------------------- .check_r2_ci_args <- function(ci = NULL, ci_method = "bootstrap", valid_ci_method = NULL, verbose = TRUE) { diff --git a/R/r2_bayes.R b/R/r2_bayes.R index 4831a4bd2..c12319a0e 100644 --- a/R/r2_bayes.R +++ b/R/r2_bayes.R @@ -303,7 +303,6 @@ r2_posterior.BFBayesFactor <- function(model, } - #' @keywords internal .r2_posterior_model_average <- function(model, prior_odds = NULL, verbose = TRUE) { insight::check_if_installed("BayesFactor") @@ -361,7 +360,6 @@ r2_posterior.BFBayesFactor <- function(model, } - #' @export as.data.frame.r2_bayes <- function(x, ...) { out <- data.frame( @@ -395,7 +393,6 @@ as.data.frame.r2_bayes <- function(x, ...) { } - # Utils ------------------------------------------------------------------- .get_bfbf_predictions <- function(model, iterations = 4000, verbose = TRUE) { diff --git a/R/r2_coxsnell.R b/R/r2_coxsnell.R index 36df504ec..ed603409b 100644 --- a/R/r2_coxsnell.R +++ b/R/r2_coxsnell.R @@ -59,7 +59,6 @@ r2_coxsnell <- function(model, ...) { } - # r2-coxsnell based on model information --------------------------- @@ -69,13 +68,22 @@ r2_coxsnell.glm <- function(model, verbose = TRUE, ...) { if (is.null(info)) { info <- suppressWarnings(insight::model_info(model, verbose = FALSE)) } + matrix_response <- grepl("cbind", insight::find_response(model), fixed = TRUE) + # Cox & Snell's R2 is not defined for binomial models that are not Bernoulli models - if (info$is_binomial && !info$is_bernoulli && class(model)[1] == "glm") { + if (info$is_binomial && !info$is_betabinomial && !info$is_bernoulli && class(model)[1] %in% c("glm", "glmmTMB")) { if (verbose) { insight::format_alert("Can't calculate accurate R2 for binomial models that are not Bernoulli models.") } return(NULL) } + # currently, beta-binomial models without proportion response are not supported + if (info$is_betabinomial && matrix_response) { + if (verbose) { + insight::format_warning("Can't calculate accurate R2 for beta-binomial models with matrix-response formulation.") + } + return(NULL) + } # if no deviance, return NULL if (is.null(model$deviance)) { return(NULL) @@ -96,7 +104,7 @@ r2_coxsnell.glmmTMB <- function(model, verbose = TRUE, ...) { info <- suppressWarnings(insight::model_info(model, verbose = FALSE)) } # Cox & Snell's R2 is not defined for binomial models that are not Bernoulli models - if (info$is_binomial && !info$is_bernoulli) { + if (info$is_binomial && !info$is_bernoulli && !info$is_betabinomial) { if (verbose) { insight::format_alert("Can't calculate accurate R2 for binomial models that are not Bernoulli models.") } @@ -159,7 +167,6 @@ r2_coxsnell.bife <- function(model, ...) { } - # mfx models --------------------- @@ -187,8 +194,6 @@ r2_coxsnell.negbinirr <- r2_coxsnell.logitmfx r2_coxsnell.negbinmfx <- r2_coxsnell.logitmfx - - # r2-coxsnell based on loglik stored in model object --------------------------- @@ -208,7 +213,6 @@ r2_coxsnell.svycoxph <- function(model, ...) { } - # r2-coxsnell based on loglik of null-model (update) --------------------------- diff --git a/R/r2_loo.R b/R/r2_loo.R index dbcaf38b3..4aaa8dcab 100644 --- a/R/r2_loo.R +++ b/R/r2_loo.R @@ -166,7 +166,6 @@ r2_loo_posterior.BFBayesFactor <- function(model, verbose = TRUE, ...) { } - #' @export as.data.frame.r2_loo <- function(x, ...) { out <- data.frame( diff --git a/R/r2_mcfadden.R b/R/r2_mcfadden.R index de61fac87..64f430caa 100644 --- a/R/r2_mcfadden.R +++ b/R/r2_mcfadden.R @@ -29,7 +29,6 @@ r2_mcfadden <- function(model, ...) { } - # helper ----------------------- @@ -49,11 +48,6 @@ r2_mcfadden <- function(model, ...) { } - - - - - # r2 via loglik and update -------------------------- @@ -63,13 +57,21 @@ r2_mcfadden.glm <- function(model, verbose = TRUE, ...) { if (is.null(info)) { info <- suppressWarnings(insight::model_info(model, verbose = FALSE)) } + matrix_response <- grepl("cbind", insight::find_response(model), fixed = TRUE) - if (info$is_binomial && !info$is_bernoulli && class(model)[1] == "glm") { + if (info$is_binomial && !info$is_betabinomial && !info$is_bernoulli && class(model)[1] %in% c("glm", "glmmTMB")) { if (verbose) { insight::format_warning("Can't calculate accurate R2 for binomial models that are not Bernoulli models.") } return(NULL) } + # currently, beta-binomial models without proportion response are not supported + if (info$is_betabinomial && matrix_response) { + if (verbose) { + insight::format_warning("Can't calculate accurate R2 for beta-binomial models with matrix-response formulation.") + } + return(NULL) + } l_null <- insight::get_loglikelihood(stats::update(model, ~1)) .r2_mcfadden(model, l_null) @@ -99,6 +101,9 @@ r2_mcfadden.brmultinom <- r2_mcfadden.glm #' @export r2_mcfadden.censReg <- r2_mcfadden.glm +#' @export +r2_mcfadden.glmmTMB <- r2_mcfadden.glm + #' @export r2_mcfadden.truncreg <- r2_mcfadden.glm @@ -121,9 +126,6 @@ r2_mcfadden.mblogit <- function(model, ...) { } - - - # mfx models --------------------- @@ -151,7 +153,6 @@ r2_mcfadden.probitmfx <- r2_mcfadden.logitmfx r2_mcfadden.negbinmfx <- r2_mcfadden.logitmfx - # special models ------------------------------------------- @@ -173,7 +174,6 @@ r2_mcfadden.clm2 <- function(model, ...) { } - #' @export r2_mcfadden.multinom <- function(model, ...) { l_null <- insight::get_loglikelihood(stats::update(model, ~1, trace = FALSE)) diff --git a/R/r2_nagelkerke.R b/R/r2_nagelkerke.R index 9b5b40f8f..4c64ca668 100644 --- a/R/r2_nagelkerke.R +++ b/R/r2_nagelkerke.R @@ -22,8 +22,6 @@ r2_nagelkerke <- function(model, ...) { } - - # helper --------------------------- @@ -47,9 +45,6 @@ r2_nagelkerke <- function(model, ...) { } - - - # Nagelkerke's R2 based on Cox&Snell's R2 ---------------- #' @export @@ -143,7 +138,6 @@ r2_nagelkerke.bife <- function(model, ...) { } - # mfx models --------------------- @@ -171,8 +165,6 @@ r2_nagelkerke.negbinirr <- r2_nagelkerke.logitmfx r2_nagelkerke.negbinmfx <- r2_nagelkerke.logitmfx - - # Nagelkerke's R2 based on LogLik ---------------- @@ -226,7 +218,6 @@ r2_nagelkerke.truncreg <- r2_nagelkerke.clm r2_nagelkerke.DirichletRegModel <- r2_coxsnell.clm - # Nagelkerke's R2 based on LogLik stored in model object ---------------- diff --git a/R/r2_nakagawa.R b/R/r2_nakagawa.R index aeb4df1c1..03fcb092f 100644 --- a/R/r2_nakagawa.R +++ b/R/r2_nakagawa.R @@ -235,7 +235,6 @@ r2_nakagawa <- function(model, } - # methods ------ #' @export @@ -282,7 +281,6 @@ print.r2_nakagawa <- function(x, digits = 3, ...) { } - # bootstrapping -------------- # bootstrapping using package "boot" diff --git a/R/test_bf.R b/R/test_bf.R index 38b0e421d..658883c76 100644 --- a/R/test_bf.R +++ b/R/test_bf.R @@ -30,7 +30,6 @@ test_bf.default <- function(..., reference = 1, text_length = NULL) { } - #' @export test_bf.ListModels <- function(objects, reference = 1, text_length = NULL, ...) { if (.test_bf_areAllBayesian(objects) == "mixed") { @@ -79,7 +78,6 @@ test_bf.ListModels <- function(objects, reference = 1, text_length = NULL, ...) } - # Helpers ----------------------------------------------------------------- .test_bf_areAllBayesian <- function(objects) { diff --git a/R/test_likelihoodratio.R b/R/test_likelihoodratio.R index 21ae2694d..20a58f6fb 100644 --- a/R/test_likelihoodratio.R +++ b/R/test_likelihoodratio.R @@ -16,7 +16,6 @@ test_likelihoodratio <- function(..., estimator = "ML", verbose = TRUE) { test_lrt <- test_likelihoodratio - # default -------------------- #' @export @@ -55,7 +54,6 @@ test_likelihoodratio.default <- function(..., estimator = "OLS", verbose = TRUE) } - # methods ------------------------------ #' @export diff --git a/R/test_performance.R b/R/test_performance.R index f818b02dd..76877bb0b 100644 --- a/R/test_performance.R +++ b/R/test_performance.R @@ -230,7 +230,6 @@ test_performance <- function(..., reference = 1, verbose = TRUE) { } - # default -------------------------------- #' @export @@ -253,7 +252,6 @@ test_performance.default <- function(..., reference = 1, include_formula = FALSE } - # methods ------------------------------ #' @export @@ -315,7 +313,6 @@ display.test_performance <- function(object, format = "markdown", digits = 2, .. } - # other classes ----------------------------------- #' @export @@ -406,7 +403,6 @@ test_performance.ListNonNestedRegressions <- function(objects, } - # TESTS IMPLEMENTED IN OTHER PACKAGES # # Non-nested @@ -416,7 +412,6 @@ test_performance.ListNonNestedRegressions <- function(objects, # nonnest2::icci(m2, m3) - # Helpers ----------------------------------------------------------------- @@ -432,7 +427,6 @@ test_performance.ListNonNestedRegressions <- function(objects, } - .test_performance_checks <- function(objects, multiple = TRUE, same_response = TRUE, verbose = TRUE) { # TODO: we could actually generate a baseline model 'y ~ 1' whenever a single model is passed if (multiple && insight::is_model(objects)) { @@ -472,7 +466,6 @@ test_performance.ListNonNestedRegressions <- function(objects, } - .check_objectnames <- function(objects, dot_names) { # Replace with names from the global environment, if these are not yet properly set object_names <- insight::compact_character(names(objects)) diff --git a/R/test_vuong.R b/R/test_vuong.R index f0cce6d16..26c3d70dc 100644 --- a/R/test_vuong.R +++ b/R/test_vuong.R @@ -39,9 +39,6 @@ test_vuong.ListNonNestedRegressions <- function(objects, reference = 1, ...) { } - - - # ------------------------------------------------------------------------- # Utils ------------------------------------------------------------------- # ------------------------------------------------------------------------- @@ -185,10 +182,6 @@ test_vuong.ListNonNestedRegressions <- function(objects, reference = 1, ...) { } - - - - # Compute lambda (Eq 3.6) ------------------------------------------------- # ------------------------------------------------------------------------- @@ -224,9 +217,6 @@ test_vuong.ListNonNestedRegressions <- function(objects, reference = 1, ...) { } - - - # Compute AB (Eq 2.1 and 2.2) --------------------------------------------- # ------------------------------------------------------------------------- diff --git a/R/test_wald.R b/R/test_wald.R index ddfe21829..6873cf72c 100644 --- a/R/test_wald.R +++ b/R/test_wald.R @@ -25,7 +25,6 @@ test_wald.default <- function(..., verbose = TRUE) { } - #' @export test_wald.ListNestedRegressions <- function(objects, verbose = TRUE, ...) { # for binomial models, only chisq-test diff --git a/man/check_collinearity.Rd b/man/check_collinearity.Rd index 847ff110d..699695a06 100644 --- a/man/check_collinearity.Rd +++ b/man/check_collinearity.Rd @@ -111,7 +111,14 @@ If interaction terms are included in a model, high VIF values are expected. This portion of multicollinearity among the component terms of an interaction is also called "inessential ill-conditioning", which leads to inflated VIF values that are typically seen for models with interaction -terms \emph{(Francoeur 2013)}. +terms \emph{(Francoeur 2013)}. Centering interaction terms can resolve this +issue \emph{(Kim and Jung 2024)}. +} + +\section{Multicollinearity and Polynomial Terms}{ + +Polynomial transformations are considered a single term and thus VIFs are +not calculated between them. } \section{Concurvity for Smooth Terms in Generalized Additive Models}{ @@ -144,9 +151,12 @@ plot(x) \item Francoeur, R. B. (2013). Could Sequential Residual Centering Resolve Low Sensitivity in Moderated Regression? Simulations and Cancer Symptom Clusters. Open Journal of Statistics, 03(06), 24-44. -\item James, G., Witten, D., Hastie, T., and Tibshirani, R. (eds.). (2013). -An introduction to statistical learning: with applications in R. New York: +\item James, G., Witten, D., Hastie, T., and Tibshirani, R. (eds.). (2013). An +introduction to statistical learning: with applications in R. New York: Springer. +\item Kim, Y., & Jung, G. (2024). Understanding linear interaction analysis with +causal graphs. British Journal of Mathematical and Statistical Psychology, +00, 1–14. \item Marcoulides, K. M., and Raykov, T. (2019). Evaluation of Variance Inflation Factors in Regression Models Using Latent Variable Modeling Methods. Educational and Psychological Measurement, 79(5), 874–882. diff --git a/man/check_outliers.Rd b/man/check_outliers.Rd index 623eae4b2..489dbafc3 100644 --- a/man/check_outliers.Rd +++ b/man/check_outliers.Rd @@ -236,7 +236,8 @@ detect several outliers (as these are usually defined as a percentage of extreme values), this algorithm functions in a different manner and won't always detect outliers. Note that \code{method = "optics"} requires the \strong{dbscan} package to be installed, and that it takes some time to compute -the results. +the results. Additionally, the \code{optics_xi} (default to 0.05) is passed to +the \code{\link[dbscan:optics]{dbscan::extractXi()}} function to further refine the cluster selection. \item \strong{Local Outlier Factor}: Based on a K nearest neighbors algorithm, LOF compares the local density of a point to the local densities of its neighbors instead of computing a @@ -283,6 +284,7 @@ Default thresholds are currently specified as follows: mcd = stats::qchisq(p = 1 - 0.001, df = ncol(x)), ics = 0.001, optics = 2 * ncol(x), + optics_xi = 0.05, lof = 0.001 ) }\if{html}{\out{}} diff --git a/man/performance_roc.Rd b/man/performance_roc.Rd index e14ef04c9..48b473a42 100644 --- a/man/performance_roc.Rd +++ b/man/performance_roc.Rd @@ -26,8 +26,22 @@ curve (\code{Sensitivity} and \code{Specificity}), and a column with the model name. } \description{ -This function calculates a simple ROC curves of x/y coordinates -based on response and predictions of a binomial model. +This function calculates a simple ROC curves of x/y coordinates based on +response and predictions of a binomial model. + +It returns the area under the curve (AUC) as a percentage, which corresponds +to the probability that a randomly chosen observation of "condition 1" is +correctly classified by the model as having a higher probability of being +"condition 1" than a randomly chosen "condition 2" observation. + +Applying \code{as.data.frame()} to the output returns a data frame containing the +following: +\itemize{ +\item \code{Sensitivity} (that actually corresponds to \code{1 - Specificity}): It is the +False Positive Rate. +\item \code{Sensitivity}: It is the True Positive Rate, which is the proportion of +correctly classified "condition 1" observations. +} } \note{ There is also a \href{https://easystats.github.io/see/articles/performance.html}{\code{plot()}-method} @@ -45,6 +59,7 @@ train_data <- iris[-folds, ] model <- glm(y ~ Sepal.Length + Sepal.Width, data = train_data, family = "binomial") as.data.frame(performance_roc(model, new_data = test_data)) +as.numeric(performance_roc(model)) roc <- performance_roc(model, new_data = test_data) area_under_curve(roc$Specificity, roc$Sensitivity) diff --git a/performance.Rproj b/performance.Rproj index 5a22bfb25..2c52ed395 100644 --- a/performance.Rproj +++ b/performance.Rproj @@ -1,4 +1,5 @@ Version: 1.0 +ProjectId: af6facf3-033e-40d4-ac22-2830774814a9 RestoreWorkspace: No SaveWorkspace: No diff --git a/tests/testthat/test-check_collinearity.R b/tests/testthat/test-check_collinearity.R index d62aec75d..e01bcda39 100644 --- a/tests/testthat/test-check_collinearity.R +++ b/tests/testthat/test-check_collinearity.R @@ -23,6 +23,12 @@ test_that("check_collinearity, correct order in print", { }) +test_that("check_collinearity, interaction", { + m <- lm(mpg ~ wt * cyl, data = mtcars) + expect_message(check_collinearity(m), regex = "Model has interaction terms") +}) + + test_that("check_collinearity", { skip_if_not_installed("glmmTMB") skip_if_not(getRversion() >= "4.0.0") @@ -48,7 +54,6 @@ test_that("check_collinearity", { }) - test_that("check_collinearity", { skip_if_not_installed("glmmTMB") skip_if_not(getRversion() >= "4.0.0") @@ -216,5 +221,19 @@ test_that("check_collinearity, hurdle/zi models w/o zi-formula", { test_that("check_collinearity, invalid data", { dd <- data.frame(y = as.difftime(0:5, units = "days")) m1 <- lm(y ~ 1, data = dd) - expect_error(check_collinearity(m1), "Can't extract variance-covariance matrix") + expect_message(check_collinearity(m1), "Could not extract the variance-covariance") +}) + +test_that("check_collinearity, glmmTMB hurdle w/o zi", { + skip_if_not_installed("glmmTMB") + data(Salamanders, package = "glmmTMB") + mod_trunc_error <- glmmTMB::glmmTMB( + count ~ spp + mined + (1 | site), + data = Salamanders[Salamanders$count > 0, , drop = FALSE], + family = glmmTMB::truncated_nbinom2(), + ziformula = ~0, + dispformula = ~1 + ) + out <- check_collinearity(mod_trunc_error) + expect_equal(out$VIF, c(1.03414, 1.03414), tolerance = 1e-3) }) diff --git a/tests/testthat/test-check_model.R b/tests/testthat/test-check_model.R index 2b1141d26..216062a08 100644 --- a/tests/testthat/test-check_model.R +++ b/tests/testthat/test-check_model.R @@ -40,7 +40,7 @@ test_that("`check_outliers()` works if convergence issues", { test_that("`check_model()` for invalid models", { dd <- data.frame(y = as.difftime(0:5, units = "days")) m1 <- lm(y ~ 1, data = dd) - expect_error(check_model(m1)) + expect_message(check_model(m1), regex = "Date variables are not") }) test_that("`check_model()` works for quantreg", { diff --git a/tests/testthat/test-check_outliers.R b/tests/testthat/test-check_outliers.R index 6aa64516f..db31e000d 100644 --- a/tests/testthat/test-check_outliers.R +++ b/tests/testthat/test-check_outliers.R @@ -361,6 +361,16 @@ test_that("check_outliers with invald data", { }) +test_that("check_outliers on numeric data only", { + data(mtcars) + # all predictors categorical + mtcars$wt <- as.factor(mtcars$wt) + mtcars$mpg <- as.factor(mtcars$mpg) + model <- glm(vs ~ wt + mpg, data = mtcars, family = "binomial") + expect_error(check_outliers(model), regex = "No numeric") +}) + + test_that("check_outliers with DHARMa", { skip_if_not_installed("DHARMa") mt1 <- mtcars[, c(1, 3, 4)] @@ -396,3 +406,10 @@ test_that("check_outliers with DHARMa", { ) ) }) + + +test_that("check_outliers with DHARMa", { + data(mtcars) + out <- check_outliers(mtcars$mpg, method = "zscore", threshold = 2) + expect_equal(which(as.numeric(out) == 1), c(18, 20)) +}) diff --git a/tests/testthat/test-performance_roc.R b/tests/testthat/test-performance_roc.R index b13670b89..4a02042fa 100644 --- a/tests/testthat/test-performance_roc.R +++ b/tests/testthat/test-performance_roc.R @@ -1,3 +1,5 @@ +skip_if_not_installed("bayestestR") + test_that("performance_roc", { skip_if_not_installed("lme4") m <- lme4::glmer(vs ~ mpg + (1 | gear), family = "binomial", data = mtcars) @@ -14,6 +16,7 @@ test_that("performance_roc", { ) }) + test_that("performance_roc", { set.seed(123) d <- iris[sample(1:nrow(iris), size = 50), ] @@ -40,3 +43,23 @@ test_that("performance_roc", { tolerance = 1e-3 ) }) + + +test_that("performance_roc, as.numeric", { + data(iris) + set.seed(123) + iris$y <- rbinom(nrow(iris), size = 1, .3) + folds <- sample(nrow(iris), size = nrow(iris) / 8, replace = FALSE) + test_data <- iris[folds, ] + train_data <- iris[-folds, ] + + model <- glm(y ~ Sepal.Length + Sepal.Width, data = train_data, family = "binomial") + roc <- performance_roc(model) + out <- as.numeric(roc) + expect_equal( + out, + bayestestR::area_under_curve(roc$Specificity, roc$Sensitivity), + tolerance = 1e-4, + ignore_attr = TRUE + ) +}) diff --git a/tests/testthat/test-pkg-fixest.R b/tests/testthat/test-pkg-fixest.R index 8a187e33b..b6c7e93b0 100644 --- a/tests/testthat/test-pkg-fixest.R +++ b/tests/testthat/test-pkg-fixest.R @@ -48,7 +48,6 @@ test_that("fixest: model_performance", { }) - test_that("fixest_multi: r2", { skip_if_not_installed("fixest") fixest::setFixest_nthreads(1) diff --git a/tests/testthat/test-r2_mcfadden.R b/tests/testthat/test-r2_mcfadden.R index ad64e92dd..354f6884e 100644 --- a/tests/testthat/test-r2_mcfadden.R +++ b/tests/testthat/test-r2_mcfadden.R @@ -26,3 +26,45 @@ test_that("r2_mcfadden", { } ) }) + +skip_if_not_installed("withr") + +withr::with_environment( + new.env(), + { + test_that("r2_mcfadden, glmmTMB-beta-binomial", { + skip_if_not_installed("glmmTMB") + set.seed(101) + dd <- data.frame(x = rnorm(200)) + dd$y <- glmmTMB::simulate_new( + ~ 1 + x, + newdata = dd, + newparams = list(beta = c(0, 1), betadisp = -1), + weights = rep(10, nrow(dd)), + family = glmmTMB::betabinomial() + )[[1]] + dd$success <- round(runif(nrow(dd), 0, dd$y)) + d <<- dd + + m <- glmmTMB::glmmTMB( + y / 10 ~ 1 + x, + data = d, + weights = rep(10, nrow(d)), + family = glmmTMB::betabinomial() + ) + out1 <- r2(m) + out2 <- r2_mcfadden(m) + expect_equal(out1$R2, out2$R2, tolerance = 1e-4, ignore_attr = TRUE) + expect_equal(out1$R2, 0.06892733, tolerance = 1e-4, ignore_attr = TRUE) + + m <- glmmTMB::glmmTMB( + cbind(y, success) ~ 1 + x, + data = d, + weights = rep(10, nrow(d)), + family = glmmTMB::betabinomial() + ) + expect_warning(r2(m), regex = "calculate accurate") + expect_warning(r2_mcfadden(m), regex = "calculate accurate") + }) + } +) diff --git a/vignettes/check_model.Rmd b/vignettes/check_model.Rmd index 1bd8e6eee..3e52ef797 100644 --- a/vignettes/check_model.Rmd +++ b/vignettes/check_model.Rmd @@ -1,6 +1,6 @@ --- title: "Checking model assumption - linear models" -output: +output: rmarkdown::html_vignette: toc: true tags: [r, performance, r2] @@ -8,7 +8,7 @@ vignette: > \usepackage[utf8]{inputenc} %\VignetteIndexEntry{Checking model assumption - linear models} %\VignetteEngine{knitr::rmarkdown} -editor_options: +editor_options: chunk_output_type: console --- @@ -214,6 +214,8 @@ There are several ways to address heteroscedasticity. 3. Transforming the response variable, for instance, taking the `log()`, may also help to avoid issues with heteroscedasticity. +4. Weighting observations is another remedy against heteroscedasticity, in particular the method of [weighted least squares](https://online.stat.psu.edu/stat501/lesson/13/13.1). + ## Influential observations - outliers Outliers can be defined as particularly influential observations, and this plot helps detecting those outliers. Cook's distance (_Cook 1977_, _Cook & Weisberg 1982_) is used to define outliers, i.e. any point in this plot that falls outside of Cook's distance (the dashed lines) is considered an influential observation. @@ -248,7 +250,7 @@ Our model clearly suffers from multicollinearity, as all predictors have high VI ### How to fix this? -Usually, predictors with (very) high VIF values should be removed from the model to fix multicollinearity. Some caution is needed for interaction terms. If interaction terms are included in a model, high VIF values are expected. This portion of multicollinearity among the component terms of an interaction is also called "inessential ill-conditioning", which leads to inflated VIF values that are typically seen for models with interaction terms _(Francoeur 2013)_. In such cases, re-fit your model without interaction terms and check this model for collinearity among predictors. +Usually, predictors with (very) high VIF values should be removed from the model to fix multicollinearity. Some caution is needed for interaction terms. If interaction terms are included in a model, high VIF values are expected. This portion of multicollinearity among the component terms of an interaction is also called "inessential ill-conditioning", which leads to inflated VIF values that are typically seen for models with interaction terms _(Francoeur 2013)_. In such cases, try centering the involved interaction terms, which can reduce multicollinearity _(Kim and Jung 2024)_, or re-fit your model without interaction terms and check this model for collinearity among predictors. ## Normality of residuals @@ -291,6 +293,8 @@ Gelman A, and Hill J. Data analysis using regression and multilevel/hierarchical James, G., Witten, D., Hastie, T., and Tibshirani, R. (eds.).An introduction to statistical learning: with applications in R. New York: Springer, 2013 +Kim, Y., & Jung, G. (2024). Understanding linear interaction analysis with causal graphs. British Journal of Mathematical and Statistical Psychology, 00, 1–14. + Leys C, Delacre M, Mora YL, Lakens D, Ley C. How to Classify, Detect, and Manage Univariate and Multivariate Outliers, With Emphasis on Pre-Registration. International Review of Social Psychology, 2019 McElreath, R. Statistical rethinking: A Bayesian course with examples in R and Stan. 2nd edition. Chapman and Hall/CRC, 2020