Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The interpretation of the diagnostic plot #29

Open
nuferates opened this issue Jul 14, 2023 · 4 comments
Open

The interpretation of the diagnostic plot #29

nuferates opened this issue Jul 14, 2023 · 4 comments
Assignees
Labels
enhancement New feature or request

Comments

@nuferates
Copy link
Collaborator

The first outcome of the ROBMED results is the diagnostic plot. We already explained how to make sense of it in our ORM piece, but I think we should not assume that readers will be interested enough to explore how it is used. So can we add a question mark icon next to the diagnostic plot title and when the users hover their mouse over they get to see a generic explanation of how it is interpreted? It could even be nicer if we can generate a specific interpretation for each graph underneath (can AI help?), but it might be well beyond the scope of this project.

@aalfons aalfons transferred this issue from aalfons/robmed Jul 17, 2023
@aalfons
Copy link
Owner

aalfons commented Sep 25, 2023

Indeed, it would be extremely useful if there is some help that can be displayed. This is not just relevant for the diagnostic plot, but it may also be useful for other things.

Focusing on the diagnostic plot in this issue, we have some plots from various presentations that show what the plot looks like when there are no deviations from normality, when we have a heavily tailed distribution, and when we have left- or right skewness:

diagnostics-normal.pdf
diagnostics-heavy-tails.pdf
diagnostics-left-skewed.pdf
diagnostics-right-skewed.pdf

Those plots are generated with the following script:

# --------------------------------------
# Author: Andreas Alfons
#         Erasmus Universiteit Rotterdam
# --------------------------------------

# load packages
library("robmed")
library("scales")
library("sn")


## control parameters for simulation
seed <- 20210617  # seed for the random number generator
n <- 200          # number of observations
K <- 1000         # number of simulation runs
R <- 5000         # number of bootstrap samples
# coefficients in mediation model
a <- b <- c <- 0.4
# error scales in mediation model
sigma_m <- sqrt(1 - a^2)
sigma_y <- sqrt(1 - b^2 - c^2 - 2*a*b*c)

## parameter combinations and fancy labels for plots
alpha <- c(0, 0, -Inf, Inf)
nu <- c(Inf, 2, Inf, Inf)
labels <- c("normal", "heavy-tails", "left-skewed", "right-skewed")

## control parameters for methods
level <- 0.95                                        # confidence level
lmrob_control <- reg_control(max_iterations = 5000)  # MM-estimator

# function to compute mean of skew-t distribution
get_mean <- function(ksi = 0, omega = 1, alpha = 0, nu = Inf) {
  if (is.infinite(alpha)) delta <- sign(alpha)
  else delta <- alpha / sqrt(1 + alpha^2)
  if (is.infinite(nu)) b_nu <- sqrt(2 / pi)
  else b_nu <- sqrt(nu) * beta((nu-1)/2, 0.5) / (sqrt(pi) * gamma(0.5))
  # compute mean
  ksi + omega * delta * b_nu
}

## set seed of the random number generator for reproducibility
set.seed(seed)

## generate independent variable and error terms such that results are
## comparable across different values of the parameters
x <- rnorm(n)
# First uniformly distributed values are drawn, which are later transformed
# to the given error distribution by applying the inverse CDF.
u_m <- runif(n)
u_y <- runif(n)

## loop over parameter settings
for (i in 1:length(labels)) {

  # transform errors
  if (is.infinite(nu[i])) {
    # normal or skew-normal distribution
    if (is.finite(alpha[i]) && alpha[i] == 0) {
      # normal errors
      e_m <- qnorm(u_m)
      e_y <- qnorm(u_y)
    } else {
      # skew-normal errors
      e_m <- qsn(u_m, alpha = alpha[i])
      e_y <- qsn(u_y, alpha = alpha[i])
    }
  } else {
    # t or skew-t distribution
    if (is.finite(alpha[i]) && alpha[i] == 0) {
      # t-distributed errors
      e_m <- qt(u_m, df = nu[i])
      e_y <- qt(u_y, df = nu[i])
    } else {
      # skew-t distributed errors
      e_m <- qst(u_m, alpha = alpha[i], nu = nu[i])
      e_y <- qst(u_y, alpha = alpha[i], nu = nu[i])
    }
  }

  # compute mean of error terms such that it can be subtracted
  mu_m <- get_mean(omega = sigma_m, alpha = alpha[i], nu = nu[i])
  mu_y <- get_mean(omega = sigma_y, alpha = alpha[i], nu = nu[i])

  # transform error terms (scale and center)
  e_m_transformed <- sigma_m * e_m - mu_m
  e_y_transformed <- sigma_y * e_y - mu_y

  # generate hypothesized mediator and dependent variable
  m <- a * x + e_m_transformed
  y <- b * m + c * x + e_y_transformed
  simulated_data <- data.frame(x, y, m)

  # fit mediation model
  robust_fit <- fit_mediation(simulated_data, "x", "y", "m",
                              control = lmrob_control)

  # create plot
  p <- weight_plot(robust_fit) +
    scale_color_manual("", values = c("black", "#00BFC4"))

  # save plot to file
  pdf(file = sprintf("figures/diagnostics-%s.pdf", labels[i]),
      width = 6, height = 4.25)
  print(p)
  dev.off()

}

@aalfons
Copy link
Owner

aalfons commented Sep 25, 2023

I don't want to attempt anything regarding AI generated interpretations of a given plot. Users might put that into papers without checking, which causes ethical issues regarding authorship and may reflect negatively on our software. I'm in favor of humans creating their own interpretations.

It would be our job to help them by having a clear explanation. Another suggestion would be to add confidence bands to make it clear if deviations from the expected line are significant, see this issue for package robmed: aalfons/robmed#30

@aalfons
Copy link
Owner

aalfons commented Sep 25, 2023

@AuroreAA, do you have suggestions on how to implement such help functionality?

@aalfons aalfons added the enhancement New feature or request label Sep 25, 2023
@AuroreAA
Copy link
Collaborator

I can think about different solutions:

  • either just a tooltip can be displayed on hover from the text and also link to another pdf file which explains everything in detail.
  • we can also convert the plot to an interactive one and adjust the hover message regarding the differences between the values
  • we can directly add the information above or below the plot on how to interpret it.
  • we can also have a dropdown menu on the plot.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants