Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Semtree fails when using tibble instead of data.frame #68

Open
jhorzek opened this issue Jun 26, 2024 · 1 comment
Open

Semtree fails when using tibble instead of data.frame #68

jhorzek opened this issue Jun 26, 2024 · 1 comment

Comments

@jhorzek
Copy link

jhorzek commented Jun 26, 2024

Hi Andreas,

I just tried using semtree with a data set that I had imported into R using haven. The resulting data.frame is a tibble data.frame. When running semtree based on a lavaan model, I was surprised that there were no splits for my data set. Only when casting the tibble to a base R data.frame, the model was partitioned correctly. When using OpenMx models, I found that tibbles result in errors. Here is an example:

library(semtree)
library(tibble)
library(OpenMx)
library(lavaan)

# For lavaan, semtree does not throw any warnings / errors when using tibble,
# but also does not run correctly:

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

HolzingerSwineford1939 <- HolzingerSwineford1939[complete.cases(HolzingerSwineford1939), ]

fit <- cfa(HS.model, data = tibble::as_tibble(HolzingerSwineford1939))
# Using tibble does not work:
tree <- semtree(model = fit, 
                data  = tibble::as_tibble(HolzingerSwineford1939))
print(tree)

# With data.frames, the tree is built correctly
tree <- semtree(model = fit, 
                data  = HolzingerSwineford1939,
                control = semtree.control(method = "score"))
print(tree)

## For OpenMx, I get an error, but one that reports issues with P1 not being
## a factor:

set.seed(23)
N <- 1000
M <- 5
icept <- rnorm(N, 10, sd = 4)
slope <- rnorm(N, 3, sd = 1.2)
p1 <- sample(c(0, 1), size = N, replace = TRUE)
loadings <- 0:4
x <- (slope + p1 * 5) %*% t(loadings) + 
  matrix(rep(icept, each = M), byrow = TRUE, ncol = M) + 
  rnorm(N * M, sd = .08)
colnames(x) <- paste0("X", 1:M)

# Use tibble
growth.data    <- tibble::as_tibble(x)
growth.data$P1 <- as.factor(p1)


manifests <- names(growth.data)[1:5]
growthCurveModel <- mxModel("Linear Growth Curve Model Path Specification",
                            type="RAM",
                            manifestVars=manifests,
                            latentVars=c("intercept","slope"),
                            mxData(growth.data, type="raw"),
                            # residual variances
                            mxPath(
                              from=manifests,
                              arrows=2,
                              free=TRUE,
                              values = c(.1, .1, .1, .1, .1),
                              labels=c("residual","residual","residual","residual","residual")
                            ),
                            # latent variances and covariance
                            mxPath(
                              from=c("intercept","slope"),
                              arrows=2,
                              connect="unique.pairs",
                              free=TRUE,
                              values=c(2, 0, 1),
                              labels=c("vari", "cov", "vars")
                            ),
                            # intercept loadings
                            mxPath(
                              from="intercept",
                              to=manifests,
                              arrows=1,
                              free=FALSE,
                              values=c(1, 1, 1, 1, 1)
                            ),
                            # slope loadings
                            mxPath(
                              from="slope",
                              to=manifests,
                              arrows=1,
                              free=FALSE,
                              values=c(0, 1, 2, 3, 4)
                            ),
                            # manifest means
                            mxPath(
                              from="one",
                              to=manifests,
                              arrows=1,
                              free=FALSE,
                              values=c(0, 0, 0, 0, 0)
                            ),
                            # latent means
                            mxPath(
                              from="one",
                              to=c("intercept", "slope"),
                              arrows=1,
                              free=TRUE,
                              values=c(1, 1),
                              labels=c("meani", "means")
                            )
) # close model

# fit the model to the entire dataset
growthCurveModel <- mxRun(growthCurveModel)


# Using tibble does not work:
tree <- semtree(model = growthCurveModel, 
                data  = growth.data)
# Using data.frame does work:
tree <- semtree(model = growthCurveModel, 
                data  = as.data.frame(growth.data))

Best,
Jannik

@brandmaier
Copy link
Owner

Thanks. For now, I will add a type check whether data is really a data.frame. I am not sure whether supporting tibbles is possible at all but I will keep it on my list.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants