Return original subjects IDs in the imputed datasets #382

Open
nociale opened this issue Nov 14, 2022 · 5 comments
Labels
bug Something isn't working

Comments

@nociale
Collaborator

nociale commented Nov 14, 2022

It would be better if the subjid variable in the imputed datasets kept the same subject IDs as the input data.

library(rbmi)

data("antidepressant_data")
dat <- antidepressant_data

dat <- expand_locf(
    dat,
    PATIENT = levels(dat$PATIENT), # expand by PATIENT and VISIT 
    VISIT = levels(dat$VISIT),
    vars = c("BASVAL", "THERAPY"), # fill with LOCF BASVAL and THERAPY
    group = c("PATIENT"),
    order = c("PATIENT", "VISIT")
)
vars <- set_vars(
    outcome = "CHANGE",
    visit = "VISIT",
    subjid = "PATIENT",
    group = "THERAPY",
    covariates = c("BASVAL*VISIT", "THERAPY*VISIT")
)
method <- method_condmean(type = "bootstrap", n_samples = 0)
drawObj <- draws(
    data = dat,
    data_ice = NULL,
    vars = vars,
    method = method,
    quiet = TRUE
)
imputeObj <- impute(drawObj)
d <- extract_imputed_dfs(imputeObj)[[1]]
head(d$PATIENT) # New IDs (imputed dataset)
head(dat$PATIENT) # Original IDs (input data)

[Two screenshots: the original IDs and the new IDs]

This would be useful when one wants to run further analyses that need the original IDs (e.g. to join two datasets on the IDs).

Was it necessary to change the IDs?

@nociale nociale added the bug Something isn't working label Nov 14, 2022
@gowerc
Collaborator

gowerc commented Nov 17, 2022

It was essential to change the patient IDs so that they are unique when you specify an unstructured covariance matrix. Otherwise you would be grouping observations across multiple patients who were sampled from the same original patient. If memory serves me right, there is an argument to extract_imputed_dfs() that returns an attribute on the dataframe which can be used to map the new names -> old names.

@nociale
Collaborator Author

nociale commented Nov 17, 2022

Indeed, setting the argument idmap = TRUE will return an attribute on the dataframe. This attribute is a named vector that has values equal to the original IDs and names equal to the new IDs.

An easy way to add the original IDs to an imputed dataset:

d <- extract_imputed_dfs(imputeObj, idmap = TRUE)[[1]]
idmap <- attributes(d)$idmap
d$original_id <- idmap[match(d[[vars$subjid]], names(idmap))]
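
For the join use case mentioned at the top of the issue, here is a rough sketch that runs without rbmi. It only assumes what is described above, i.e. that idmap is a named vector with names equal to the new IDs and values equal to the original IDs. The ID values, the `extra` data frame and its `SITE` column are made up for illustration.

```r
# Hypothetical stand-in for attributes(d)$idmap:
# names are the new IDs, values are the original IDs
idmap <- c(new_pt_1 = "1503", new_pt_2 = "1507", new_pt_3 = "1503")

# Hypothetical imputed dataset keyed by the new IDs
d <- data.frame(
    PATIENT = c("new_pt_1", "new_pt_1", "new_pt_2", "new_pt_3"),
    CHANGE = c(-2, -5, 1, -3),
    stringsAsFactors = FALSE
)

# Hypothetical external dataset keyed by the original IDs
extra <- data.frame(
    original_id = c("1503", "1507"),
    SITE = c("A", "B"),
    stringsAsFactors = FALSE
)

# Look up the original ID for each row by name, dropping the vector names
d$original_id <- unname(idmap[as.character(d$PATIENT)])

# Left join on the original IDs; every imputed row is kept
joined <- merge(d, extra, by = "original_id", all.x = TRUE, sort = FALSE)
```

Note that two new IDs can map to the same original ID, so the join is many-to-one from the imputed data to the external data.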

Thanks a lot.
I will close this issue.

@nociale nociale closed this as completed Nov 17, 2022
@gowerc gowerc reopened this Nov 17, 2022
@gowerc
Collaborator

gowerc commented Nov 17, 2022

@nociale, I have re-opened the issue as I think it might be worth adding something more explicit about this in one of the vignettes.

@gowerc
Collaborator

gowerc commented Nov 17, 2022

Alternatively, maybe it's worth updating the function to add the "original_id" column instead of just returning the attribute?

@nociale
Collaborator Author

nociale commented Nov 18, 2022

Yes, good idea! We could either (1) set idmap = TRUE by default, or (2) return the "original_id" instead of the modified IDs. If we go with the latter, we could remove the idmap argument if it is no longer needed. My preference is for (2).
