Problem when importing factor variables with SOME missing labels #79
Comments
Hi @raspatan, does the … If I understand correctly, there is a tidyverse package that aims at labelled vectors, and @sjewo was looking into it in #73. The problem with this is that I don't want any tidyverse dependency.
Hi @JanMarvin. No, I guess what I'm after is a third way. With … For an example, run this in Stata:
Now import the above in R with the different options. Either you get a numeric variable with 0 and 1, or you get a factor variable with 1 and 2. The third way would be to allow for a factor that uses the values 0 and 1, as in Stata. Personally, I don't see why a goal of the package should not be to reproduce exactly the characteristics of variables in Stata. Currently, it forces us to choose between factors and numerics. Without a warning that factor codes are recreated by R, I see this as very problematic. Fortunately I became aware of the problem; otherwise my analysis would have led to wrong results.
Hi @raspatan, the issue is that when you import something from one statistical package into another, you cannot assume that the target provides all the same functionality. For this package it is similar with Stata's support for variable labels or dataset labels. But it is true for every other conversion from one package to another, with SAS, SPSS or even something like Excel. There are always compromises one has to make, and assuming that something that works in one piece of software will work identically in another might be misleading.
Regarding the factors, the numerical value of a factor in R is an index beginning at 1. Factors are not just labeled numerics. Therefore what you suggested above is simply not possible in R, and these R internals haven't changed for a long, long time. I do not say that they are the best, but they have been in place since, I assume, the development of S. Of course the world has changed a lot and there are valid reasons why people nowadays like to use packages such as the …
If you do not want to have value labels, we provide everything you need (I'm not a fan of factors myself, they are mostly a nuisance to work with and I prefer plain old numerics and characters). I have used the plain object, there might be helper functions we provide:
> auto <- readstata13::read.dta13("http://www.stata-press.com/data/r16/auto.dta", convert.factors = FALSE)
>
> table(auto$foreign)
0 1
52 22
>
> lab_name <- which(names(auto) == "foreign")
> val_label <- attr(auto, "val.labels")[lab_name]
> lab_table <- attr(auto, "label.table")[[val_label]]
> lab_table
Domestic  Foreign
       0        1
PS: When I checked the issue yesterday, I remembered that the link was pointing at a different SO post.
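The point about factor codes can be reproduced in base R without any Stata file. The following is only a small sketch with made-up values: `x` and `lab_table` are hypothetical stand-ins for the `foreign` column and its label table shown above, not output of readstata13.

```r
# Made-up 0/1 variable standing in for a labelled Stata variable
x <- c(0, 1, 1, 0)

# Hypothetical label table, same shape as the named vector printed above
lab_table <- c(Domestic = 0, Foreign = 1)

# Converting to a factor keeps the labels, but the stored codes
# become 1-based indices into the levels
f <- factor(x, levels = unname(lab_table), labels = names(lab_table))
as.integer(f)   # 1 2 2 1

# The original codes can be recovered by looking the labels up again
orig <- unname(lab_table[as.character(f)])
orig            # 0 1 1 0
```

So `as.integer()` on a factor returns positions in `levels(f)`, never the Stata values; anything that needs the original 0/1 coding has to go through the label table or the unconverted numeric column.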
Yes, this is true. I think I got used to things working fine in the past. I just started to use factors in R. I'm not an expert in R or related languages, so I cannot really comment on the complexity of the issue. But I take your word for it. And sorry for sounding aggressive. It was not my intention. It is of course up to the user to check that things work, but I still suggest you add a warning or message somewhere (perhaps the help file) to make sure people are aware of the differences between Stata and R factors. Just my opinion.
Well, don't worry about it, I guess we can always improve the documentation. But writing documentation is not the fun part of development 😄
I have a problem when importing labels from Stata to R. The problem seems to occur because the variable has SOME missing labels. read.dta13 seems to assign the values without labels first and put those with labels at the end. This seriously affects the consistency of the data (var = 3 in Stata is not var = 3 in R!). All the details, with images and a data sample, are here.
The "solution" is to not import labels, using the convert.factors = FALSE option. But this is not a real solution to the problem. One would like to keep the available labels.
This seems to be a serious problem. I wonder whether the problem is in the package itself or somewhere else.
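One possible way to keep the available labels without touching the codes is to leave the variable numeric and carry the labels in a separate column. The sketch below uses base R and made-up values only; `x`, `lab_table`, and the column names are hypothetical, and `lab_table` merely mimics the named-vector shape of a label table.

```r
# Made-up data: codes 1, 2, 3, where only 1 and 3 have a value label
x <- c(1, 2, 3, 2, 1)
lab_table <- c(Low = 1, High = 3)   # hypothetical, partially filled label table

# Keep the numeric codes as they are and attach the labels as a second column;
# codes without a label simply get NA
df <- data.frame(
  var       = x,
  var_label = names(lab_table)[match(x, lab_table)],
  stringsAsFactors = FALSE
)
df
```

With this layout `var == 3` means the same thing as in Stata, and the labels that do exist are still available for tables and plots.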