Unicode error coming from factors #43

Open
ofajardo opened this issue Dec 14, 2021 · 2 comments
Comments

@ofajardo

While trying to read this apparently simple RData file, the following error arises:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb1 in position 1: invalid start byte

The problem apparently comes from the fact that most of the information is stored as levels (all variables are factors). If those factors are converted to characters, the file is read without error:

# Flag the columns of rlvnc2 that are factors
i <- sapply(rlvnc2, is.factor)
# Convert only those columns to character vectors
rlvnc2[i] <- lapply(rlvnc2[i], as.character)
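
The error message itself can be reproduced without the file. Below is a minimal Python sketch, assuming a hypothetical two-byte level string whose second byte is 0xb1. The sample bytes and the Latin-1 fallback are illustrative guesses, not something read from the attached file.

```python
# Hypothetical sample: one ASCII byte followed by 0xb1, chosen only to
# mimic the "byte 0xb1 in position 1" part of the reported error.
raw = b"a\xb1"

try:
    raw.decode("utf-8")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)

# 0xb1 has the bit pattern 10110001, which UTF-8 reserves for continuation
# bytes, so it can never start a character. In a single-byte encoding such
# as Latin-1 every byte value is valid. Latin-1 is only an assumption here.
fallback = raw.decode("latin-1")
print(fallback)
```

If the levels in the real file do turn out to be in a single-byte encoding, that would be consistent with the error, but the sketch does not establish which encoding was used.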
@ofajardo
Author

Apparently the offending byte is at position 0xb1, while R apparently starts reading from the next position, 0xb2 (or at least R shows the information on screen starting from 0xb2).
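
When comparing against a hex editor, it can help to separate the value of the bad byte from its offset. A small Python sketch of how the two can be read off a `UnicodeDecodeError`; `locate_bad_byte` and the sample bytes are hypothetical names and data introduced for illustration.

```python
def locate_bad_byte(data: bytes, encoding: str = "utf-8"):
    """Return (offset, byte_value) of the first undecodable byte, or None."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.start is the offset inside the buffer that was decoded;
        # exc.object is that buffer, so indexing it yields the byte value.
        return exc.start, exc.object[exc.start]
    return None

# Hypothetical sample, not taken from the attached file.
sample = b"a\xb1"
offset, value = locate_bad_byte(sample)
print(f"offset={offset}, byte=0x{value:02x}")
```

The offset reported this way is relative to the bytes handed to the decoder (for example a single string), not to the start of the file.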

@ofajardo
Author

The user reports that this file is very old. I read the file in R and saved it again with R 4.0.2 on a Mac. The file looks completely different in a hex editor, but the error is the same, at the same position. See attached.
test9.RData.zip
