Unicode error coming from factors #43

ofajardo · 2021-12-14T14:13:15Z

While trying to read this apparently simple rdata file, the following error arises:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb1 in position 1: invalid start byte

The problem apparently comes from the fact that most of the information is stored as factor levels (all variables are factors). If those factors are converted to characters, the file is read correctly:

i <- sapply(rlvnc2, is.factor)
rlvnc2[i] <- lapply(rlvnc2[i], as.character)
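
For illustration, the same error message can be reproduced in plain Python, without the file. This is only a sketch under an assumption: that one of the factor levels contains the byte 0xb1 (the character "±" in Latin-1) and is being decoded as UTF-8. The byte string below is made up for the example and is not taken from the attached file.

```python
# Hypothetical level value "1±5" encoded as Latin-1; NOT taken from the actual file.
raw = "1\u00b15".encode("latin-1")  # b'1\xb15'

message = None
position = None
bad_byte = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    message = str(err)        # text of the error as Python reports it
    position = err.start      # offset of the offending byte within `raw`
    bad_byte = raw[position]  # value of the offending byte

print(message)
print(hex(bad_byte), "at offset", position)

# Decoding with the encoding the bytes were written in succeeds.
decoded = raw.decode("latin-1")
print(decoded)
```

Note that in this message 0xb1 is the value of the byte and "position 1" is its offset inside the string being decoded, not an offset inside the file.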

ofajardo · 2021-12-14T14:23:33Z

Apparently the offending byte is at position 0xb1, while R seems to start reading from the next position, 0xb2 (or at least R shows the information on screen starting from 0xb2).

ofajardo · 2021-12-14T14:45:25Z

The user reports that this file is very old. I read the file in R and saved it again with R 4.0.2 on a Mac. The file looks completely different in a hex editor, but the error is the same and occurs at the same position. See attached.
test9.RData.zip

ofajardo mentioned this issue Dec 14, 2021

I try to read a rds file, but get the following error: ofajardo/pyreadr#49

Open
