Consider the attached sav files in sav-read-issue.zip, created by IBM's proprietary SPSS Modeler on IBM Cloud. The screenshot shows the original data and the corresponding metadata written in the file, as read by SPSS Statistics.
The same file, when read with Python's pyreadstat and R's haven libraries, shows up as below:
Clearly, from the metadata in the above screenshot, Python was able to determine that the string column is A16 (i.e. an alphanumeric string of 16 bytes), but it ended up reading only 8 bytes of data.
The metadata's Internal Type code, which specifies the length of a string column, is correctly set in the given file (ref. screenshot).
Since a sav file is a continuous stream of bytes, reading too few bytes for the string shifts everything that follows, which explains the garbage value in the double column (i.e. num2).
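To illustrate the suspected misalignment, here is a minimal sketch in plain Python. It does not parse a real sav file or call ReadStat. It assumes a simplified, hypothetical row layout of one space-padded 16-byte string followed by one little-endian double, and the helper names `pack_row` and `read_row` are made up for this example.

```python
import struct

STRING_WIDTH = 16  # declared width of the A16 column
BUGGY_WIDTH = 8    # width the reader appears to consume (assumption from the symptoms)


def pack_row(text: str, number: float) -> bytes:
    """Pack one simplified row: a space-padded 16-byte string, then a little-endian double."""
    raw = text.encode("utf-8").ljust(STRING_WIDTH, b" ")
    return raw + struct.pack("<d", number)


def read_row(row: bytes, string_width: int):
    """Read `string_width` bytes as the string, then the next 8 bytes as a double."""
    text = row[:string_width].decode("utf-8").rstrip(" ")
    (number,) = struct.unpack_from("<d", row, string_width)
    return text, number


row = pack_row("abcdefghijklmnop", 42.5)

# Consuming all 16 string bytes leaves the double aligned.
correct = read_row(row, STRING_WIDTH)

# Consuming only 8 bytes makes the second half of the string
# ("ijklmnop") get reinterpreted as the double.
shifted = read_row(row, BUGGY_WIDTH)

print("correct:", correct)
print("shifted:", shifted)
```

With the full width the row round-trips, while the 8-byte read truncates the string and turns the remaining string bytes into a meaningless number, which matches the kind of garbage seen in num2.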
This problem also leads to Python and R raising
Unable to convert string to the requested encoding (invalid byte sequence)
when the file contains multiple rows. I suspect this comes from the underlying library (ReadStat) trying to decode bytes written for double data as a string (since strings are UTF-8 encoded), starting from the second row.
cc: @sainathmekala22