Sav files created by IBM Proprietary SPSS Modeler Software on IBM Cloud are not properly readable by Pyreadstat(python)/Haven(R) #319

ananjay-gurjar-ibm · 2024-10-21T10:13:47Z

Consider the sav files in the attached sav-read-issue.zip, created by IBM's proprietary SPSS Modeler on IBM Cloud. Below are the original data and the corresponding metadata written in the file, as read by SPSS Statistics:

The same file, when read with Python's Pyreadstat and R's Haven libraries, shows up as below:

Clearly, from the metadata in the above screenshot, Python was able to determine that the string column is A16 (i.e. an alphanumeric string of 16 bytes), but it ended up reading only 8 bytes of data.
The internal type code in the metadata, which specifies the length of a string column, is correctly set in the given file (see screenshot).

Since a sav file is a contiguous sequence of bytes, reading too few bytes for the string column shifts every field that follows it, which explains the garbage value in the double column (i.e. num2).
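To illustrate the misalignment, here is a minimal sketch that does not use ReadStat or the real sav layout. It assumes a simplified, uncompressed row made of one 16-byte string followed by one little-endian double; the sample values and the `read_row` helper are hypothetical.

```python
import struct

# Hypothetical row: one A16 string column followed by one double column (num2).
text = "hello world 1234"  # exactly 16 bytes in UTF-8
row = struct.pack("<16sd", text.encode("utf-8"), 42.5)

def read_row(buf, str_width):
    """Read a string of str_width bytes, then the double that follows it."""
    s = buf[:str_width].decode("utf-8").rstrip(" ")
    (num,) = struct.unpack_from("<d", buf, str_width)
    return s, num

correct = read_row(row, 16)  # width taken from the A16 metadata
misread = read_row(row, 8)   # width wrongly assumed to be 8 bytes

print(correct)
print(misread)  # truncated string, and the double is built from string bytes
```

With the 8-byte width, the second half of the string is interpreted as the double, which is the kind of garbage value seen in num2.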

When the file contains multiple rows, this problem also causes Python and R to fail with "Unable to convert string to the requested encoding (invalid byte sequence)". I suspect this comes from the underlying library (ReadStat) trying to decode, from the second row onward, bytes that were written for double data as if they were string data (strings are UTF-8 encoded).
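The suspected mechanism can be sketched as follows. This is not ReadStat's code; it assumes the same simplified row layout (16-byte string plus one little-endian double, 24 bytes per row) and a reader that wrongly uses a 16-byte row stride (8-byte string plus double). The sample values are made up.

```python
import struct

# Two hypothetical rows, each a 16-byte string followed by a double.
row1 = struct.pack("<16sd", b"abcdefghijklmnop", 3.14)
row2 = struct.pack("<16sd", b"qrstuvwxyz012345", 2.71)
data = row1 + row2

WRONG_STR_WIDTH = 8                  # reader believes the string is 8 bytes
wrong_stride = WRONG_STR_WIDTH + 8   # so it believes each row is 16 bytes

# The reader's "second row" starts at offset 16, which is really row 1's double.
second_row_string = data[wrong_stride:wrong_stride + WRONG_STR_WIDTH]

try:
    decoded = second_row_string.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    decoded = None
    error = exc

print(second_row_string)
print(error)
```

Whether decoding fails depends on the actual double values: some bit patterns happen to be valid UTF-8, which would explain why a single-row file can show garbage instead of raising an error.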

cc: @sainathmekala22
