Virtual variables in ReadStat-created SAV files with long strings are visible un-merged in SPSS #122
Comments
Thanks. I think some of the problems will be fixed here: If problems persist, please open a separate issue for each distinct problem that you are encountering. |
Thank you. The foreign package now reads the files properly (I didn't mean this as a separate problem but rather as a way to test without SPSS). Apparently foreign (unlike ReadStat) cannot automatically turn long strings in virtual variables into one variable (it throws a warning about this). But the import looks the same now, no matter whether the file was generated in SPSS or haven (except that SPSS-generated virtual vars have the name LONG0-9 while ReadStat does V0000001-9; maybe that is it?). I now built the latest ReadStat. I then generated a 2560 char string in SPSS (test_2560_SPSS.sav) in the zip. SPSS does not show the virtual variables. I also attached a 2560char file generated through haven, it looks the same in SPSS. HTH. |
Thanks. This additional debugging information is helpful. I've made some more internal changes, including reporting the SPSS version number as 20 (same as the file you provided): It's possible that one of these changes will trigger the column merge within SPSS. If not, I'll tinker with the virtual variable names. |
That didn't do it unfortunately, the result is unchanged. Yeah, maybe try the variable names. |
Thanks for testing. It would help me if you create a similar file with a variable name that is 8 characters long - I'm curious how SPSS handles the virtual variable numbering. (Internally, SPSS variable names are limited to 8 characters.) |
Here, I made one with 8 and one with 12 char var names, and three with two/three/five vars that have the same first 8 chars. Apparently (checked with foreign) they take the first five chars of the var name, and if those are duplicates, they start with the last variable with digits, then switch to letters once those are exhausted, then letters with digits. Ugh. |
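To make the observation above concrete, here is a minimal Python sketch that groups variable names by their first five characters, so you can see which ones would compete for the same virtual-variable stem. The function name `stem_collisions` is hypothetical, and the upper-casing is an assumption based on the `LONG0`/`VAR00` style names mentioned in this thread. It does not attempt to reproduce how SPSS resolves the collisions.

```python
from collections import defaultdict


def stem_collisions(names, stem_len=5):
    """Group variable names by their upper-cased leading characters.

    Assumption (from this thread, not from SPSS documentation): the
    virtual-variable stem is the first `stem_len` characters of the name.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name[:stem_len].upper()].append(name)
    # Keep only stems shared by more than one variable.
    return {stem: members for stem, members in groups.items() if len(members) > 1}


if __name__ == "__main__":
    print(stem_collisions(["longtext_a", "longtext_b", "short"]))
```

Any stem reported here is one where SPSS would apparently have to fall back on its digit/letter disambiguation.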
Thanks for the research. I sincerely hope all these letter-digit acrobatics aren't necessary. One last request (I hope): Can you make a file with variables called var0 and var1? That's the current ReadStat naming convention... I'm curious if the virtual variables are VAR00, VAR01, or something else. |
Sure. The names are names(xx)
|
Thanks - for 10+ virtual variables does it wrap around to letters? Tbh I'm okay just supporting 2560-character variables to start, if we can get this working. |
It does (3000 chars)
|
Thanks. Try this: If that works for 0-9, I'll look into doing the letter wraparound thing. It shouldn't be too hard, I'm guessing they just use base-36. |
Sorry, it doesn't work, but you're still not using the variable name stem, but the generic VAR00000, right? Or I'm not rebuilding right? |
Hi, I'm now giving the variables 5-character names:
Then the virtual variables use this stem:
If this strategy isn't working, then it might be something else that's preventing SPSS from doing the merge. |
Ah, okay. Well SPSS seems to derive the 5-char names from shortening the visible names, which you don't do. Truthfully, I have no idea if this is what prevents the merge. |
Try naming your variables |
Yay! When I name the variable long v0000, it works. At long last ;-) |
Okay, that is great to know! Overall, the SPSS naming algorithm seems pretty complicated, so for now I will provide just enough support that you will be able to work around the limitations of both SPSS and ReadStat. I'd like to support more than the 10,000 columns implied by the The virtual variables will then use the SPSS convention of a base-36 suffix. To start I'll just support a single suffix character (e.g. |
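For readers unfamiliar with the base-36 idea, the following is a rough Python sketch of building a virtual-variable name from a stem plus a base-36 suffix (digits 0-9, then A-Z). The helper names `to_base36` and `virtual_name` are hypothetical and this is not ReadStat's code. Whether the index starts at 0 or 1, and how SPSS handles name conflicts, is not settled by this thread, so treat both as assumptions.

```python
import string

# Digits used for base-36: 0-9 followed by A-Z.
DIGITS36 = string.digits + string.ascii_uppercase


def to_base36(n):
    """Convert a non-negative integer to an upper-case base-36 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    out = []
    while n:
        n, remainder = divmod(n, 36)
        out.append(DIGITS36[remainder])
    return "".join(reversed(out))


def virtual_name(stem, index, stem_len=5):
    """Build a name from an upper-cased, truncated stem and a base-36 suffix.

    Assumption: the stem is at most `stem_len` characters, as discussed
    in this thread; the real SPSS rules appear to be more involved.
    """
    return stem[:stem_len].upper() + to_base36(index)


if __name__ == "__main__":
    print([virtual_name("long", i) for i in (0, 9, 10, 35)])
```

A single suffix character covers indices 0 through 35, which matches the "single suffix character" starting point described above.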
Thanks. Luckily for me, that's above the length of the longest string, because of which I originally raised the issue. |
Try this: |
Okay, I am wondering if SPSS makes an exception for the format Try this: |
Hang on, need to use 1-indexing instead of 0-indexing. |
ok, because this also didn't work. |
Ok, try this: If that doesn't work I'll try to implement the complete SPSS algorithm. |
It doesn't work, sorry. |
Thanks. The full algorithm is complicated so I'm afraid it'll have to wait. I'll leave this issue open though. |
No urgency on my side. Thanks for the hard and free labour. |
Hi, please try the latest update and let me know if that fixes things for you. I've tried to make SPSS-compatible variable names, though without full name-conflict resolution. |
👍 It looks good! I've only tried with my test examples and with two variables called long and long2, but it works! |
@rubenarslan Great, thanks for letting me know! If you find corner cases etc where the problem persists or the import doesn't work, please file a new issue. For now I will close. One last question: What version of SPSS are you running? I have received scattered reports that SPSS 25 won't import from ReadStat - but haven't been able to confirm. |
Sorry, I just have v.20 |
I am having this issue. That is to say, I am exporting a data frame with two variables that contain character strings, many of which are longer than 255 characters. But there is something weird. In an earlier project I exported those variables to an Excel file for spell checking, and now I have to reimport them. I'm really fine to do the re-import and join. But when I then export to an sav file, the original variables (i.e. those ending in .x) are exported as single variables, while the new variables are split at 255 characters. I feel like some of the people who have encountered this have come up with some hacks, but they are impenetrable to me.
|
I read more closely in Long string handling #118. This:
worked like a charm. I am sorted. |
Continuing from tidyverse/haven#266:
Although the latest fix works for 256 character variables and reading the file in SPSS, the foreign R package cannot read the file. Also, once we go to 512 characters, or 1024 characters, we start seeing the "virtual variables" (I presume) in SPSS rather than one long string. I'll try to build ReadStat tomorrow and create a reproducible example from CSVs; it's Friday night over here now.