-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Broken streaming of vector of enum with underlying type other than int #16312
Comments
Can you give us a bit more information? What would be useful, if possible:
Is it confirmed that the same data serialized on ARM does not cause a crash? |
For the file: https://cernbox.cern.ch/s/MXkLwJLm61rckhj I cannot confirm if the same data serialised on ARM does not cause a crash. |
is one of the stacktraces. It actually dies in different ways, most likely there is some memory corruption going on... |
For the ALICE environment, the easiest is probably sitting together. It's on a custom machine in my private area. |
Thanks. I'm not at CERN today but getting started with the information. |
(Side note: |
Another stacktrace which seems to be related to this is:
|
Interestingly enough, the actual array returned by backtrace can be decoded by GDB to:
|
Some more points gathered during a debug session:
does not seem to indicate a problem because the same list of streamer elements also contains the expected
|
Further debugging revealed a deeper issue that seem to only by chance surface on ARM/Linux: Writing or reading a vector of enums goes through the collection proxy. The collection proxy will use |
I think the cause is https://github.com/root-project/root/blob/master/io/io/src/TGenCollectionProxy.cxx#L404 (and similar lines further down), that hard-code the enum underlying type to int. When fixing, I think we need to take care of what happens to files already written out with the wrong enum width. |
Do I understand correctly this affects only scoped enums within a vector? Can I simply fix it on my side by moving to |
Although: I'm not exactly sure if already existing files that were serialized with a shorter enum correctly read back. I think yes, but that needs to be tested. |
This I can try on my side. |
I'm attaching a minimal reproducer. minimalTestVectorOfEnums.tar.gz This test returns (wrongly)
With a patch to
I think the next steps should be discussed with @pcanal. In particular:
|
AFAICT, neither TTree nor RNTuple I/O are affected by this issue. |
ROOT has an issue when serializing std::vector<T> where T is a scoped enum backed by something which has size different from int one. This is true for any architecture, it just happens to be more lethal on ARM. For more details root-project/root#16312. Add explicitly types to the linkdef while at it.
ROOT has an issue when serializing std::vector<T> where T is a scoped enum backed by something which has size different from int one. This is true for any architecture, it just happens to be more lethal on ARM. For more details root-project/root#16312. Add explicitly types to the linkdef while at it.
If the next data member (which should not be listed right after it) is of the same type, |
We shall be able to fix the usage in regular I/O and TTree (which is also broken) when using dictionary. The proper support in bare ROOT might be harder (the underlying size information is a bit harder to find and in some case might not be (yet?) available (top level vector of enums)). |
With dictionaries, it seems to work fine (for embedded vectors probably not for standalone vector) because the TStreamerInfo of the containing class records the underlying type and thus know when a conversion is needed (The corollary is that a class version number must be updated (to allow schema evolution) if one of the enums type it uses changes its underlying type). |
For the record, as you might have seen in AliceO2Group/AliceO2#13464, simply changing the types breaks reading back old files (i.e. two shorts are read in an int). Could you comment when do you expect to have a fix for this on your side which applies to 6.32.2 and if it will allow old code to still read new data (and viceversa new code / old data)? |
Side note for the record, the original valgrind report and crash happens in the case where the I have a workaround that solves the problem for the case in the minimal reproducer which resolves around setting a read rule for the vector of enums:
However it does not work yet for the actual/original problem :(. (In the minimal reproducer the size of the container is double what it should be has no over-write/crash, while in the original the container ends up with the right size but with an over-write and thus crash). |
The following custom Streamer works around the issue:
[Call to |
Any followup to the bug itself? Will we have a fix in ROOT which avoids a custom streamer? |
Check duplicate issues.
Description
I need help to understand an issue which we have when running on Linux on ARM when reading a file which was serialised on x86. Notice that this platform is peculiar, because
char
(without specifier) is unsigned, and not signed (char sign-ess is implementation detail in the standard).This is important because
mPadSubset
that you will see below is anenum PadSubset : char
. Running in valgrind, the issue appears as dumped below.What puzzles me and what I think is the culprit of the segmentation fault is the line:
as I would have expected it to be len=1. Can you explain me what is going on?
Reproducer
I do not have one which does not involve running ALICE reconstruction on ARM.
ROOT version
6.32.02.
Installation method
aliBuild
Operating system
ALMA Linux 9 on ARM64 (Ampere Altra)
Additional context
No response
The text was updated successfully, but these errors were encountered: