SL23f crash in StFstRawHitMaker #631

Closed
genevb opened this issue Nov 28, 2023 · 6 comments

genevb commented Nov 28, 2023

This doesn't crash for all DAQ files, but it does for some. Here is a crashing chain to try:

starver SL23f
root4star -b -q -l 'bfc.C(10,"DbV20231127 pp2022a StiCA fst ftt fstRawHit fstMuRawHit BEmcChkStat -hitfilt","/star/data03/daq/2021/352/22352002/st_fwd_22352002_raw_1500011.daq")'

I ran in the debugger and got this....

StFstRawHitMaker:WARN  - StFstRawHitMaker::Make() - No raw ADC dataset found from simu data! 
StFstRawHitMaker:WARN  - StFstRawHitMaker::Make() - No fstCollection found in simu dataset! 
 StFstRawHitMaker:INFO  -  Trying to read ALLdata
*** Error in `/afs/rhic.bnl.gov/star/packages/SL23f/.sl73_gcc485/bin/root4star': malloc(): memory corruption: 0x123aff00 ***

....followed by a long backtrace. I'm not sure why it mentions /tmp/smirnovd in here:

(gdb) where
#0 0xf7fdb425 in __kernel_vsyscall ()
#1 0xf4f1e1f7 in raise () from /lib/libc.so.6
#2 0xf4f1fa33 in abort () from /lib/libc.so.6
#3 0xf4f5d5e5 in __libc_message () from /lib/libc.so.6
#4 0xf4f66a03 in _int_malloc () from /lib/libc.so.6
#5 0xf4f6818a in malloc () from /lib/libc.so.6
#6 0xf5113b27 in operator new(unsigned int) () from /lib/libstdc++.so.6
#7 0xf7dd48dd in TStorage::ObjectAlloc (sz=52)
at /tmp/smirnovd/spack-stage/spack-stage-root-5.34.38-fta7antlmbz65avo4vw6tf7xsbtghfc4/spack-src/core/base/src/TStorage.cxx:325
#8 0x0808df27 in TObject::operator new (sz=52)
at /cvmfs/star.sdcc.bnl.gov/star-spack/spack/opt/spack/linux-rhel7-x86/gcc-4.8.5/root-5.34.38-fta7antlmbz65avo4vw6tf7xsbtghfc4/include/TObject.h:156
#9 0xeaca52af in StFstRawHitCollection::getRawHit (this=0xf050794, elecId=76) at .sl73_gcc485/obj/StRoot/StFstUtil/StFstRawHitCollection.cxx:120
#10 0xe927ce6c in StFstRawHitMaker::FillRawHitCollectionFromAPVData (this=0xf050460, dataFlag=2 '\002', ntimebin=9, counterAdcPerRgroupPerEvent=0xfffca198,
sumAdcPerRgroupPerEvent=0xfffca1f8, apvElecId=0, signalUnCorrected=..., signalCorrected=..., seedFlag=..., idTruth=...)
at .sl73_gcc485/obj/StRoot/StFstRawHitMaker/StFstRawHitMaker.cxx:565

Curiously, gdb can't seem to find the source code unless it is present locally. With a local copy in place, I find...

#9 0xeaca52af in StFstRawHitCollection::getRawHit (this=0xf050534, elecId=75) at .sl73_gcc485/obj/StRoot/StFstUtil/StFstRawHitCollection.cxx:120
120 rawHitPtr = new StFstRawHit();

I'll try running in valgrind, but if someone else knows immediately what's wrong, please chime in.

-Gene

genevb added the bug label Nov 28, 2023

genevb commented Nov 28, 2023

I should also note that I'm running in 32-bit mode, and I get the crash in both optimized and non-optimized builds.

I've put a valgrind report here:
~genevb/public/ValgrindReport_StFstRawHitMaker_CrashSL23f.txt

There are some invalid reads just before the crash (search for FATAL in the above report), at StFstRawHitMaker.cxx:551,552,553, and 313.


genevb commented Nov 30, 2023

Run numbers marked BAD for crashing and good for not crashing are located here:
~genevb/public/BADgoodRuns_StFstRawHitMaker_CrashSL23f.txt
There is no overlap, and the clear distinction from looking at a bunch of these in the RunLog Browser is that they crash IF AND ONLY IF fst was IN the run.


jdbrice commented Nov 30, 2023

Hi Gene, thanks for this info. I will work on this and also get the FST group on it. By the way, we are working on QA.


genevb commented Dec 5, 2023

I'm not sure this is any additional help, but seeing nothing else reported here, I re-ran valgrind with --leak-check=full to see if that shed any further light. I don't see any other notices in the additional output about StFstRawHitMaker (they're all about FstmGeom and FstmConfig in FSTMGEO). Anyway, here is that output:
~genevb/public/ValgrindReportFull_StFstRawHitMaker_CrashSL23f.txt

@techuan-huang

Hi Gene and Daniel, thanks for finding this issue and for the information. I have made a pull request to fix it. The crash is due to an inconsistent number of time bins between the data and the code. Changing the corresponding constant fixes the issue.
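
For readers following along, here is a minimal C++ sketch of the kind of mismatch described above. It assumes, as the fix commit states, that the code was built for 3 time bins while some data carry 9 (the backtrace also shows ntimebin=9 in FillRawHitCollectionFromAPVData). All names here (kTimeBinsOld, kTimeBinsNew, RawHitSketch, fillChannel) are hypothetical stand-ins for illustration, not the actual identifiers in StFstConsts.h or StFstRawHit. An unchecked write past a 3-slot buffer would corrupt neighbouring heap memory, which may explain why the abort only shows up later inside malloc(); the sketch uses a checked write to make the mismatch visible where it happens.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical constants: the real name and location in StFstConsts.h are not shown in this thread.
constexpr std::size_t kTimeBinsOld = 3;  // capacity matching most FST data
constexpr std::size_t kTimeBinsNew = 9;  // capacity covering the 9-time-bin runs

// One channel's ADC samples in a buffer whose size is fixed at compile time.
template <std::size_t MaxTimeBins>
struct RawHitSketch {
    std::array<int, MaxTimeBins> adc{};

    // Checked setter: throws for a time bin the buffer cannot hold,
    // instead of silently writing past the end of the array.
    void setAdc(std::size_t timeBin, int value) {
        if (timeBin >= MaxTimeBins) {
            throw std::out_of_range("time bin exceeds compiled capacity");
        }
        adc[timeBin] = value;
    }
};

// Copies one channel's samples into the hit.
// Returns false, without writing anything, when the data carry more
// time bins than the compiled capacity.
template <std::size_t MaxTimeBins>
bool fillChannel(RawHitSketch<MaxTimeBins>& hit, const std::vector<int>& samples) {
    if (samples.size() > MaxTimeBins) {
        return false;
    }
    for (std::size_t tb = 0; tb < samples.size(); ++tb) {
        hit.setAdc(tb, samples[tb]);
    }
    return true;
}
```

With the smaller capacity, fillChannel rejects a 9-sample channel rather than overrunning the buffer; with the larger capacity, both 3-sample and 9-sample channels fit. This only illustrates why raising the constant is sufficient; the actual change is the one in the referenced pull request.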

fgeurts pushed a commit that referenced this issue Dec 13, 2023
Issue #631 was due to an inconsistent number of time bins used in the
data and the codes.
In most FST data and in the codes, we used 3 time bins, while some data
used 9 time bins.
Only changing a constant in `StRoot/StEvent/StFstConsts.h` is needed to
fix this issue.

Co-authored-by: Te-Chuan Huang <[email protected]>

plexoos commented Dec 13, 2023

resolved by #634

@plexoos plexoos closed this as completed Dec 13, 2023
plexoos pushed a commit that referenced this issue Dec 13, 2023
dkapukchyan pushed a commit to dkapukchyan/star-sw that referenced this issue Mar 11, 2024