Negative densities / other crashes with latest master #53
Comments
@JBorrow do you have the git SHA of an earlier revision where this worked? |
Nothing helpful, the version I usually use is quite old. |
This is most probably the same issue underlying #37. It should in principle be the same problem, except that now it's caught earlier and with a different error message (one added since #37 was reported, to help pick up this issue earlier at runtime and with a more meaningful explanation). Please check whether the workaround described in the last messages of that ticket (disabling OpenMP-based FOF) removes the problem. Note also that this seems to be a problem only when compiling without MPI support. @JBorrow, you say the version you usually use is quite old, and that in itself could be helpful: if you could report here what exact version that was, it would help find the real underlying issue. |
I am not entirely sure it is a duplicate of #37, but it might be. The reason I say this is that the code runs with fee3a9f but breaks on cb4336d, and both of these are fairly recent versions. The error message is
If you need full info, the first one was compiled here:
Both cases were compiled with Intel 2020. Both have Input is
Happy to copy anything over to other places or give you more info if needed. |
@MatthieuSchaller interesting... if a previous build worked and this one doesn't, then I introduced a regression. My guess is that the diagnosis is correct; i.e., there are particles with non-positive densities, leading to the error. I'm happy to revert the change that adds the check for non-positive densities. That would remove this regression, but OTOH it could mean we are doing wrong calculations silently. |
Yes, agreed, it's not ideal. Just thought this might help track things down somehow. |
@JBorrow does the latest master without OMP but with MPI give you an alternative that is fast enough? |
@JBorrow @MatthieuSchaller please see #37 (comment) for a very likely potential fix for this problem. If that fix makes this issue disappear then we can close this ticket. |
I'm trying this out in |
I'm commenting here to leave a trail: in #60 yet another instance of this problem has been reported. Details can be found throughout the comments, including build configuration details. |
Another instance of this issue with hydro runs was reported in #37 (comment) for a zoom, non-MPI execution. |
@MatthieuSchaller with the recent surge of more cases of errors due to negative densities (a check I added as part of #37), I'm starting to think maybe I'll turn the check off, or at least turn it into a warning instead of an unrecoverable error, or even make this behavior configurable. In #53 (comment) I argued that invalid results might end up being generated silently due to this, but otherwise the check seems to be affecting too many areas. Thoughts? Edit: I just double-checked with Pascal, and he confirmed this check is correct; i.e., all particles at this point of the code should have properly-defined densities. |
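As a rough illustration of the trade-off discussed above, here is a minimal C++ sketch of what a configurable check could look like. The Options struct, the fatal_density_check option name and check_density are assumptions made for this example only; they are not the actual VR code.

    #include <cstdlib>
    #include <iostream>

    // Assumed option structure; VR's real configuration type differs.
    struct Options {
        bool fatal_density_check = true;  // abort on bad densities vs. only warn
    };

    // Validate one particle's density; returns false if the value is invalid.
    bool check_density(const Options &opt, double density, long long pid)
    {
        if (density > 0) return true;
        std::cerr << "Particle " << pid << " has non-positive density "
                  << density << "\n";
        if (opt.fatal_density_check) std::abort();  // current behaviour: unrecoverable error
        return false;                               // proposed alternative: warn and continue
    }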
That would be a worry then. If we can get to that point and find particles that have negative densities, it means something went wrong. I suppose if we know it only happens when OpenMP is on, then we can close this issue and have it taken over by the wider OMP issue. |
@MatthieuSchaller do you know of any (hopefully small) dataset + configuration file that can be used to consistently reproduce this error? I went through all the reported occurrences and either the original input and configuration files are gone, or the error occurred as part of a SWIFT + VR run, so I can't find an easy way to actually trigger it. I can try to reason a bit more about the code and try to advance like that, but having a small, reproducible example would be great. |
That's unfortunate. Let me dig out some older test case that was backed up. |
Sorry @rtobar; my config at /cosma/home/dc-borr1/c7dataspace/XL_wave_1/runs/Run_0 crashes. This just happened now, using the latest master. |
Could you expand on the modules and cmake flags used? |
|
Can you try disabling OpenMP? |
Yeah, it works fine if you run with MPI only. That's an okay fix for now - it's actually faster for my 25s - but the behaviour should not be different between this and OMP. |
Ok, that's interesting. Some of the cases above were breaking both with and without OMP. |
Okay - I was wrong - I get the negative densities in pure MPI only mode (/cosma/home/dc-borr1/c7dataspace/XL_wave_1/runs_correct_dt/Run_0, snapshot 0000). |
Do we have a way of knowing what the particle is, or what object it belongs to, to try to understand what the offending setup is? |
Yes, in hindsight I should have added more information to the error message. I'll do that and push a commit straight to the branch. BTW, I tried replicating the issue with the data, settings and compilation flags used by @JBorrow, but without success so far. I tried on a local system with 20 cores, trying a few combinations of OpenMP threads and MPI ranks (1x20, 3x6, 20x1), and none of them produced the error. Maybe I missed something obvious so I'll re-try tomorrow, but otherwise this will be a bit harder to track down than anticipated. I also tried briefly to reproduce it on cosma, but the queues were full and I was trying to go for an interactive job -- next time I'll just submit a job instead. |
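For reference, this is the kind of extra detail such an error message could include (particle ID, position and host structure). The function name and signature below are hypothetical, written only to illustrate the idea; they are not the actual commit.

    #include <sstream>
    #include <string>

    // Hypothetical helper: build a descriptive error string for an offending particle.
    std::string describe_bad_particle(long long pid, const double pos[3],
                                      long long host_structure, double density)
    {
        std::ostringstream os;
        os << "non-positive density " << density << " for particle " << pid
           << " at (" << pos[0] << ", " << pos[1] << ", " << pos[2] << "),"
           << " belonging to structure " << host_structure;
        return os.str();
    }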
Thanks. This is small enough to run on the login nodes if that helps (though I will deny having ever said that when questioned by the admin police...) |
Could it be the fact I'm using Intel 2018? |
Worth checking. I'd think the MPI implementation should make no difference but the compiler may come packed with a different version of OMP. |
A small update: with the smaller test case mentioned in #53 (comment), I found that the particle in question has a non-positive density. I was also able to reproduce the error earlier in the code, so instead of it failing in |
Interesting. Could this particle be far from the centre of its object and hence be part of the spherical over-density but not of any actual sub-structure? |
The case |
I could finally dedicate some more time today to this problem. As mentioned previously in #53 (comment), some particles end up with a non-positive density. Using the smaller reproducible example mentioned in #53 (comment) I've now observed how this plays out. When MPI is enabled, density calculation is a two-step process:
Note that the extra overlapping checks and particle communication happen only if a particle's search region overlaps another MPI domain. The problem I've reproduced happens when, during the second step, particle information is exchanged but a rank doesn't receive any particle information. In such cases the code explicitly skips any further calculations, and any densities that should have been computed in that step are left unset. I'm not really sure what's the best thing to do here. @pelahi, given the situation described above, would you be able to point us in the right direction? This analysis is consistent with at least the MPI-enabled crash @JBorrow reported in the comment above. In there one can see:
The giveaway here is the
The crash in the non-MPI, OpenMP case might be yet a different problem. |
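To make the MPI-enabled failure mode described above concrete, here is a minimal sketch of the pattern (function and variable names are assumptions, not the actual VR source): if the MPI-exchange step bails out early when a rank imports no particles, the local particles that depended on that step are never updated and keep a zero density, which is what later trips the non-positive-density check. The patch mentioned further down ensures these particles still end up with a computed density.

    #include <vector>

    // Hypothetical second step of the density calculation on one MPI rank.
    // 'imported' holds particle data received from other ranks.
    void mpi_density_step(const std::vector<double> &imported,
                          std::vector<double> &density)
    {
        // Buggy pattern: returning here when nothing was imported also skips
        // the update of local particles, leaving density[i] == 0 for them.
        if (imported.empty()) return;

        for (std::size_t i = 0; i < density.size(); ++i) {
            // ... combine local and imported neighbour contributions into density[i] ...
        }
    }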
Hi @rtobar , I'll look into this later today. I think it is likely a simple logic fix. |
Thanks to @pelahi there is now a new commit on the branch. With this patch I can now run the small defective test case until the end, meaning that all particles have their densities calculated. I also tried reproducing the last crash (the MPI case) reported by @JBorrow, and with the patch I can get past the original problem. Later on during the execution the code crashes again, but that seems like an unrelated issue (more like #71 or #73, but I haven't dug into it). With this latest fix I feel fairly confident the underlying problem is finally gone for the MPI-enabled cases, but there still remains the MPI-disabled problem @JBorrow is having, which I think has happened in a few other places too. So while I'll put the latest fix on the |
I like the sound of this! |
Thanks for your hard work! |
Just a quick update: I reproduced the MPI-disabled crash that @JBorrow experienced in cosma6 with the same dataset. Here's a small backtrace:
No surprises here, this is where the code has always been crashing. However with the latest changes that have been integrated into the |
Another clue here, from running on COSMA-8 (which has 128 cores/node): the code was far more likely to crash with this problem when running on all 128 cores (rather than the 32 that I resubmitted with) in the MPI-only configuration. |
Hi, new to the party, but I'm finding this behaviour for commit
|
Welcome to the party... The current "workaround" to just get somewhere with production runs is to toggle MPI and OMP on/off. One combination might be lucky... Apart from that, your simulation is likely quite small, so could you give the directory on cosma, as well as the config file? |
Thanks, configuring with MPI on seems to be working fine for me now.
Yes, the zooms seem to have a high hit-rate with this bug (3/3 of the last ones I've tried); the last one I was using is here for now: |
@james-trayford since the introduction of the "non-positive density" check we have found at least three different points in the code that were causing this issue in different ways:
I'd assume your issue is the third, and that by using MPI you reduced the number of OpenMP threads on each rank, thus avoiding the issue. An alternative workaround (I think, not fully certain) would be to set |
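If limiting the OpenMP thread count is the workaround chosen, the standard mechanism is the OMP_NUM_THREADS environment variable or omp_set_num_threads(); this is plain OpenMP usage, not anything specific to VR, and a tiny check program looks like this:

    #include <omp.h>
    #include <cstdio>

    int main()
    {
        // Cap the number of OpenMP threads used by parallel regions;
        // setting OMP_NUM_THREADS=4 in the environment has the same effect.
        omp_set_num_threads(4);
        #pragma omp parallel
        {
            #pragma omp single
            std::printf("running with %d OpenMP threads\n", omp_get_num_threads());
        }
        return 0;
    }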
Hi @james-trayford, I will have a look using your snapshots. Any chance you could put them somewhere I could access them? @MatthieuSchaller should I email Adrian to see if I can access cosma now that I have changed institutions? |
Do you still have access to gadi? I can copy things there, the setup is fairly small. |
Sadly no, but I can request access again. |
Ok. Might be easier to revive your cosma account then. Feel free to email Adrian and cc me in. |
Is there anything I can do here to help with this issue? Tests? narrowing down of a use case? etc. |
Hi @james-trayford, could you provide the config options you ran with? I am not encountering the error but it could be something specific. How many MPI ranks and OMP threads did you run with? |
I have also recently just encountered this. DMO zoom simulation, no MPI. Has anyone made any progress with this? Can I help? |
@stuartmcalpine unfortunately no new progress has been made here. I think the best summary of the situation is #53 (comment), where a workaround (not a great one, but should work) is suggested. Older comments go into all the details on how this story has unfolded... In order to fix this problem the best would be to find a small, quickly-reproducible example. As mentioned in the comment, this seems to be a problem with domain decomposition on high CPU counts, but that's a guess. In any case having the full details of your failure would definitely be a gain. |
I am doing some zoom runs, no MPI, on-the-fly with SWIFT. For these tests, cosma 8 and 128 threads. The DMO version segfaults on the 4th invocation, and the hydro version later, like the 10th. But the DMO is consistently failing at the same place.

module load intel_comp/2018 intel_mpi/2018 fftw/3.3.7

cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-fPIC" -DVR_ZOOM_SIM=ON -DVR_MPI=OFF -DVR_MPI_REDUCE=OFF -DVR_USE_SWIFT_INTERFACE=ON ..

Config file: vrconfig_3dfof_subhalos_SO_dmo.txt

Last bit of log:
[1726.081] [debug] search.cxx:3982 Getting Hierarchy 23

Line where it segfaults: |
@stuartmcalpine your crash actually looks like a different problem, not the one discussed in this GH issue -- but of course that doesn't mean it's not important to get it fixed. Could you open a new issue with the details please so it doesn't get lost? A descriptive title and a link to your comment above should suffice, there's no need to duplicate the whole text/attachments. For reference, this looks strikingly similar to an issue reported in #78 and fixed in 53c0289. It seems here the same situation is happening: |
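Purely as a generic illustration of that kind of out-of-bounds access (assumed names; this is neither the actual VR source nor the fix in 53c0289): indexing a per-group array with an ID that can exceed its size reads past the end and can segfault, while a bounds check turns it into a recoverable condition.

    #include <vector>

    // Hypothetical lookup into a per-group array; in the scenario described
    // above, gid can legitimately exceed the size of the local array.
    double hierarchy_level(const std::vector<double> &levels, long long gid)
    {
        if (gid < 0 || gid >= static_cast<long long>(levels.size()))
            return -1.0;  // guarded: return a sentinel instead of reading out of bounds
        return levels[gid];
    }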
Describe the bug
I've been trying to run the latest(ish) master of VR on some SWIFT outputs (on COSMA7), and I've been getting a couple of odd crashes.
To Reproduce
Version cb4336d.
Ran on snapshots under
/snap7/scratch/dp004/dc-borr1/new_randomness_runs/runs/Run_*
Log files
STDOUT:
STDERR:
Environment (please complete the following information):
cmake .. -DCMAKE_CXX_FLAGS="-O3 -march=native" -DVR_MPI=OFF -DVR_HDF5=ON -DVR_ALLOWPARALLELHDF5=ON -DVR_USE_HYDRO=ON