Describe your problem
Running Refine3D on tomo data returns a segmentation fault at the very beginning of the first iteration; this happens both with and without MPI.
The error is probably related in some way to the dataset, since it doesn't happen with other datasets I have tested, but I cannot find anything obviously wrong in the data itself.
This is the error output of the job run under valgrind:
==3609768== Memcheck, a memory error detector
==3609768== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==3609768== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==3609768== Command: /apps/spack/latest_x86_64/linux-centos8-x86_64/gcc-8.4.1/relion-5.0-beta-ha23c4r27qo4nzfwxwco6zxjkarbgmtp/bin/relion_refine --nr_parts_sigma2noise 1000 --o Refine3D/test1/run --auto_refine --ios bin2_ribosomes_2D_optimisation_set.star --ref bin2_av_ribo.mrc --firstiter_cc --trust_ref_size --ini_high 40 --dont_combine_weights_via_disc --pool 30 --pad 2 --ctf --particle_diameter 380 --flatten_solvent --zero_mask --solvent_mask mask.mrc --solvent_correct_fsc --oversampling 1 --healpix_order 2 --auto_local_healpix_order 4 --offset_range 5 --offset_step 2 --sym C1 --low_resol_join_halves 40 --norm --scale --j 7 --gpu --pipeline_control Refine3D/test1/
==3609768==
==3609768== Warning: noted but unhandled ioctl 0x30000001 with no size/direction hints.
==3609768== This could cause spurious value errors to appear.
==3609768== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==3609768== Warning: noted but unhandled ioctl 0x27 with no size/direction hints.
==3609768== This could cause spurious value errors to appear.
==3609768== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==3609768== Warning: noted but unhandled ioctl 0x25 with no size/direction hints.
==3609768== This could cause spurious value errors to appear.
==3609768== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==3609768== Warning: noted but unhandled ioctl 0x17 with no size/direction hints.
==3609768== This could cause spurious value errors to appear.
==3609768== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==3609768== Warning: set address range perms: large range [0x200000000, 0x400200000) (noaccess)
==3609768== Warning: set address range perms: large range [0x1c133000, 0x3c132000) (noaccess)
WARNING: tomogram 20200202_riboprot_a010_1_tomo_016_s2.tomostar has relion-4 definition of projection matrices; converting them now...
WARNING: tomogram 20200202_riboprot_a010_1_tomo_015_s2.tomostar has relion-4 definition of projection matrices; converting them now...
WARNING: tomogram 20200202_riboprot_a010_1_tomo_018_s2.tomostar has relion-4 definition of projection matrices; converting them now...
WARNING: tomogram 20200202_riboprot_a010_1_tomo_022_s1.tomostar has relion-4 definition of projection matrices; converting them now...
WARNING: tomogram 20200202_riboprot_a010_1_tomo_018_s1.tomostar has relion-4 definition of projection matrices; converting them now...
WARNING: tomogram 20200202_riboprot_a010_1_tomo_017_s2.tomostar has relion-4 definition of projection matrices; converting them now...
WARNING: tomogram 20200202_riboprot_a010_1_tomo_013_s2.tomostar has relion-4 definition of projection matrices; converting them now...
WARNING: tomogram 20200202_riboprot_a010_1_tomo_021_s2.tomostar has relion-4 definition of projection matrices; converting them now...
WARNING: tomogram 20200202_riboprot_a010_1_tomo_022_s2.tomostar has relion-4 definition of projection matrices; converting them now...
WARNING: tomogram 20200202_riboprot_a010_1_tomo_013_s1.tomostar has relion-4 definition of projection matrices; converting them now...
WARNING: tomogram 20200202_riboprot_a010_1_tomo_016_s1.tomostar has relion-4 definition of projection matrices; converting them now...
WARNING: tomogram 20200202_riboprot_a010_1_tomo_003_s2.tomostar has relion-4 definition of projection matrices; converting them now...
WARNING: tomogram 20200202_riboprot_a010_1_tomo_017_s1.tomostar has relion-4 definition of projection matrices; converting them now...
WARNING: tomogram 20200202_riboprot_a010_1_tomo_015_s1.tomostar has relion-4 definition of projection matrices; converting them now...
WARNING: tomogram 20200202_riboprot_a010_1_tomo_020_s1.tomostar has relion-4 definition of projection matrices; converting them now...
WARNING: tomogram 20200202_riboprot_a010_1_tomo_014_s2.tomostar has relion-4 definition of projection matrices; converting them now...
WARNING: tomogram 20200202_riboprot_a010_1_tomo_002_s1.tomostar has relion-4 definition of projection matrices; converting them now...
WARNING: tomogram 20200202_riboprot_a010_1_tomo_014_s1.tomostar has relion-4 definition of projection matrices; converting them now...
WARNING: tomogram 20200202_riboprot_a010_1_tomo_003_s1.tomostar has relion-4 definition of projection matrices; converting them now...
WARNING: tomogram 20200202_riboprot_a010_1_tomo_002_s2.tomostar has relion-4 definition of projection matrices; converting them now...
WARNING: tomogram 20200202_riboprot_a010_1_tomo_001_s2.tomostar has relion-4 definition of projection matrices; converting them now...
WARNING: tomogram 20200202_riboprot_a010_1_tomo_020_s2.tomostar has relion-4 definition of projection matrices; converting them now...
WARNING: tomogram 20200202_riboprot_a010_1_tomo_021_s1.tomostar has relion-4 definition of projection matrices; converting them now...
WARNING: tomogram 20200202_riboprot_a010_1_tomo_001_s1.tomostar has relion-4 definition of projection matrices; converting them now...
==3609768== Warning: noted but unhandled ioctl 0x19 with no size/direction hints.
==3609768== This could cause spurious value errors to appear.
==3609768== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==3609768== Warning: noted but unhandled ioctl 0x49 with no size/direction hints.
==3609768== This could cause spurious value errors to appear.
==3609768== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==3609768== Warning: noted but unhandled ioctl 0x21 with no size/direction hints.
==3609768== This could cause spurious value errors to appear.
==3609768== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==3609768== Warning: noted but unhandled ioctl 0x1b with no size/direction hints.
==3609768== This could cause spurious value errors to appear.
==3609768== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==3609768== Warning: noted but unhandled ioctl 0x44 with no size/direction hints.
==3609768== This could cause spurious value errors to appear.
==3609768== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==3609768== Warning: noted but unhandled ioctl 0x48 with no size/direction hints.
==3609768== This could cause spurious value errors to appear.
==3609768== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==3609768== Warning: set address range perms: large range [0x59e99000, 0x6be98000) (noaccess)
==3609768== Warning: set address range perms: large range [0x6a000000, 0xabfff000) (noaccess)
==3609768== Warning: set address range perms: large range [0x6a000000, 0xaa000000) (noaccess)
==3609768== Warning: set address range perms: large range [0x6e000000, 0x7ffff000) (noaccess)
==3609768== Warning: set address range perms: large range [0x400200000, 0xbae1ff000) (noaccess)
==3609768== Warning: set address range perms: large range [0x1052fbc000, 0x180cfbb000) (noaccess)
==3609768== Thread 5:
==3609768== Invalid read of size 8
==3609768== at 0x699A70: void getAllSquaredDifferencesCoarse<MlOptimiserCuda>(unsigned int, OptimisationParamters&, SamplingParameters&, MlOptimiser*, MlOptimiserCuda*, AccPtr<float>&, AccPtrFactory, int) (acc_ml_optimiser_impl.h:1151)
==3609768== by 0x6BB0FD: void accDoExpectationOneParticle<MlOptimiserCuda>(MlOptimiserCuda*, unsigned long, int, AccPtrFactory) (acc_ml_optimiser_impl.h:3838)
==3609768== by 0x68F1E9: MlOptimiserCuda::doThreadExpectationSomeParticles(int) (cuda_ml_optimiser.cu:284)
==3609768== by 0x4EF6CC: globalThreadExpectationSomeParticles(void*, int) (ml_optimiser.cpp:84)
==3609768== by 0x4EF744: MlOptimiser::expectationSomeParticles(long, long) [clone ._omp_fn.0] (ml_optimiser.cpp:4262)
==3609768== by 0x1702F055: ??? (in /cm/local/apps/gcc/10.2.0/lib64/libgomp.so.1.0.0)
==3609768== by 0xF3D3149: start_thread (in /usr/lib64/libpthread-2.28.so)
==3609768== by 0x17567DC2: clone (in /usr/lib64/libc-2.28.so)
==3609768== Address 0x80 is not stack'd, malloc'd or (recently) free'd
==3609768==
==3609768==
==3609768== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==3609768== Access not within mapped region at address 0x80
==3609768== at 0x699A70: void getAllSquaredDifferencesCoarse<MlOptimiserCuda>(unsigned int, OptimisationParamters&, SamplingParameters&, MlOptimiser*, MlOptimiserCuda*, AccPtr<float>&, AccPtrFactory, int) (acc_ml_optimiser_impl.h:1151)
==3609768== by 0x6BB0FD: void accDoExpectationOneParticle<MlOptimiserCuda>(MlOptimiserCuda*, unsigned long, int, AccPtrFactory) (acc_ml_optimiser_impl.h:3838)
==3609768== by 0x68F1E9: MlOptimiserCuda::doThreadExpectationSomeParticles(int) (cuda_ml_optimiser.cu:284)
==3609768== by 0x4EF6CC: globalThreadExpectationSomeParticles(void*, int) (ml_optimiser.cpp:84)
==3609768== by 0x4EF744: MlOptimiser::expectationSomeParticles(long, long) [clone ._omp_fn.0] (ml_optimiser.cpp:4262)
==3609768== by 0x1702F055: ??? (in /cm/local/apps/gcc/10.2.0/lib64/libgomp.so.1.0.0)
==3609768== by 0xF3D3149: start_thread (in /usr/lib64/libpthread-2.28.so)
==3609768== by 0x17567DC2: clone (in /usr/lib64/libc-2.28.so)
==3609768== If you believe this happened as a result of a stack
==3609768== overflow in your program's main thread (unlikely but
==3609768== possible), you can try to increase the size of the
==3609768== main thread stack using the --main-stacksize= flag.
==3609768== The main thread stack size used in this run was 16777216.
==3609768==
==3609768== HEAP SUMMARY:
==3609768== in use at exit: 1,772,650,840 bytes in 223,650 blocks
==3609768== total heap usage: 3,894,011 allocs, 3,670,361 frees, 34,875,110,046 bytes allocated
==3609768==
==3609768== LEAK SUMMARY:
==3609768== definitely lost: 224 bytes in 3 blocks
==3609768== indirectly lost: 0 bytes in 0 blocks
==3609768== possibly lost: 96,400 bytes in 1,140 blocks
==3609768== still reachable: 1,772,554,216 bytes in 222,507 blocks
==3609768== of which reachable via heuristic:
==3609768== stdstring : 4,452 bytes in 28 blocks
==3609768== suppressed: 0 bytes in 0 blocks
==3609768== Rerun with --leak-check=full to see details of leaked memory
==3609768==
==3609768== For lists of detected and suppressed errors, rerun with: -s
==3609768== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
/cm/local/apps/slurm/var/spool/job13688254/slurm_script: line 41: 3609768 Segmentation fault (core dumped) valgrind --track-origins=yes `which relion_refine` --nr_parts_sigma2noise 1000 --o Refine3D/test1/run --auto_refine --ios bin2_ribosomes_2D_optimisation_set.star --ref bin2_av_ribo.mrc --firstiter_cc --trust_ref_size --ini_high 40 --dont_combine_weights_via_disc --pool 30 --pad 2 --ctf --particle_diameter 380 --flatten_solvent --zero_mask --solvent_mask mask.mrc --solvent_correct_fsc --oversampling 1 --healpix_order 2 --auto_local_healpix_order 4 --offset_range 5 --offset_step 2 --sym C1 --low_resol_join_halves 40 --norm --scale --j 7 --gpu --pipeline_control Refine3D/test1/
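For what it's worth, the fault address 0x80 together with "Invalid read of size 8" looks like a null-pointer dereference rather than corrupt data: reading an 8-byte member that happens to sit at offset 0x80 inside a struct whose base pointer is null. A minimal sketch of that pattern (purely illustrative; the struct and member names are made up and this is not RELION's actual code):

```cpp
#include <cstdio>

// Hypothetical layout (made-up names): 0x80 bytes of earlier fields
// followed by an 8-byte member, so the member lives at offset 0x80.
struct ParticleEntry {
    char earlier_fields[0x80];
    double weight;   // 8-byte field at offset 0x80
};

int main() {
    ParticleEntry* entry = nullptr;      // e.g. a lookup that silently returned null
    std::printf("%f\n", entry->weight);  // invalid read of size 8 at address 0x80 -> SIGSEGV
    return 0;
}
```

If that reading is right, some per-particle or per-tomogram structure reaching getAllSquaredDifferencesCoarse (acc_ml_optimiser_impl.h:1151) is probably null for this dataset, which would be consistent with the crash being dataset-dependent.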
Let me know if there is any additional information you need on the job and/or data.
Do you have any suggestions as to what the issue could be? Thanks!