gctest hangs rarely on Linux if compiled with TSan #236
Reproduced on latest master (a8c5ee4)
Reproduced on latest master (dd1d0bc)
Source: master (7aa23f4)
Build: https://app.travis-ci.com/github/ivmai/bdwgc/jobs/564340350
Build: https://app.travis-ci.com/github/ivmai/bdwgc/jobs/565405309
Build: https://app.travis-ci.com/github/ivmai/bdwgc/jobs/565562062
Issue #236 (bdwgc).
If a multi-threaded process has been forked, then TSan (as of now) cannot function reliably in the child, e.g. usleep() may hang because some internal lock is not released at fork. The current solution is simply to disable the resending of signals in the child process if some threads did not survive the fork.
* include/private/pthread_stop_world.h [CAN_HANDLE_FORK && THREAD_SANITIZER && SIGNAL_BASED_STOP_WORLD] (GC_retry_signals): Declare GC_EXTERN variable.
* pthread_stop_world.c [CAN_HANDLE_FORK && THREAD_SANITIZER] (GC_retry_signals): Define as GC_INNER (instead of STATIC).
* pthread_stop_world.c (GC_retry_signals): Always initialize to FALSE; move comment to GC_stop_init.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL && !NO_RETRY_SIGNALS] (GC_stop_init): Set GC_retry_signals to TRUE.
* pthread_support.c [CAN_HANDLE_FORK] (GC_remove_all_threads_but_me): Update comment; change return type to GC_bool; define, set and return removed local variable.
* pthread_support.c [CAN_HANDLE_FORK] (fork_child_proc): Define threads_removed local variable and set it to the result of GC_remove_all_threads_but_me().
* pthread_support.c [CAN_HANDLE_FORK && THREAD_SANITIZER && SIGNAL_BASED_STOP_WORLD] (fork_child_proc): If threads_removed or GC_parallel then set GC_retry_signals to FALSE; add comment.
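The fork-time workaround described in this commit can be illustrated with a minimal, self-contained sketch. It is not the bdwgc code: every `demo_*` name is hypothetical, the thread bookkeeping is reduced to a single counter, and the GC_parallel condition is left out. It only shows the idea of a child-side `pthread_atfork()` handler that turns off signal resending when threads were dropped by the fork.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* All names below are hypothetical stand-ins, not the collector's own. */
static int demo_retry_signals = 1;  /* resend "lost" signals when nonzero */
static int demo_thread_count = 1;   /* threads registered before fork()   */

/* Forget every thread except the caller; report whether any was dropped. */
static int demo_remove_all_threads_but_me(void)
{
    int removed = demo_thread_count > 1;

    demo_thread_count = 1;
    return removed;
}

/* Runs in the child right after fork(). */
static void demo_fork_child_proc(void)
{
    if (demo_remove_all_threads_but_me())
        demo_retry_signals = 0;     /* do not resend signals in the child */
}

/* Forks with `nthreads` registered threads and returns the value of  */
/* the retry flag as seen by the child, or -1 on error.               */
static int demo_child_retry_flag(int nthreads)
{
    static int registered = 0;
    pid_t pid;
    int status;

    if (!registered) {
        if (pthread_atfork(NULL, NULL, demo_fork_child_proc) != 0)
            return -1;
        registered = 1;
    }
    demo_thread_count = nthreads;
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(demo_retry_signals);  /* child: report the flag and quit */
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

With three registered threads the child reports the flag as cleared; with only the forking thread registered the flag stays set.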
gctest still hangs sometimes (even after commit |
Reproduced even without GC_ENABLE_SUSPEND_THREAD; should be fixed by 09dd6a6
Related issue #181
Issue #236 (bdwgc).
As the comment between sem_post() and sigsuspend() says, the GC_sig_thr_restart signal should be masked at that point; otherwise there could be a race. Thus, this commit removes the pthread_sigmask(SIG_UNBLOCK) call before the sem_post() one in GC_suspend_handler_inner.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL && THREAD_SANITIZER] (GC_suspend_handler_inner): Remove "set" local variable; do not call sigemptyset(), pthread_sigmask(SIG_UNBLOCK), sigaddset().
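Why the restart signal must stay masked between sem_post() and sigsuspend() may be easier to see in a small standalone sketch. This is not the collector's handler: the `demo_*` names and the choice of SIGUSR1/SIGUSR2 are assumptions made for illustration. The suspend handler runs with the restart signal blocked (through `sa_mask`), acknowledges with sem_post(), and only then lets sigsuspend() unblock the restart signal atomically, so a restart sent right after the acknowledgment stays pending instead of being lost.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <semaphore.h>
#include <signal.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define DEMO_SIG_SUSPEND SIGUSR1    /* hypothetical signal choice */
#define DEMO_SIG_RESTART SIGUSR2

static sem_t demo_ack_sem;
static volatile sig_atomic_t demo_restarted;
static atomic_int demo_stop;

static void demo_restart_handler(int sig) { (void)sig; demo_restarted = 1; }

static void demo_suspend_handler(int sig)
{
    int saved_errno = errno;
    sigset_t wait_mask;

    (void)sig;
    sigfillset(&wait_mask);
    sigdelset(&wait_mask, DEMO_SIG_RESTART);
    sem_post(&demo_ack_sem);        /* tell the stopper we are parked */
    /* DEMO_SIG_RESTART is still blocked here (via sa_mask), so a     */
    /* restart sent right now stays pending until sigsuspend()        */
    /* atomically unblocks it.                                        */
    do {
        sigsuspend(&wait_mask);
    } while (!demo_restarted);
    errno = saved_errno;
}

static void *demo_worker(void *arg)
{
    (void)arg;
    while (!atomic_load(&demo_stop))
        sched_yield();
    return NULL;
}

/* Returns 0 if the worker was suspended and then restarted, else -1. */
static int demo_suspend_restart(void)
{
    struct sigaction sa;
    pthread_t t;

    memset(&sa, 0, sizeof(sa));
    sa.sa_flags = SA_RESTART;
    sigfillset(&sa.sa_mask);        /* block all signals in handlers  */
    sa.sa_handler = demo_suspend_handler;
    if (sigaction(DEMO_SIG_SUSPEND, &sa, NULL) != 0) return -1;
    sa.sa_handler = demo_restart_handler;
    if (sigaction(DEMO_SIG_RESTART, &sa, NULL) != 0) return -1;
    if (sem_init(&demo_ack_sem, 0, 0) != 0) return -1;
    if (pthread_create(&t, NULL, demo_worker, NULL) != 0) return -1;

    if (pthread_kill(t, DEMO_SIG_SUSPEND) != 0) return -1;
    while (sem_wait(&demo_ack_sem) != 0 && errno == EINTR) {
        /* retry if interrupted */
    }
    if (pthread_kill(t, DEMO_SIG_RESTART) != 0) return -1;

    atomic_store(&demo_stop, 1);
    pthread_join(t, NULL);
    return demo_restarted ? 0 : -1;
}
```

If the handler unblocked the restart signal before sem_post(), the restart could be delivered and handled before sigsuspend() is entered; a handler that then waits unconditionally would sleep forever, which is the kind of race the commit message refers to.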
Issue #236 (bdwgc).
For some reason, usleep() hangs trying to acquire some TSan lock when called from resend_lost_signals().
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && THREAD_SANITIZER] (GC_usleep): Use sched_yield() in a loop instead of usleep() or nanosleep(); update comment.
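A yield-based delay of the kind this commit describes could look roughly like the sketch below. The helper name is hypothetical, and treating each sched_yield() call as about one microsecond is an assumption of this sketch, not a claim about how GC_usleep is actually written. The return value exists only to make the behavior observable.

```c
#define _POSIX_C_SOURCE 200809L
#include <sched.h>

/* Hypothetical helper: approximate a delay of `time_us` microseconds */
/* by yielding the CPU in a loop instead of calling usleep() or       */
/* nanosleep(). Returns the number of yields performed.               */
static unsigned long demo_yield_sleep(unsigned long time_us)
{
    unsigned long yields = 0;

    while (time_us-- > 0) {
        sched_yield();              /* pretend each yield costs ~1 us */
        yields++;
    }
    return yields;
}
```

The delay is only approximate, which is acceptable when the caller merely needs to back off briefly before retrying.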
Latest build: https://app.travis-ci.com/github/ivmai/bdwgc/jobs/566687404
Issue #236 (bdwgc).
Previously select() was used to sleep in the suspend signal handler while the thread is manually suspended. This is changed to use sigsuspend() instead. (But select() is still used for a reason when the thread is self-suspended.)
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL] (GC_stop_count): Update comment.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL && GC_ENABLE_SUSPEND_THREAD] (GC_suspend_handler_inner): Remove calls of GC_store_stack_ptr(), sem_post(), GC_suspend_self_inner(), RESTORE_CANCEL() (dedicated to manual thread suspend).
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL && GC_ENABLE_SUSPEND_THREAD && E2K] (GC_suspend_handler_inner): Remove backing_store_end and backing_store_ptr set and clear dedicated to manual thread suspend.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL && GC_ENABLE_SUSPEND_THREAD] (GC_suspend_handler_inner): Repeat sigsuspend() while suspend_cnt&1 and me->stop_info.ext_suspend_cnt is not updated.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL && GC_ENABLE_SUSPEND_THREAD] (GC_suspend_self_inner): Add DISABLE_CANCEL() and RESTORE_CANCEL(); refine TODO item.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL && GC_ENABLE_SUSPEND_THREAD && DEBUG_THREADS] (GC_suspend_self_inner): Log "suspend self" and "resume self" events.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL && GC_ENABLE_SUSPEND_THREAD] (GC_suspend_thread): Rename saved_stop_count local variable to next_stop_count; add assertion that self thread is not suspended; replace sem_wait() in a loop with suspend_restart_barrier(1); increment GC_stop_count on exit of the function as well (instead of restore).
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL && GC_ENABLE_SUSPEND_THREAD] (GC_resume_thread): Add assertion that GC_stop_count is odd; call raise_signal(GC_sig_thr_restart) and suspend_restart_barrier(1).
The issue is still reproduced rarely, e.g.:
Unclear how to work around this.
(a cherry-pick of commits c207ad8, cae46fb from 'master')
Issue #236 (bdwgc).
Previously select() was used to sleep in the suspend signal handler while the thread is manually suspended. This is changed to use sigsuspend() instead. (But select() is still used for a reason when the thread is self-suspended.)
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL && GC_ENABLE_SUSPEND_THREAD] (GC_suspend_handler_inner): Remove calls of GC_store_stack_ptr(), sem_post(), GC_suspend_self_inner(), RESTORE_CANCEL() (dedicated to manual thread suspend).
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL && GC_ENABLE_SUSPEND_THREAD && E2K] (GC_suspend_handler_inner): Remove backing_store_end and backing_store_ptr set and clear dedicated to manual thread suspend.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL && GC_ENABLE_SUSPEND_THREAD] (GC_suspend_handler_inner): Do not return quickly on a duplicate signal if suspend_cnt&1; repeat sigsuspend() while suspend_cnt&1 and me->stop_info.ext_suspend_cnt is not updated.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL && GC_ENABLE_SUSPEND_THREAD] (GC_suspend_self_inner): Add DISABLE_CANCEL() and RESTORE_CANCEL(); refine TODO item.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL && GC_ENABLE_SUSPEND_THREAD && DEBUG_THREADS] (GC_suspend_self_inner): Log "suspend self" and "resume self" events.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL && GC_ENABLE_SUSPEND_THREAD] (GC_suspend_thread): Add assertion that self thread is not suspended; replace sem_wait() in a loop with suspend_restart_barrier(1).
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL && GC_ENABLE_SUSPEND_THREAD] (GC_resume_thread): Call raise_signal(GC_sig_thr_restart) and suspend_restart_barrier(1).
Latest build (hang): https://app.travis-ci.com/github/ivmai/bdwgc/jobs/567979958
(a cherry-pick of commit 94eb525 from 'release-8_2')
Issue #236 (bdwgc).
Previously select() was used to sleep in the suspend signal handler while the thread is manually suspended. This is changed to use sigsuspend() instead. (But select() is still used for a reason when the thread is self-suspended.)
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL && GC_ENABLE_SUSPEND_THREAD] (GC_suspend_handler_inner): Remove calls of GC_store_stack_ptr(), sem_post(), GC_suspend_self_inner(), RESTORE_CANCEL() (dedicated to manual thread suspend).
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL && GC_ENABLE_SUSPEND_THREAD] (GC_suspend_handler_inner): Do not return quickly on a duplicate signal if suspend_cnt&1; repeat sigsuspend() while suspend_cnt&1 and me->stop_info.ext_suspend_cnt is not updated.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL && GC_ENABLE_SUSPEND_THREAD] (GC_suspend_self_inner): Add DISABLE_CANCEL() and RESTORE_CANCEL(); refine TODO item.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL && GC_ENABLE_SUSPEND_THREAD && DEBUG_THREADS] (GC_suspend_self_inner): Log "suspend self" and "resume self" events.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL && GC_ENABLE_SUSPEND_THREAD] (GC_suspend_thread): Add assertion that self thread is not suspended; replace sem_wait() in a loop with suspend_restart_barrier(1).
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL && GC_ENABLE_SUSPEND_THREAD] (GC_resume_thread): Call raise_signal(GC_sig_thr_restart) and suspend_restart_barrier(1).
Latest build: https://app.travis-ci.com/github/ivmai/bdwgc/jobs/573526977
Build: https://app.travis-ci.com/github/ivmai/bdwgc/jobs/576835719
Build: https://app.travis-ci.com/github/ivmai/bdwgc/jobs/588637564
Cannot reproduce locally on this config (as of commit 6dfd81e on master):
Reproduced locally. See stack trace of thread 18: __tsan::MutexUnlock
Source: release-8_2 (bef858c)
Build: https://app.travis-ci.com/github/ivmai/bdwgc/jobs/609522449
Build link: https://travis-ci.org/ivmai/bdwgc/jobs/425865116
Source: master (or 8.0.0)
Host: Linux/x64
How to build and run: ./configure --disable-parallel-mark && make check CFLAGS_EXTRA="-fsanitize=thread -D NO_CANCEL_SAFE -D NO_INCREMENTAL -D USE_SPIN_LOCK -fno-omit-frame-pointer -D NTHREADS=15"