gctest hang if compiled with TSan (Linux) #181

Closed
ivmai opened this issue Sep 28, 2017 · 8 comments

Comments

@ivmai
Owner

ivmai commented Sep 28, 2017

How to reproduce:
./configure --disable-parallel-mark && make -j check CC=clang-4.0 CFLAGS_EXTRA="-fsanitize=thread -DNO_CANCEL_SAFE -DDEBUG_THREADS"

Host: Ubuntu/x64
Source: master branch

The reason: signals are not delivered.
I found a thread about signal delivery in Thread Sanitizer: https://groups.google.com/forum/#!topic/thread-sanitizer/xtSQQQPcIfs

@ivmai ivmai changed the title gctest hang if compiled with TSan gctest hang if compiled with TSan (Linux) Oct 5, 2017
ivmai added a commit that referenced this issue Nov 13, 2017
Issue #181 (bdwgc).

* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL
&& THREAD_SANITIZER] (GC_suspend_handler_inner): Use sched_yield()
instead of sigsuspend(&suspend_handler_mask); add TODO item.
ivmai added a commit that referenced this issue Nov 17, 2017
Issue #181 (bdwgc).

* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL
&& THREAD_SANITIZER] (GC_suspend_handler_inner): Call
pthread_sigmask(SIG_SETMASK) with an empty set (thus unmask all
signals); add comment.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL
&& THREAD_SANITIZER] (GC_stop_world): Call sem_trywait() repeatedly
(with a delay of 100 microseconds) while getting EAGAIN error (instead
of a sem_wait call).
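
For illustration, the polling wait described in this commit message could look roughly like the sketch below. This is not the bdwgc code: `wait_sem_polling` is a hypothetical helper name, and treating `EINTR` the same way as `EAGAIN` is an assumption. Note that a later commit in this thread removes the workaround again once `GC_retry_signals` is enabled.

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper (not from bdwgc): acquire the semaphore by     */
/* polling with sem_trywait() instead of blocking in sem_wait().      */
/* Returns 0 on success, -1 on an unexpected error.                   */
static int wait_sem_polling(sem_t *sem)
{
    /* 100 microseconds between attempts, as in the commit message.   */
    const struct timespec delay = { 0, 100 * 1000 };

    assert(sem != NULL);
    while (sem_trywait(sem) != 0) {
        /* EAGAIN: the count is zero, so sleep and try again.         */
        /* EINTR is retried as well (an assumption of this sketch).   */
        if (errno != EAGAIN && errno != EINTR)
            return -1;
        nanosleep(&delay, NULL);
    }
    return 0;
}
```

The point of polling here is that the waiting thread keeps returning to user code between attempts instead of staying parked inside one blocking call.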
@ivmai
Owner Author

ivmai commented Dec 4, 2017

ivmai added a commit that referenced this issue Dec 12, 2017
(fix commit af409e4)

Issue #181 (bdwgc).

* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL
&& THREAD_SANITIZER] (GC_retry_signals): Initialize to TRUE.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL]
(GC_store_stack_ptr): Update comment.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL
&& GC_ENABLE_SUSPEND_THREAD] (GC_suspend_thread): Likewise.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL
&& THREAD_SANITIZER] (GC_stop_world): Do not use sem_trywait+sleep
workaround (not needed if GC_retry_signals).
@ivmai
Owner Author

ivmai commented Dec 13, 2017

gctest hang still occurs sometimes:

(gdb) info threads
  Id   Target Id         Frame
  27   Thread 0x7f2bf5eff700 (LWP 16955) "exe" 0x00007f2bf7120d4d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
  26   Thread 0x7f2bf3aff700 (LWP 16969) "exe" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  25   Thread 0x7f2bf1afe700 (LWP 16970) "exe" __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
  24   Thread 0x7f2bef8ff700 (LWP 16971) "exe" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  23   Thread 0x7f2bed8fe700 (LWP 16972) "exe" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  22   Thread 0x7f2beb8fd700 (LWP 16973) "exe" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  21   Thread 0x7f2be98fc700 (LWP 16974) "exe" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  20   Thread 0x7f2be78fb700 (LWP 16975) "exe" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  19   Thread 0x7f2be58fa700 (LWP 16976) "exe" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  18   Thread 0x7f2be36ff700 (LWP 16977) "exe" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  17   Thread 0x7f2be16fe700 (LWP 16978) "exe" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  16   Thread 0x7f2bdf6fd700 (LWP 16980) "exe" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  15   Thread 0x7f2bdd6fc700 (LWP 16981) "exe" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  14   Thread 0x7f2bdb6fb700 (LWP 16982) "exe" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  13   Thread 0x7f2bd96fa700 (LWP 16983) "exe" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  12   Thread 0x7f2bd76f9700 (LWP 16984) "exe" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  11   Thread 0x7f2bd56f8700 (LWP 16986) "exe" 0x000000000049897e in __sanitizer::BlockingMutex::Lock() ()
  10   Thread 0x7f2bd36f7700 (LWP 16987) "exe" 0x000000000049897e in __sanitizer::BlockingMutex::Lock() ()
  9    Thread 0x7f2bd14ff700 (LWP 16988) "exe" 0x000000000049897e in __sanitizer::BlockingMutex::Lock() ()
  8    Thread 0x7f2bcf4fe700 (LWP 16989) "exe" 0x000000000049897e in __sanitizer::BlockingMutex::Lock() ()
  7    Thread 0x7f2bcd4fd700 (LWP 16990) "exe" 0x000000000049897e in __sanitizer::BlockingMutex::Lock() ()
  6    Thread 0x7f2bcb4fc700 (LWP 16992) "exe" 0x000000000049897e in __sanitizer::BlockingMutex::Lock() ()
  5    Thread 0x7f2bc94fb700 (LWP 16993) "exe" 0x000000000049897e in __sanitizer::BlockingMutex::Lock() ()
  4    Thread 0x7f2bc74fa700 (LWP 16994) "exe" 0x000000000049897e in __sanitizer::BlockingMutex::Lock() ()
  3    Thread 0x7f2bc54f9700 (LWP 16995) "exe" 0x000000000049897e in __sanitizer::BlockingMutex::Lock() ()
  2    Thread 0x7f2bc34f8700 (LWP 16996) "exe" 0x000000000049897e in __sanitizer::BlockingMutex::Lock() ()
* 1    Thread 0x7f2bf816eb40 (LWP 16909) "exe" 0x000000000049897e in __sanitizer::BlockingMutex::Lock() ()
(gdb) t 1
[Switching to thread 1 (Thread 0x7f2bf816eb40 (LWP 16909))]
#0  0x000000000049897e in __sanitizer::BlockingMutex::Lock() ()
(gdb) bt
#0  0x000000000049897e in __sanitizer::BlockingMutex::Lock() ()
#1  0x00000000004849f1 in __tsan::AfterSleep(__tsan::ThreadState*, unsigned long) ()
#2  0x000000000044ba07 in nanosleep ()
#3  0x00000000004dcb28 in GC_lock () at pthread_support.c:2090
#4  0x00000000004dc856 in GC_is_thread_tsd_valid (tsd=0x15af6e0 <first_thread+136>) at pthread_support.c:689
#5  0x00000000004e2750 in GC_malloc_kind (bytes=0, knd=0) at thread_local_alloc.c:182
#6  0x00000000004c4d8c in GC_malloc_atomic (lb=9637080) at malloc.c:355
#7  0x00000000004b37a6 in run_one_test () at tests/test.c:1414
#8  0x00000000004b4c0b in main () at tests/test.c:2280
(gdb) t 25
[Switching to thread 25 (Thread 0x7f2bf1afe700 (LWP 16970))]
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
135     ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory.
(gdb) bt
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f2bf7d603f8 in _L_cond_lock_886 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f2bf7d60164 in __pthread_mutex_cond_lock (mutex=0x15afa48 <mark_mutex>) at ../nptl/pthread_mutex_lock.c:79
#3  0x00007f2bf7d5a494 in pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:259
#4  0x000000000048d1e8 in __tsan::call_pthread_cancel_with_cleanup(int (*)(void*, void*, void*), void*, void*, void*, void (*)(void*), void*) ()
#5  0x000000000042417c in cond_wait(__tsan::ThreadState*, unsigned long, __tsan::ScopedInterceptor*, int (*)(void*, void*, void*), void*, void*, void*) [clone .constprop.90] ()
#6  0x000000000044f4f4 in pthread_cond_wait ()
#7  0x00000000004e0789 in GC_wait_marker () at pthread_support.c:2249
#8  0x00000000004c9d64 in GC_help_marker (my_mark_no=5) at mark.c:1260
#9  0x00000000004dbac4 in GC_mark_thread (id=<optimized out>) at pthread_support.c:379
#10 0x00000000004216ac in __tsan_thread_start_func ()
#11 0x00007f2bf7d56184 in start_thread (arg=0x7f2bf1afe700) at pthread_create.c:312
#12 0x00007f2bf7159ffd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) t 26
[Switching to thread 26 (Thread 0x7f2bf3aff700 (LWP 16969))]
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
185     ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S: No such file or directory.
(gdb) bt
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x000000000048d1e8 in __tsan::call_pthread_cancel_with_cleanup(int (*)(void*, void*, void*), void*, void*, void*, void (*)(void*), void*) ()
#2  0x000000000042417c in cond_wait(__tsan::ThreadState*, unsigned long, __tsan::ScopedInterceptor*, int (*)(void*, void*, void*), void*, void*, void*) [clone .constprop.90] ()
#3  0x000000000044f4f4 in pthread_cond_wait ()
#4  0x00000000004e0789 in GC_wait_marker () at pthread_support.c:2249
#5  0x00000000004c9d64 in GC_help_marker (my_mark_no=6) at mark.c:1260
#6  0x00000000004dbac4 in GC_mark_thread (id=<optimized out>) at pthread_support.c:379
#7  0x00000000004216ac in __tsan_thread_start_func ()
#8  0x00007f2bf7d56184 in start_thread (arg=0x7f2bf3aff700) at pthread_create.c:312
#9  0x00007f2bf7159ffd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

@ivmai
Owner Author

ivmai commented Dec 22, 2017

Fixed.

@ivmai ivmai closed this as completed Dec 22, 2017
ivmai added a commit that referenced this issue Dec 28, 2017
(fix commit af409e4)

Issue #181 (bdwgc).

* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL
&& THREAD_SANITIZER] (GC_suspend_handler_inner): Call pthread_sigmask()
after last_stop_count update (thus preventing a duplicate sem_post() call
if GC_suspend_handler_inner is re-entered when GC_retry_signals is set);
refine comment.
@ivmai
Owner Author

ivmai commented Dec 28, 2017

Still happens (on master). One thread is busy-waiting in the retry loop of GC_stop_world; the rest of the threads hang in sigsuspend.

@ivmai ivmai reopened this Dec 28, 2017
@ivmai
Owner Author

ivmai commented Jan 26, 2018

Observed on the latest master: https://travis-ci.org/ivmai/bdwgc/jobs/333495385

@ivmai
Owner Author

ivmai commented Mar 23, 2018

ivmai added a commit that referenced this issue Mar 29, 2018
…TSan

(fix of commit af409e4)

Issue #181 (bdwgc).

This change is to do as little as possible (even in case of TSan usage)
between the sem_post and sigsuspend calls in GC_suspend_handler_inner
(to match the relevant comment after the sigsuspend call).

* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL
&& THREAD_SANITIZER] (GC_suspend_handler_inner): Move sigemptyset()
and pthread_sigmask() calls to be just before sem_post() call.
ivmai added a commit that referenced this issue Mar 29, 2018
Issue #181 (bdwgc).

Also, one sem_t variable is used to acknowledge both thread suspends
and restarts.

* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL]
(GC_suspend_ack_sem): Add comment.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL
&& GC_NETBSD_THREADS_WORKAROUND] (GC_restart_ack_sem): Remove static
variable.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL
&& GC_NETBSD_THREADS_WORKAROUND] (GC_suspend_handler_inner): Call
sem_post(&GC_suspend_ack_sem) at the end of the handler (just before
RESTORE_CANCEL).
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL]
(suspend_restart_barrier): New static function.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL
&& GC_NETBSD_THREADS_WORKAROUND] (GC_restart_handler): Do not call
sem_post(&GC_restart_ack_sem).
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL] (GC_stop_world):
Remove i, code local variables; call suspend_restart_barrier instead
of sem_wait calls in a loop.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL
&& GC_NETBSD_THREADS_WORKAROUND] (GC_start_world): Likewise.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL
&& GC_NETBSD_THREADS_WORKAROUND] (GC_stop_init): Remove
sem_init(&GC_restart_ack_sem) call.
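
A minimal sketch of the barrier idea in this commit: one semaphore collects one acknowledgement per live thread, whether the threads are acknowledging a suspend or a restart. The function name `wait_for_acks` and its signature are assumptions made for illustration, not the actual `suspend_restart_barrier` code. The final check mirrors the zero-count assertion that a later commit in this thread adds under `GC_ASSERTIONS`.

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>

/* Hypothetical stand-in for a suspend/restart barrier: consume       */
/* exactly n_live_threads acknowledgements from a single semaphore.   */
/* Returns 0 on success, -1 on an unexpected sem_wait() error.        */
static int wait_for_acks(sem_t *ack_sem, int n_live_threads)
{
    int i;
    int count = -1;

    for (i = 0; i < n_live_threads; i++) {
        /* sem_wait() may be interrupted by a signal; just retry.     */
        while (sem_wait(ack_sem) != 0) {
            if (errno != EINTR)
                return -1;
        }
    }
    /* Every expected acknowledgement is consumed, so the count       */
    /* should be back at zero (holds only if no extra posts occur).   */
    if (sem_getvalue(ack_sem, &count) == 0)
        assert(count == 0);
    return 0;
}
```

Sharing one semaphore for both phases works only if each phase is fully drained before the next one starts, which is what the zero-count check is guarding.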
@ivmai
Owner Author

ivmai commented Mar 29, 2018

Also observed with MSan: https://travis-ci.org/ivmai/bdwgc/jobs/359743820

ivmai added a commit that referenced this issue Apr 2, 2018
Issue #181 (bdwgc).

* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL
&& THREAD_SANITIZER] (GC_suspend_handler_inner): Replace
pthread_sigmask(SIG_SETMASK) with pthread_sigmask(SIG_UNBLOCK), using a
set that contains the GC_sig_suspend and GC_sig_thr_restart signals.
ivmai added a commit that referenced this issue Apr 2, 2018
(code refactoring)

Issue #181 (bdwgc).

* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL && GC_ASSERTIONS]
(suspend_restart_barrier): Check that the count of GC_suspend_ack_sem
is zero at the end of the function.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL]
(resend_lost_signals): New static function (the code is moved from
GC_stop_world).
* pthread_stop_world.c [DEBUG_THREADS] (GC_suspend_all): Move the
assignment of GC_stopping_thread and GC_stopping_pid to GC_stop_world.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL] (GC_stop_world):
Call resend_lost_signals() if GC_retry_signals.
* pthread_stop_world.c [!NACL] (GC_restart_all): New static function
(the code is moved from GC_start_world).
* pthread_stop_world.c [!NACL] (GC_start_world): Declare n_live_threads
local variable; call GC_restart_all.
ivmai added a commit that referenced this issue Apr 2, 2018
(fix of commit c2e9583)

Issue #181 (bdwgc).

* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL]
(resend_lost_signals): Fix typo ("stopping") in WARN message.
ivmai added a commit that referenced this issue Apr 2, 2018
Issue #181 (bdwgc).

* doc/README.environment (GC_RETRY_SIGNALS, GC_NO_RETRY_SIGNALS):
Update documentation (support of restart signals loss, try signals
if compiled with ASan/MSan/TSan).
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL]
(GC_retry_signals): Set true also if ADDRESS_SANITIZER or
MEMORY_SANITIZER.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL
&& !GC_NETBSD_THREADS_WORKAROUND] (GC_suspend_handler_inner): Call
sem_post(GC_suspend_ack_sem) at the function end if GC_retry_signals;
update comment about the RESTART signal loss.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL
&& !GC_OPENBSD_UTHREADS] (GC_start_world): Call
resend_lost_signals(GC_restart_all) and update n_live_threads value.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL
&& !GC_OPENBSD_UTHREADS && !GC_NETBSD_THREADS_WORKAROUND]
(GC_start_world): Call suspend_restart_barrier() if GC_retry_signals.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL] (GC_stop_init):
Update the message logged if GC_retry_signals.
@ivmai
Owner Author

ivmai commented Apr 3, 2018

This issue should be fixed now.

@ivmai ivmai closed this as completed Apr 3, 2018
ivmai added a commit that referenced this issue Apr 4, 2018
(fix of commit 3498427)

Issue #181 (bdwgc).

* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL] (GC_stop_count):
Update comment.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL]
(GC_suspend_handler_inner): Add assertion that my_stop_count is even.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL]
(GC_suspend_handler_inner): Mask lowest bit of last_stop_count when
checking for the duplicate signal.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL]
(GC_suspend_handler_inner): If GC_retry_signals then
set the lowest bit of last_stop_count (by AO_store_release) after the
second sem_post() call; add comment.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL]
(GC_suspend_all): Add comment for last_stop_count check.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL] (GC_stop_world):
Increment GC_stop_count by 2 (instead of by one).
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL]
(GC_restart_all): If GC_retry_signals and last_stop_count has the same
value as GC_stop_count+1 then do not increment n_live_threads and do
not send the restart signal to the thread.
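
The counter scheme in this commit message can be summarized with the sketch below. It only models the arithmetic: the stop count stays even because it grows by 2, and the lowest bit of a thread's `last_stop_count` serves as an "already restarted" flag. All function names are hypothetical, and the sketch leaves out the atomic store (`AO_store_release`) that the commit message mentions.

```c
#include <assert.h>
#include <stdbool.h>

/* Each stop-the-world advances the counter by 2, so it stays even.   */
static unsigned long next_stop_count(unsigned long stop_count)
{
    return stop_count + 2;
}

/* In the suspend handler: a signal is a duplicate if the thread has  */
/* already recorded this stop count (ignoring the lowest flag bit).   */
static bool is_duplicate_suspend(unsigned long last_stop_count,
                                 unsigned long my_stop_count)
{
    assert((my_stop_count & 1) == 0); /* stop counts are always even  */
    return (last_stop_count & ~1UL) == my_stop_count;
}

/* After the restart is acknowledged, the thread sets the lowest bit. */
static unsigned long mark_restarted(unsigned long my_stop_count)
{
    return my_stop_count | 1;
}

/* In the restart loop: skip threads that have already restarted.     */
static bool already_restarted(unsigned long last_stop_count,
                              unsigned long stop_count)
{
    return last_stop_count == stop_count + 1;
}
```

With this encoding, a re-sent suspend signal for the same cycle is still recognized as a duplicate after the restart flag is set, and a re-sent restart signal can be skipped for threads that have already resumed.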