Segmentation fault #1942
Comments
More debugging information is here.
|
Reported by @MatthewKhouzam |
Solution 1
Simply resolve the issue by removing "pthread_exit" from resolve_syms.
diff --git a/libmcount/plthook.c b/libmcount/plthook.c
index 8c8ba7ea..82b95b2c 100644
--- a/libmcount/plthook.c
+++ b/libmcount/plthook.c
@@ -452,7 +452,7 @@ static const char *except_syms[] = {
static const char *resolve_syms[] = {
"execl", "execlp", "execle", "execv", "execve", "execvp",
- "execvpe", "fexecve", "posix_spawn", "posix_spawnp", "pthread_exit",
+ "execvpe", "fexecve", "posix_spawn", "posix_spawnp",
};
Solution 2
This solution does not reference the rstack value of mtdp.
diff --git a/libmcount/wrap.c b/libmcount/wrap.c
index c52e91a1..c119b4cc 100644
--- a/libmcount/wrap.c
+++ b/libmcount/wrap.c
@@ -504,23 +504,21 @@ __visible_default void *dlopen(const char *filename, int flags)
__visible_default __noreturn void pthread_exit(void *retval)
{
struct mcount_thread_data *mtdp;
- struct mcount_ret_stack *rstack;
if (unlikely(real_pthread_exit == NULL))
mcount_hook_functions();
mtdp = get_thread_data();
- if (!mcount_estimate_return && !check_thread_data(mtdp)) {
- rstack = &mtdp->rstack[mtdp->idx - 1];
- /* record the final call */
- mcount_exit_filter_record(mtdp, rstack, NULL);
+ if (!check_thread_data(mtdp)) {
+ pr_dbg2("%s: exception resumed on [%d]\n", __func__, mtdp->idx);
+
+ mtdp->in_exception = true;
/*
- * it won't return to the caller ("noreturn"),
- * do not try to restore the address..
+ * restore return addresses so that it can unwind stack
+ * frames safely during the exception handling.
+ * It pairs to mcount_rstack_reset_exception().
*/
- mtdp->idx--;
-
mcount_rstack_restore(mtdp);
}
@MichelleJin12 and I have been investigating the root cause further. We are now checking the trampolined function. These are the results for the current state.
|
Before patch
Compiler gcc clang
Runtime test case pg finstrument-fu fpatchable-fun pg finstrument-fu fpatchable-fun
------------------------: O0 O1 O2 O3 Os O0 O1 O2 O3 Os O0 O1 O2 O3 Os O0 O1 O2 O3 Os O0 O1 O2 O3 Os O0 O1 O2 O3 Os
047 signal2 : OK OK OK OK OK OK OK NG OK OK OK OK NG OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
124 exception : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK NG NG NG NG NG OK OK OK OK OK
141 recv_basic : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK NZ OK OK OK OK OK OK
182 thread_exit : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK SG OK OK OK OK
183 info_quote : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
184 arg_enum : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
185 exception2 : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK NG NG NG NG NG OK OK OK OK OK
186 exception3 : OK OK OK OK OK OK OK OK OK OK NG NG NG NG NG OK OK OK OK OK NG NG NG NG NG NG NG NG NG NG
204 arg_dwarf4 : OK OK NG NG OK SK SK SK SK SK OK OK NG NG OK OK OK OK OK OK SK SK SK SK SK OK OK OK OK OK
220 trace_script : NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG
226 default_opts : OK OK OK OK NG NG OK OK OK OK OK NG NG OK OK NG NG OK OK OK OK OK OK OK NG NG OK OK NG NG
251 exception4 : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK NZ NZ NZ NZ NZ OK OK OK OK OK
255 arg_dwarf6 : OK OK OK OK OK SK SK SK SK SK OK OK OK OK OK NG NG NG NG NG SK SK SK SK SK NG NG NG NG NG
273 agent_basic : NZ OK OK NZ OK NZ OK OK OK OK OK OK OK OK OK OK OK OK OK NZ OK OK OK OK OK OK OK OK OK OK
281 agent_trace_toggle : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK NZ OK NZ OK NZ OK OK OK OK OK OK OK OK OK
282 agent_depth : OK OK OK OK OK NZ OK NZ OK NZ OK NZ OK OK OK OK OK OK OK OK OK OK OK OK NZ OK OK OK OK OK
283 agent_time : NG NG NZ NG NG NG NG NG NG NG OK NG NG NG NG OK NG NG NG OK NG NG NG NG NG OK OK NG OK NG
284 agent_filter : OK OK NZ OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
285 agent_caller_filter : OK NZ OK OK OK OK OK NZ OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
286 agent_trigger : NG NG NG NG NG NZ OK NG NG NG NG NG OK NG NG OK OK OK OK NG NG OK NG NG OK OK OK NG NG NG
287 arg_enum3 : OK OK OK OK OK SK SK SK SK SK OK OK OK OK OK OK OK OK OK OK SK SK SK SK SK OK OK OK OK OK
288 arg_oct : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
289 exception5 : OK OK OK OK OK NG NG NG NG NG OK OK OK OK OK OK OK OK OK OK NG NG NG NG NG OK OK OK OK OK
runtime test stats
====================
total 8700 Tests executed (success: 82.51%)
OK: 7048 Test succeeded
OK: 130 Test succeeded (with some fixup)
NG: 133 Different test result
NZ: 23 Non-zero return value
SG: 1 Abnormal exit by signal
TM: 0 Test ran too long
BI: 60 Build failed
LA: 0 Unsupported Language
SK: 1305 Skipped
After patch2
045 report_avg_self : OK OK OK OK OK OK OK OK OK OK OK OK NG OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
047 signal2 : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK NG OK OK OK OK OK OK OK OK OK OK NG
110 replay_time_T : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK NG OK OK OK OK OK OK OK OK OK OK OK OK OK
124 exception : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK NG NG NG NG NG OK OK OK OK OK
141 recv_basic : OK OK OK OK OK OK OK OK OK OK OK OK NZ OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
182 thread_exit : NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG
185 exception2 : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK NG NG NG NG NG OK OK OK OK OK
186 exception3 : OK OK OK OK OK OK OK OK OK OK NG NG NG NG NG OK OK OK OK OK NG NG NG NG NG NG NG NG NG NG
204 arg_dwarf4 : OK OK NG NG OK SK SK SK SK SK OK OK NG NG OK OK OK OK OK OK SK SK SK SK SK OK OK OK OK OK
220 trace_script : NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG
226 default_opts : OK OK OK OK NG NG OK OK OK OK NG OK OK OK OK OK OK NG NG OK OK OK OK NG NG OK OK OK OK OK
238 report_field2 : OK OK OK OK OK OK OK OK OK NG OK OK OK NG OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
251 exception4 : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK NZ NZ NZ NZ NZ OK OK OK OK OK
255 arg_dwarf6 : OK OK OK OK OK SK SK SK SK SK OK OK OK OK OK NG NG NG NG NG SK SK SK SK SK NG NG NG NG NG
273 agent_basic : OK OK OK OK OK OK OK OK OK NZ OK NZ OK NZ OK NZ OK OK OK NZ OK OK NZ OK OK OK OK OK OK OK
281 agent_trace_toggle : OK OK OK OK OK OK OK OK OK OK OK OK OK OK NZ OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
282 agent_depth : OK OK OK OK OK OK OK OK OK OK OK NZ OK OK OK NZ OK OK OK OK OK OK OK OK OK OK OK OK OK OK
283 agent_time : NG NZ OK OK OK NG NG NG NG NG NG NG NG OK NZ NG NG NG NG NG NG NG OK NG NG NG NG NG NG NG
285 agent_caller_filter : OK OK OK OK OK OK OK OK OK OK OK NZ OK OK NZ OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
286 agent_trigger : OK NG NG OK OK NG OK NG NG OK NG OK NG NG OK NG OK OK OK NG NG NG OK OK OK OK OK NG NG NG
289 exception5 : OK OK OK OK OK NG NG NG NG NG OK OK OK OK OK OK OK OK OK OK NG NG NG NG NG OK OK OK OK OK
runtime test stats
====================
total 8700 Tests executed (success: 82.24%)
OK: 7025 Test succeeded
OK: 130 Test succeeded (with some fixup)
NG: 160 Different test result
NZ: 20 Non-zero return value
SG: 0 Abnormal exit by signal
TM: 0 Test ran too long
BI: 60 Build failed
LA: 0 Unsupported Language
SK: 1305 Skipped
182 thread_exit : NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG
With patch2, every thread_exit case fails. We need more investigation. |
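The comment in Solution 2 says the return addresses are restored so that the stack frames can be unwound safely. The toy model below sketches that idea only: a tracer saves each frame's real return address in a per-thread return stack and overwrites the slot with a trampoline, so the originals have to be written back before anything walks the frames. Every name here (toy_entry, toy_restore, TOY_TRAMPOLINE and so on) is hypothetical, and this is not uftrace's implementation.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_DEPTH 8
/* Stand-in value for the tracer's return hook (hypothetical). */
#define TOY_TRAMPOLINE ((uintptr_t)0xdeadbeef)

/* One saved entry: where the return address lives and what it was. */
struct toy_ret_stack {
	uintptr_t *parent_loc;
	uintptr_t parent_ip;
};

struct toy_thread_data {
	int idx; /* number of live entries in rstack */
	struct toy_ret_stack rstack[TOY_MAX_DEPTH];
};

/* On function entry: save the real return address, then hijack the slot. */
int toy_entry(struct toy_thread_data *td, uintptr_t *ret_slot)
{
	if (td->idx >= TOY_MAX_DEPTH)
		return -1;
	td->rstack[td->idx].parent_loc = ret_slot;
	td->rstack[td->idx].parent_ip = *ret_slot;
	td->idx++;
	*ret_slot = TOY_TRAMPOLINE;
	return 0;
}

/* Before unwinding: write every original return address back. */
void toy_restore(struct toy_thread_data *td)
{
	for (int i = 0; i < td->idx; i++)
		*td->rstack[i].parent_loc = td->rstack[i].parent_ip;
}

/* Count the slots that still hold the trampoline value. */
int toy_count_hijacked(const uintptr_t *slots, size_t n)
{
	int count = 0;

	for (size_t i = 0; i < n; i++)
		if (slots[i] == TOY_TRAMPOLINE)
			count++;
	return count;
}
```

In this model, an unwinder that runs while toy_count_hijacked() is non-zero would see the trampoline instead of the real callers, which is the situation the restore step is meant to avoid.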
If we remove pthread_exit (Solution 1), pthread_exit can no longer be traced.
$ cd tests
$ gcc -pg s-thread-exit.c -pthread
$ uftrace a.out
Answer
# DURATION TID FUNCTION
[26832] | main() {
[26832] | pthread_create() {
51.697 us [26832] | } /* pthread_create */
[26832] | pthread_create() {
32.395 us [26832] | } /* pthread_create */
[26832] | pthread_join() {
[26836] | thread_main() {
[26836] | printf() {
17.092 us [26836] | } /* printf */
[26836] | pthread_exit() {
[26837] | thread_main() {
[26837] | printf() {
5.480 us [26837] | } /* printf */
[26837] | pthread_exit() {
362.442 us [26832] | } /* pthread_join */
[26832] | pthread_join() {
1.000 us [26832] | } /* pthread_join */
457.662 us [26832] | } /* main */
After patch2 result
$ uftrace ./a.out
1.000000
1.000000
# DURATION TID FUNCTION
0.597 us [ 241052] | __monstartup();
0.224 us [ 241052] | __cxa_atexit();
[ 241052] | main() {
61.820 us [ 241052] | pthread_create();
62.611 us [ 241052] | pthread_create();
[ 241052] | pthread_join() {
[ 241055] | thread_main() {
67.348 us [ 241055] | printf();
[ 241056] | thread_main() {
36.510 us [ 241056] | printf();
1.011 ms [ 241052] | } /* pthread_join */
0.268 us [ 241052] | pthread_join();
1.138 ms [ 241052] | } /* main */
uftrace stopped tracing with remaining functions
================================================
task: 241055
[0] thread_main
task: 241056
[0] thread_main |
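The source of tests/s-thread-exit.c is not shown in this thread. Going only by the traces above (two pthread_create calls, two pthread_join calls, and a thread_main that calls printf and then pthread_exit), a program of roughly this shape should exercise the same path. This is a hypothetical reconstruction, not the actual test file, and run_exit_threads is a made-up name.

```c
#include <pthread.h>
#include <stdio.h>

/* Thread body: print a value, then leave through pthread_exit(). */
static void *thread_main(void *arg)
{
	double *value = arg;

	printf("%f\n", *value);
	/* noreturn: thread_main never returns to its caller normally */
	pthread_exit(value);
}

/*
 * Start two threads that end via pthread_exit() and join both.
 * Returns how many exit values came back intact, or -1 on error.
 */
int run_exit_threads(void)
{
	pthread_t tid[2];
	double values[2] = { 1.0, 1.0 };
	int ok = 0;

	for (int i = 0; i < 2; i++) {
		if (pthread_create(&tid[i], NULL, thread_main, &values[i]) != 0)
			return -1;
	}
	for (int i = 0; i < 2; i++) {
		void *ret = NULL;

		if (pthread_join(tid[i], &ret) == 0 && ret == &values[i])
			ok++;
	}
	return ok;
}
```

Calling run_exit_threads() from main and building with gcc -pg -pthread, as in the commands above, gives a binary that can be run under uftrace to compare against the traces in this comment.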
This appears to be an x86-only issue at this time. Below is the result without the patch on an arm64 laptop.
$ uname -a
Linux fedora 6.10.0-0.rc7.58.fc41.aarch64 #1 SMP PREEMPT_DYNAMIC Tue Jul 9 05:11:42 KST 2024 aarch64 GNU/Linux
$ uftrace --version
uftrace v0.16-11-g804a ( aarch64 dwarf python3 luajit tui perf sched dynamic kernel )
$ uftrace record sum_of_squares-O0-pg
Time for 1 workers (wall time, total workers time): 3.779770 / 3.779519
Time for 2 workers (wall time, total workers time): 1.955173 / 3.907919
Time for 3 workers (wall time, total workers time): 1.342060 / 4.022616
Time for 4 workers (wall time, total workers time): 1.210077 / 4.776442
Time for 5 workers (wall time, total workers time): 1.235631 / 6.080350
Time for 6 workers (wall time, total workers time): 1.363751 / 6.876253
Time for 7 workers (wall time, total workers time): 1.272646 / 7.838870
Time for 8 workers (wall time, total workers time): 1.256382 / 9.250987
Time for 9 workers (wall time, total workers time): 1.282801 / 10.685252
Time for 10 workers (wall time, total workers time): 1.296283 / 12.057002
Time for 11 workers (wall time, total workers time): 1.272074 / 12.737434
Time for 12 workers (wall time, total workers time): 1.270198 / 14.185802
Time for 13 workers (wall time, total workers time): 1.280878 / 15.667125
Time for 14 workers (wall time, total workers time): 1.296344 / 16.875531
Time for 15 workers (wall time, total workers time): 1.293961 / 18.002395
Time for 16 workers (wall time, total workers time): 1.284461 / 19.564768
Total work done (sum of squares from 100 to 5000000000): 470444716059854368
Total time taken: 23.6925 seconds
Cumulative time for all workers: 166.3083 seconds
|
I wonder if the exit path in libpthread was changed recently. |
Thank you @namhyung. I'll check based on c7e105.
|
The segfault can be reproduced as follows.