WARN: Segmentation fault: address not mapped in llama.cpp #1949

Open
honggyukim opened this issue Aug 18, 2024 · 12 comments
@honggyukim
Collaborator

The llama.cpp project already has a build option, LLAMA_GPROF=1, that adds the -pg compiler flag.
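
For context, -pg makes the compiler insert a call to mcount at the entry of every compiled function, and uftrace's libmcount provides that mcount to record entries and exits. A minimal sketch of the mechanism (hypothetical file name, plain gcc assumed):

/* pg_demo.c: hypothetical example of what -pg does.
 * `gcc -pg -S pg_demo.c` emits a `call mcount` near the entry of
 * each function; at run time uftrace's libmcount implements that
 * mcount, records the entry, and rewrites the return address so
 * it can record the exit as well. */
int add(int a, int b)
{
    return a + b;
}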

But it crashes when llama-cli is traced with uftrace, as follows.

$ git clone https://github.com/ggerganov/llama.cpp.git && cd llama.cpp

$ git checkout 2339a0be1c8e31fcf4531427183b94f2ef019e56

$ make LLAMA_GPROF=1 -j

$ uftrace record --no-libcall ./llama-cli 
Log start
main: build = 3602 (82d8e7f3)
main: built with cc (Ubuntu 13.2.0-23ubuntu4) 13.2.0 for x86_64-linux-gnu
main: seed  = 1723990922
WARN: Segmentation fault: address not mapped (addr: 0x20)
WARN:  if this happens only with uftrace, please consider -e/--estimate-return option.

WARN: Backtrace from uftrace v0.16-12-g8382 ( x86_64 dwarf python3 luajit tui perf sched dynamic kernel )
WARN: =====================================
WARN: [8] (llama_load_model_from_file[5a2eea7cfda4] <= llama_init_from_gpt_params[5a2eea86b78e])
WARN: [7] (llama_init_from_gpt_params[5a2eea86b700] <= main[5a2eea6d4392])
WARN: [6] (log_dump_cmdline_impl[5a2eea8d012d] <= main[5a2eea6d3e46])
WARN: [5] (main[5a2eea6d3bec] <= __libc_start_call_main[7aad7802a1ca])
WARN: [4] (__static_initialization_and_destruction_0[5a2eea6d28a9] <= mcount_return[7aad788657ef])
WARN: [3] (_GLOBAL__sub_I_json_schema_to_grammar.cpp[5a2eea6d3bae] <= __libc_start_main[7aad7802a304])
WARN: [2] (_GLOBAL__sub_I_console.cpp[5a2eea6d268e] <= __libc_start_main[7aad7802a304])
WARN: [1] (__static_initialization_and_destruction_0[5a2eea6d2402] <= mcount_return[7aad788657ef])
WARN: [0] (_GLOBAL__sub_I_unicode_data.cpp[5a2eea6d266e] <= __libc_start_main[7aad7802a304])

Please report this bug to https://github.com/namhyung/uftrace/issues.

WARN: child terminated by signal: 11: Segmentation fault

It still crashes even with the -e option.

$ uftrace record --no-libcall -e ./llama-cli 
Log start
main: build = 3602 (82d8e7f3)
main: built with cc (Ubuntu 13.2.0-23ubuntu4) 13.2.0 for x86_64-linux-gnu
main: seed  = 1723990966
WARN: Segmentation fault: address not mapped (addr: 0x20)
WARN: Backtrace from uftrace v0.16-12-g8382 ( x86_64 dwarf python3 luajit tui perf sched dynamic kernel )
WARN: =====================================
WARN: [6] (llama_load_model_from_file[5bc0d1b9ada4] <= llama_init_from_gpt_params[5bc0d1c3678e])
WARN: [5] (llama_model_default_params[5bc0d1b41bae] <= llama_model_params_from_gpt_params[5bc0d1c306af])
WARN: [4] (llama_model_params_from_gpt_params[5bc0d1c30695] <= llama_init_from_gpt_params[5bc0d1c36737])
WARN: [3] (llama_init_from_gpt_params[5bc0d1c36700] <= main[5bc0d1a9f392])
WARN: [2] (main[5bc0d1a9ebec] <= __libc_start_call_main[70fd6b82a1ca])
WARN: [1] (__static_initialization_and_destruction_0[5bc0d1a9d8a9] <= __libc_start_main[70fd6b82a304])
WARN: [0] (_GLOBAL__sub_I_json_schema_to_grammar.cpp[5bc0d1a9ebae] <= __libc_start_main[70fd6b82a304])

Please report this bug to https://github.com/namhyung/uftrace/issues.

WARN: child terminated by signal: 11: Segmentation fault
@honggyukim
Collaborator Author

Here is the backtrace.

$ gdb -q --args uftrace record --no-libcall -e ./llama-cli
Reading symbols from uftrace...
(gdb) r

Log start
main: build = 3602 (82d8e7f3)
main: built with cc (Ubuntu 13.2.0-23ubuntu4) 13.2.0 for x86_64-linux-gnu
main: seed  = 1723991141

Thread 2.1 "llama-cli" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7e868c0 (LWP 27737)]
0x000055555568bdc2 in llama_load_model_from_file (path_model=0x5555558c06e0 "models/7B/ggml-model-f16.gguf", params=...) at src/llama.cpp:16936
16936           struct llama_model_params   params) {
(gdb) bt
#0  0x000055555568bdc2 in llama_load_model_from_file (path_model=0x5555558c06e0 "models/7B/ggml-model-f16.gguf", params=...) at src/llama.cpp:16936
#1  0x000055555572778e in llama_init_from_gpt_params (params=...) at common/common.cpp:2107
#2  0x0000555555590392 in main (argc=<optimized out>, argv=<optimized out>) at examples/main/main.cpp:210

@namhyung
Owner

Looks like memory corruption on the stack. I added a debug message and the problem went away. :(

@namhyung
Owner

It seems llama_load_model_from_file takes a large struct llama_model_params by value and it's overwritten somewhere..
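
A minimal sketch of why that matters (hypothetical struct, not the real llama_model_params layout): the x86_64 SysV ABI classifies aggregates larger than 16 bytes as MEMORY, so the caller copies the argument onto the stack and the callee reads it back from there, which is exactly the area that stack corruption would trash.

/* big_arg.c: hypothetical example, not the real llama_model_params. */
#include <stdio.h>

struct big {             /* 40 bytes > 16, so the SysV ABI classifies it */
    long a, b, c, d, e;  /* as MEMORY: the caller copies it onto the     */
};                       /* stack just above the return address          */

long sum(struct big p)   /* p is read back from that stack copy, much    */
{                        /* like the `lea 0x10(%rbp),%r10` seen in the   */
    return p.a + p.e;    /* disassembly in the next comment              */
}

int main(void)
{
    struct big b = { 1, 2, 3, 4, 5 };
    printf("%ld\n", sum(b));   /* prints 6 */
    return 0;
}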

@namhyung
Owner

I think I found the reason.. Strangely, it seems the llama binary didn't follow the x86_64 calling convention and used the r10 register across a function call. Here's the disassembly of llama_load_model_from_file:

(gdb) disas
Dump of assembler code for function llama_load_model_from_file(char const*, llama_model_params):
Address range 0x555555686f80 to 0x555555688d7a:
   0x0000555555686f80 <+0>:	push   %rbp
   0x0000555555686f81 <+1>:	mov    %rsp,%rbp
   0x0000555555686f84 <+4>:	push   %r15
   0x0000555555686f86 <+6>:	push   %r14
   0x0000555555686f88 <+8>:	push   %r13
   0x0000555555686f8a <+10>:	push   %r12
   0x0000555555686f8c <+12>:	push   %r10
   0x0000555555686f8e <+14>:	lea    0x10(%rbp),%r10
   0x0000555555686f92 <+18>:	push   %rbx
   0x0000555555686f93 <+19>:	sub    $0x1e0,%rsp
   0x0000555555686f9a <+26>:	call   *0x16dd28(%rip)        # 0x5555557f4cc8   (mcount)
   0x0000555555686fa0 <+32>:	mov    0x20(%r10),%r15
   0x0000555555686fa4 <+36>:	mov    0x28(%r10),%rax
   0x0000555555686fa8 <+40>:	mov    %r10,-0x1e0(%rbp)
   0x0000555555686faf <+47>:	mov    %rdi,-0x1d8(%rbp)
   0x0000555555686fb6 <+54>:	mov    %rax,-0x208(%rbp)
   0x0000555555686fbd <+61>:	mov    %r15,-0x200(%rbp)
   0x0000555555686fc4 <+68>:	call   0x5555555ad010 <ggml_time_init>
  ...

The r10 register is set at offset 14 but then used at offsets 32 and 36, after the call to mcount at offset 26. Its value was overwritten during mcount because r10 is a temporary (call-clobbered) register. I think the compiler should generate code to save r10 before the call at offset 26. This is probably a compiler bug, but we cannot simply blame the compiler and be done with it.
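
To spell out the ABI rule involved (a standalone sketch, not taken from the llama binary): r10 and r11 are scratch registers in the x86_64 SysV calling convention, so a conforming callee may clobber them freely, and a caller that needs a value after the call must save it itself or keep it in a callee-saved register.

/* clobber_demo.c: standalone sketch, build with `gcc -O0 clobber_demo.c`.
 * A value left in %r10 does not survive a call to a function that
 * clobbers it, and clobbering it is allowed by the ABI. */
#include <stdio.h>

__attribute__((noinline)) static void callee(void)
{
    /* a conforming callee may freely overwrite %r10 and %r11 */
    __asm__ volatile ("mov $0xdeadbeef, %%r10" ::: "r10", "r11");
}

int main(void)
{
    unsigned long r10;

    __asm__ volatile ("mov $42, %%r10" ::: "r10");  /* like offset 14 above */
    callee();                                       /* like the mcount call */
    __asm__ volatile ("mov %%r10, %0" : "=r"(r10)); /* like offsets 32/36   */

    /* in practice this prints 0xdeadbeef, not 42 */
    printf("r10 after the call: %#lx\n", r10);
    return 0;
}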

The following patch seems to work well, at least for -e. And I found I had already done the same for dynamic tracing. :)

diff --git a/arch/x86_64/mcount.S b/arch/x86_64/mcount.S
index dd32b0a6..8f0f07e0 100644
--- a/arch/x86_64/mcount.S
+++ b/arch/x86_64/mcount.S
@@ -57,9 +57,13 @@ GLOBAL(mcount)
 
        /* save rax (implicit argument for variadic functions) */
        push %rax
+       push %r10
+       push %r11
 
        call mcount_entry
 
+       pop  %r11
+       pop  %r10
        pop  %rax
 
        /* restore original stack pointer */
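
With this, mcount itself provides the guarantee the compiled code was relying on: r10 and r11 come back from the entry hook untouched.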

@honggyukim
Collaborator Author

Thanks for the investigation.

I see it works fine with -e, but it still crashes without -e, even with your fix for r10 and r11.

$ uftrace record --no-libcall ./llama-cli
Log start
main: build = 3601 (2339a0be)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: seed  = 1724667048
gguf_init_from_file: failed to open 'models/7B/ggml-model-f16.gguf': 'No such file or directory'
WARN: process crashed by signal 11: Segmentation fault (si_code: 128)
WARN:  if this happens only with uftrace, please consider -e/--estimate-return option.

WARN: Backtrace from uftrace v0.16-15-gf8a7 ( x86_64 dwarf python3 luajit tui perf sched dynamic kernel )
WARN: =====================================
WARN: [8] (gguf_init_from_file[56071b6ece4d] <= llama_model_loader::llama_model_loader[56071b7cf28d])
WARN: [7] (llama_model_loader::llama_model_loader[56071b7cee08] <= llama_load_model_from_file[56071b794731])
WARN: [6] (llama_load_model_from_file[56071b794354] <= llama_init_from_gpt_params[56071b829c2c])
WARN: [5] (llama_init_from_gpt_params[56071b829b90] <= main[56071b689904])
WARN: [4] (log_dump_cmdline_impl[56071b88c43d] <= main[56071b6893a5])
WARN: [3] (main[56071b68910a] <= __libc_start_call_main[7fc4d2a29d90])
WARN: [2] (_GLOBAL__sub_I_common.cpp[56071b687fd4] <= __libc_start_main[7fc4d2a29ebb])
WARN: [1] (__static_initialization_and_destruction_0[56071b687d32] <= mcount_return[7fc4d31b6007])
WARN: [0] (_GLOBAL__sub_I_unicode_data.cpp[56071b687fae] <= __libc_start_main[7fc4d2a29ebb])

Please report this bug to https://github.com/namhyung/uftrace/issues.

WARN: child terminated by signal: 11: Segmentation fault 

@namhyung
Owner

How about this too?

 ENTRY(mcount_return)
        .cfi_startproc
-       sub $48, %rsp
-       .cfi_def_cfa_offset 48
-
-       movq   %rdi,  32(%rsp)
+       sub $96, %rsp
+       .cfi_def_cfa_offset 96
+
+       /* save all caller-saved registers due to -fipa-ra */
+       movq   %r11,  80(%rsp)
+       movq   %r10,  72(%rsp)
+       movq   %r9,   64(%rsp)
+       movq   %r8,   56(%rsp)
+       movq   %rdi,  48(%rsp)
+       movq   %rsi,  40(%rsp)
+       movq   %rcx,  32(%rsp)
+
+       /* below are used to carry return value */
        movdqu %xmm0, 16(%rsp)
        movq   %rdx,   8(%rsp)
        movq   %rax,   0(%rsp)
@@ -113,14 +129,21 @@ ENTRY(mcount_return)
        movq    0(%rsp), %rsp
 
        /* restore original return address in parent */
-       movq    %rax, 40(%rsp)
+       movq    %rax, 88(%rsp)
 
        movq    0(%rsp), %rax
        movq    8(%rsp), %rdx
        movdqu 16(%rsp), %xmm0
-       movq   32(%rsp), %rdi
 
-       add $40, %rsp
+       movq   32(%rsp), %rcx
+       movq   40(%rsp), %rsi
+       movq   48(%rsp), %rdi
+       movq   56(%rsp), %r8
+       movq   64(%rsp), %r9
+       movq   72(%rsp), %r10
+       movq   80(%rsp), %r11
+
+       add    $88, %rsp
        .cfi_def_cfa_offset 8
        retq
        .cfi_endproc
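
For reference, -fipa-ra (enabled by default at -O2) lets GCC track which registers a callee actually clobbers and keep values live in normally call-clobbered registers across calls to it. Since mcount_return runs behind the compiler's back, via the rewritten return address, it cannot rely on any such analysis and has to preserve everything itself. A hypothetical sketch of what the option permits:

/* ipa_ra_demo.c: hypothetical example, inspect with
 * `gcc -O2 -fipa-ra -S ipa_ra_demo.c`.  Because GCC can see that
 * leaf() touches almost no registers, caller() may keep `x` in a
 * normally call-clobbered register across the call instead of
 * spilling it.  Which register (if any) gets used this way
 * depends on the GCC version. */
static __attribute__((noinline)) long leaf(long v)
{
    return v + 1;
}

long caller(long x)
{
    long y = leaf(x);   /* x may stay live in a scratch register here */
    return x * y;
}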

@honggyukim
Collaborator Author

Thanks, but it's still not working.

$ uftrace record --no-libcall ./llama-cli
Log start
main: build = 3601 (2339a0be)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: seed  = 1724738400
WARN: Segmentation fault: address not mapped (addr: 0x20)
WARN:  if this happens only with uftrace, please consider -e/--estimate-return option.

WARN: Backtrace from uftrace v0.16-17-g06b3a13 ( x86_64 dwarf python3 luajit tui perf sched dynamic kernel )
WARN: =====================================
WARN: [6] (llama_load_model_from_file[55fbf4d99354] <= llama_init_from_gpt_params[55fbf4e2ec2c])
WARN: [5] (llama_init_from_gpt_params[55fbf4e2eb90] <= main[55fbf4c8e904])
WARN: [4] (log_dump_cmdline_impl[55fbf4e9143d] <= main[55fbf4c8e3a5])
WARN: [3] (main[55fbf4c8e10a] <= __libc_start_call_main[7fe7fca29d90])
WARN: [2] (_GLOBAL__sub_I_common.cpp[55fbf4c8cfd4] <= __libc_start_main[7fe7fca29ebb])
WARN: [1] (__static_initialization_and_destruction_0[55fbf4c8cd32] <= mcount_return[7fe7fd1cafff])
WARN: [0] (_GLOBAL__sub_I_unicode_data.cpp[55fbf4c8cfae] <= __libc_start_main[7fe7fca29ebb])

Please report this bug to https://github.com/namhyung/uftrace/issues.

WARN: child terminated by signal: 11: Segmentation fault

@namhyung
Owner

For some reason, I cannot reproduce it within gdb.

@honggyukim
Collaborator Author

That's weird. Do you mean it's only under gdb that you cannot reproduce it?

@namhyung
Owner

namhyung commented Sep 1, 2024

Yep, I can reproduce it without gdb. Anyway, I think we can merge what we found so far.

namhyung added a commit that referenced this issue Sep 2, 2024
A couple of fixes and update during investigation of #1949.

Signed-off-by: Namhyung Kim <[email protected]>
@honggyukim
Collaborator Author

Hmm.. the second change at #1949 (comment) doesn't fix anything. It fails both with and without the -e option, while the first change at least makes it work with -e.

@namhyung
Owner

namhyung commented Sep 2, 2024

Yeah, but I think they should be added together.
