llama cpp #17457 (Open)

Freed-Wu wants to merge 1 commit into master from llama-cpp
Conversation

@Freed-Wu (Contributor) commented Jul 19, 2023

new package: llama-cpp

Close #17453

@Freed-Wu force-pushed the llama-cpp branch 2 times, most recently from 3771685 to ae2057a on July 19, 2023 23:03
Review thread on packages/llama-cpp/build.sh (outdated, resolved)
@Freed-Wu (Contributor, Author) commented:

Can we provide a subpackage to support OpenCL? Should it be named llama-cpp-opencl.subpackage.sh, or should we create a new llama-cpp-opencl/build.sh?
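
For illustration only, a minimal sketch of what such a subpackage script could look like. The TERMUX_SUBPKG_* variables follow the termux-packages subpackage convention, but the description, dependencies, and included path here are assumptions; note also that a subpackage only splits files out of a single build, so a differently-configured OpenCL build may need its own package anyway:

# packages/llama-cpp/llama-cpp-opencl.subpackage.sh (hypothetical)
TERMUX_SUBPKG_DESCRIPTION="llama.cpp binaries built with OpenCL (CLBlast) support"
TERMUX_SUBPKG_DEPENDS="ocl-icd, clblast"
TERMUX_SUBPKG_INCLUDE="bin/llama-opencl"  # hypothetical binary name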

@truboxl (Contributor) commented Jul 21, 2023

Preferably a single package that enables multiple features.

@licy183 dismissed their stale review July 21, 2023 10:49

Has been resolved.

@ghost commented Jul 22, 2023

Can we provide a subpackage to support OpenCL? Should it be named llama-cpp-opencl.subpackage.sh, or should we create a new llama-cpp-opencl/build.sh?

Packages to enable OpenCL for llama.cpp are: ocl-icd opencl-headers opencl-clhpp
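
For example (a sketch; clinfo is optional and only used later to check that an OpenCL platform is visible):

pkg install ocl-icd opencl-headers opencl-clhpp clinfo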

CLBlast is required:

cd $HOME
git clone https://github.com/CNugteren/CLBlast

Build and install CLBlast:

cd CLBlast
cmake -B build \
  -DBUILD_SHARED_LIBS=OFF \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_INSTALL_PREFIX=/data/data/com.termux/files/usr
cd build
make -j8
make install
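
Optionally, verify the install landed under the Termux prefix (a sketch; the file names assume CLBlast's default static install layout):

ls /data/data/com.termux/files/usr/include/clblast.h
ls /data/data/com.termux/files/usr/lib/libclblast.a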

I dunno if you needed this information, but it would be nice if Termux simply handled the whole process.

Thank you. Edit: In case it's needed, here are the build instructions:

CPU:

cd $HOME
cd llama.cpp
cmake -B build
cd build
cmake --build . --config Release

GPU(OpenCL):

cd $HOME
cd llama.cpp
cmake -B build -DLLAMA_CLBLAST=ON
cd build
cmake --build . --config Release

It's notable that a model loaded from the ~/storage/downloads folder loads significantly slower than one loaded from the $HOME path.

@termux deleted a comment from TGSMLM Jul 22, 2023
@Freed-Wu (Contributor, Author) commented Jul 24, 2023

but it would be nice if Termux simply handled the whole process.

Related PRs: #17482, #17468

It's notable that a model loaded from the ~/storage/downloads folder loads significantly slower than one loaded from the $HOME path.

It is expected behaviour, not a bug. /sdcard/Downloads (~/storage/download) is on a different partition from /data/data/com.termux/files/usr, so loading a model from there is expected to be slower.
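
A simple workaround (a sketch only; the model file name is just an example) is to copy the model into the Termux home directory before loading it:

cp ~/storage/downloads/ggml-model-q4_0.bin ~/
llama -m ~/ggml-model-q4_0.bin -p "Hello" -n 64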

This version is not valid

Related issues: ggerganov/llama.cpp#2292

Compiled deb files:

llama-cpp-opencl_0.0.0-r854-fff0e0e-0_aarch64.deb.zip
clblast_1.6.1_aarch64.deb.zip
llama-cpp_0.0.0-r854-fff0e0e-0_aarch64.deb.zip

On my phone it doesn't work. I guess /system/vendor/lib64 has some libraries that affect the program?

$ LD_LIBRARY_PATH="/system/vendor/lib64" clinfo -l
Platform #0: QUALCOMM Snapdragon(TM)
`-- Device #0: QUALCOMM Adreno(TM)
$ LD_LIBRARY_PATH="/system/vendor/lib64" llama -i -ins --color -t $(nproc) --prompt-cache $PREFIX/tmp/prompt-cache -c 2048 --numa -m ~/ggml-model-q4_0.bin -ngl 1
main: build = 854 (fff0e0e)
main: seed  = 1690178858
ggml_opencl: selecting platform: 'QUALCOMM Snapdragon(TM)'
ggml_opencl: selecting device: 'QUALCOMM Adreno(TM)'
ggml_opencl: device FP16 support: true
llama.cpp: loading model from /data/data/com.termux/files/home/ggml-model-q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 49954
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: freq_base  = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.08 MB
llama_model_load_internal: using OpenCL for GPU acceleration
llama_model_load_internal: mem required  = 5258.03 MB (+ 1026.00 MB per state)
llama_model_load_internal: offloading 1 repeating layers to GPU
llama_model_load_internal: offloaded 1/33 layers to GPU
llama_model_load_internal: total VRAM used: 109 MB
llama_new_context_with_model: kv self size  = 1024.00 MB

system_info: n_threads = 8 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
main: attempting to load saved session from '/data/data/com.termux/files/usr/tmp/prompt-cache'
main: session file does not exist, will create
main: interactive mode on.
Reverse prompt: '### Instruction:

'
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 2048, n_batch = 512, n_predict = -1, n_keep = 2


== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.

 CLBlast: OpenCL error: clEnqueueNDRangeKernel: -54
GGML_ASSERT: /data/data/com.termux/files/home/.termux-build/llama-cpp-opencl/src/ggml-opencl.cpp:1747: false
zsh: abort      LD_LIBRARY_PATH="/system/vendor/lib64" llama -i -ins --color -t $(nproc)   -c

@Freed-Wu (Contributor, Author) commented Jul 24, 2023

On my phone it doesn't work

This bug is similar to ggerganov/llama.cpp#2341:

GGML_ASSERT: /build/wnslnw6pk8d4c8k0b8w4w4qz45wgy9hw-source/ggml-opencl.cpp:1524: false

I have reported it here.

@licy183 (Member) commented Aug 28, 2023

Hi @Freed-Wu, can you test whether this package works fine when you are free? Thanks!

@ghost commented Aug 28, 2023

I figured this must be merged before it's available in Termux. Is there some simple way to try this?

@TomJo2000 (Member) commented Aug 28, 2023

I figured this must be merged before it's available in Termux. Is there some simple way to try this?

You can download the .deb corresponding to your CPU architecture from the CI run and install it locally with apt install /path/to/file.deb.

The CI artifacts are packed into a .tar.zip compressed archive by GitHub Actions, so you will need to unzip and untar it first.
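
For example (the archive and package file names here are illustrative, not the exact CI output names):

unzip artifact.zip        # GitHub Actions wraps the artifact in a zip
tar xf debs.tar           # the debs are packed in a tar inside it
apt install ./llama-cpp_*_aarch64.deb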

@ghost commented Aug 28, 2023

It's functioning, for sure. If it were up to me, I'd suggest changing the way llama.cpp is built, or maybe adding a variation on the build method. For example:

The package builds llama.cpp with OpenBLAS, which is fine and works, but I've noticed that building without it gives higher performance. There is a case where OpenBLAS is fastest, namely absorbing large amounts of prompt text, but I think it's an edge case. On my device, a Samsung S10+, plain make is actually faster than OpenBLAS, or even than CLBlast currently. Maybe there are CMake optimizations I'm missing; either way, OpenBLAS lowers performance. See the sketch of the two variants below.
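
For reference, a sketch of the two CMake variants being compared (LLAMA_BLAS and LLAMA_BLAS_VENDOR are the flag names llama.cpp's CMakeLists used around this time; treat them as an assumption and check the current build docs):

# plain CPU build, no BLAS
cmake -B build -DLLAMA_BLAS=OFF
cmake --build build --config Release

# OpenBLAS build, roughly what the package does now
cmake -B build -DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS
cmake --build build --config Release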

It appears llama behaves like main in llama.cpp, which is neat. It worked in my test, and other features are available: perplexity, quantize, etc. But how do I use the server?

Thank you.

@truboxl self-requested a review August 28, 2023 16:17
@truboxl (Contributor) commented Aug 28, 2023

~ $ llama
main: build = 0 (unknown)
main: seed  = 1693239998
libc: Fatal signal 4 (SIGILL), code 2 (ILL_ILLOPN), fault addr 0x7f21f77763d3 in tid 89 (llama), pid 89 (llama)
Illegal instruction
~ $ llama-bench
libc: Fatal signal 4 (SIGILL), code 2 (ILL_ILLOPN), fault addr 0x7fa8e40563d3 in tid 115 (llama-bench), pid 115 (llama-bench)
Illegal instruction
~ $ llama-server
libc: Fatal signal 4 (SIGILL), code 2 (ILL_ILLOPN), fault addr 0x7f775e04e3d3 in tid 118 (llama-server), pid 118 (llama-server)
Illegal instruction

This is in termux-docker (x86_64).

@ghost commented Aug 28, 2023

Still working for me. The speed decrease with OpenBLAS is significant: almost a full token per second. Perhaps it should be a separate build, like CLBlast.

Server example, run with llama-server -m ~/Vicuna-7b.Q4_0.gguf -c 2048 -t 3 -b 7:
(Screenshot: Screenshot_20230828_182227, llama-server running)
How can he 7 sentence?! (kidding)
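
Once llama-server is up, it can also be queried over HTTP. A minimal sketch, assuming the default port 8080 and the /completion endpoint the llama.cpp server exposed at the time:

curl http://127.0.0.1:8080/completion \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "Building llama.cpp on Termux is", "n_predict": 32}'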

@truboxl (Contributor) commented Sep 22, 2023

Please rebase this PR to the latest version
