llama cpp #17457
Can we provide a subpackage to support OpenCL? Is it named |
Preferably a single package that enables multiple features. |
Packages to enable OpenCL for llama.cpp are: ocl-icd, opencl-headers, opencl-clhpp. CLBlast is required:
Build and install CLBlast:
I dunno if you needed this information, but it would be nice if Termux simply handled the whole process. Thank you.
Edit: In case it's needed, here are the build instructions:
CPU:
GPU(OpenCL):
It's notable that a model loaded from the ~/storage/downloads folder is significantly slower than one loaded from the $HOME path. |
It is an expected behaviour, not a bug.
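Given the slowdown reported above, one workaround is to copy the model out of shared storage into $HOME once and load it from there. The following is a minimal sketch only: `stage_model` is a hypothetical helper (not part of llama.cpp or Termux), and the file names in the usage comment are illustrative.

```shell
#!/bin/sh
# stage_model: hypothetical helper, not shipped by llama.cpp or Termux.
# Copies a model file into a destination directory (default: $HOME) unless a
# file with the same name already exists there, then prints the staged path.
stage_model() {
    src=$1
    dest_dir=${2:-$HOME}
    dest="$dest_dir/$(basename "$src")"
    if [ ! -f "$dest" ]; then
        # Only copy on first use; later runs reuse the staged file.
        cp "$src" "$dest" || return 1
    fi
    printf '%s\n' "$dest"
}

# Usage (illustrative paths):
#   model=$(stage_model ~/storage/downloads/ggml-model-q4_0.bin) && llama -m "$model"
```

The copy costs extra space in app-private storage, so this only makes sense when the device has room for a second copy of the model.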
Related issues: ggerganov/llama.cpp#2292
Compiled deb files: llama-cpp-opencl_0.0.0-r854-fff0e0e-0_aarch64.deb.zip
On my phone, it does not work, I guess:
$ LD_LIBRARY_PATH="/system/vendor/lib64" clinfo -l
Platform #0: QUALCOMM Snapdragon(TM)
`-- Device #0: QUALCOMM Adreno(TM)
$ LD_LIBRARY_PATH="/system/vendor/lib64" llama -i -ins --color -t $(nproc) --prompt-cache $PREFIX/tmp/prompt-cache -c 2048 --numa -m ~/ggml-model-q4_0.bin -ngl 1
main: build = 854 (fff0e0e)
main: seed = 1690178858
ggml_opencl: selecting platform: 'QUALCOMM Snapdragon(TM)'
ggml_opencl: selecting device: 'QUALCOMM Adreno(TM)'
ggml_opencl: device FP16 support: true
llama.cpp: loading model from /data/data/com.termux/files/home/ggml-model-q4_0.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 49954
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: freq_base = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 0.08 MB
llama_model_load_internal: using OpenCL for GPU acceleration
llama_model_load_internal: mem required = 5258.03 MB (+ 1026.00 MB per state)
llama_model_load_internal: offloading 1 repeating layers to GPU
llama_model_load_internal: offloaded 1/33 layers to GPU
llama_model_load_internal: total VRAM used: 109 MB
llama_new_context_with_model: kv self size = 1024.00 MB
system_info: n_threads = 8 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
main: attempting to load saved session from '/data/data/com.termux/files/usr/tmp/prompt-cache'
main: session file does not exist, will create
main: interactive mode on.
Reverse prompt: '### Instruction:
'
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 2048, n_batch = 512, n_predict = -1, n_keep = 2
== Running in interactive mode. ==
- Press Ctrl+C to interject at any time.
- Press Return to return control to LLaMa.
- To return control without starting a new line, end your input with '/'.
- If you want to submit another line, end your input with '\'.
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -54
GGML_ASSERT: /data/data/com.termux/files/home/.termux-build/llama-cpp-opencl/src/ggml-opencl.cpp:1747: false
zsh: abort LD_LIBRARY_PATH="/system/vendor/lib64" llama -i -ins --color -t $(nproc) -c |
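For anyone decoding the failure in the log above: in the standard OpenCL headers, error code -54 is CL_INVALID_WORK_GROUP_SIZE, which generally means the work-group size requested for the kernel launch is not valid for that kernel on that device. The sketch below maps a few common codes to their names; `cl_error_name` is a hypothetical helper and the table is deliberately partial.

```shell
#!/bin/sh
# cl_error_name: hypothetical helper that maps a numeric OpenCL error code
# to its symbolic name from CL/cl.h. Only a small subset of codes is covered.
cl_error_name() {
    case "$1" in
        0)   echo CL_SUCCESS ;;
        -1)  echo CL_DEVICE_NOT_FOUND ;;
        -4)  echo CL_MEM_OBJECT_ALLOCATION_FAILURE ;;
        -5)  echo CL_OUT_OF_RESOURCES ;;
        -6)  echo CL_OUT_OF_HOST_MEMORY ;;
        -11) echo CL_BUILD_PROGRAM_FAILURE ;;
        -30) echo CL_INVALID_VALUE ;;
        -52) echo CL_INVALID_KERNEL_ARGS ;;
        -53) echo CL_INVALID_WORK_DIMENSION ;;
        -54) echo CL_INVALID_WORK_GROUP_SIZE ;;
        -55) echo CL_INVALID_WORK_ITEM_SIZE ;;
        *)   echo "UNKNOWN($1)" ;;
    esac
}

# Pull the trailing code out of a CLBlast log line and decode it.
line='CLBlast: OpenCL error: clEnqueueNDRangeKernel: -54'
code=${line##*: }          # strip everything up to the last ": "
cl_error_name "$code"
```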
This bug is similar to ggerganov/llama.cpp#2341:
I have reported it here |
Hi @Freed-Wu, can you test whether this package works fine when you are free? Thanks! |
I figured this must be merged before it's available in Termux. Is there some simple way to try this? |
You can download the The CI artifacts are packed into a |
It's functioning for sure. If it were up to me then I'd suggest changing the way The package builds llama.cpp with It appears Thank you. |
termux-docker x86_64 |
Please rebase this PR to the latest version |
new package: llama-cpp
Close #17453