Releases: gtygo/llama.cpp
b4150
cuda : optimize argmax (#10441)
* cuda : optimize argmax
* remove unused parameter (ggml-ci)
* fixup : use full warps (ggml-ci)
* apply suggestions from code review
* fix UB
* ggml : check ne00 <= INT32_MAX in argmax and argsort

Co-authored-by: Johannes Gäßler <[email protected]>
b3562
Reuse the query batch to reduce frequent memory allocations
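The entry above describes reusing one query batch across calls instead of reallocating it each time. A sketch of that pattern (names hypothetical), relying on `std::vector::clear()` preserving capacity:

```cpp
#include <vector>

// Hypothetical sketch of the reuse pattern described above: keep one
// batch buffer alive across queries and clear() it instead of building
// a fresh vector, so repeated fills reuse the already-grown capacity
// rather than triggering a new heap allocation on every call.
struct QueryBatch {
    std::vector<int> tokens;  // stand-in for the real batch contents

    // Reset for the next query without releasing the backing storage.
    void reset() {
        tokens.clear();  // size -> 0, capacity preserved
    }
};
```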
b3561
retrieval
b3560
llama : add support for lora adapters in T5 model (#8938) Co-authored-by: Stanisław Szymczyk <[email protected]>
b3559
make : fix llava obj file race (#8946) ggml-ci
b3556
sync : ggml
b3538
llama-bench : add support for getting cpu info on Windows (#8824)
* add support for getting cpu info on Windows for llama_bench
* refactor

Co-authored-by: slaren <[email protected]>
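The release above adds Windows-specific CPU detection to llama-bench. As a portable stand-in for what such a probe reports, here is a sketch using `std::thread::hardware_concurrency()` — this is not the Windows API code from the commit, only an illustration:

```cpp
#include <string>
#include <thread>

// NOT the llama-bench Windows code: a portable stand-in that reports a
// logical core count, the kind of information the real probe surfaces.
// hardware_concurrency() may legally return 0 when the count is unknown.
static std::string cpu_info_summary() {
    const unsigned n = std::thread::hardware_concurrency();
    if (n == 0) {
        return "unknown cpu";  // 0 means the count could not be computed
    }
    return std::to_string(n) + " logical cores";
}
```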