
Releases: gtygo/llama.cpp

b4150

22 Nov 05:24
a5e4759
cuda : optimize argmax (#10441)

* cuda : optimize argmax

* remove unused parameter

ggml-ci

* fixup : use full warps

ggml-ci

* Apply suggestions from code review

Co-authored-by: Johannes Gäßler <[email protected]>

* fix UB (undefined behavior)

* ggml : check ne00 <= INT32_MAX in argmax and argsort

---------

Co-authored-by: Johannes Gäßler <[email protected]>

b3562

09 Aug 19:54
Reuse querybatch to reduce frequent memory allocation
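The idea behind reusing a query batch is to keep one buffer alive across queries and let it grow monotonically, instead of allocating a fresh one per call. A minimal sketch of that pattern, with hypothetical names (not the fork's actual types):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: a batch whose storage is reused across queries.
// Capacity only grows; shrinking the logical size does not reallocate.
struct QueryBatch {
    std::vector<float> data;

    void prepare(size_t n) {
        if (data.capacity() < n) {
            data.reserve(n); // allocate only when the batch must grow
        }
        data.resize(n);      // no reallocation when n <= capacity
    }
};
```

After a large query, subsequent smaller queries reuse the same allocation, which avoids the frequent allocation/free churn the commit message describes.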

b3561

09 Aug 19:14
retrieval

b3560

09 Aug 19:13
6afd1a9
llama : add support for lora adapters in T5 model (#8938)

Co-authored-by: Stanisław Szymczyk <[email protected]>

b3559

09 Aug 17:30
272e3bd
make : fix llava obj file race (#8946)

ggml-ci

b3556

09 Aug 09:52
4305b57
sync : ggml

b3538

07 Aug 06:48
506122d
llama-bench : add support for getting cpu info on Windows (#8824)

* Add support for getting cpu info on Windows for llama_bench

* refactor

---------

Co-authored-by: slaren <[email protected]>