feat(llama.cpp): expose cache_type_k and cache_type_v for quant of kv cache #4329
Description
This pull request adds support for configuring the KV cache quantization types (`cache_type_k` / `cache_type_v`) used by the llama.cpp backend. The changes span multiple files, adding new fields and parameters to plumb the setting from the model configuration through to the backend.
New cache type support:

- `backend/backend.proto`: Added `CacheTypeKey` and `CacheTypeValue` fields to the `ModelOptions` message.
- `backend/cpp/llama/grpc-server.cpp`: Updated the `params_parse` function to parse `cache_type_k` and `cache_type_v` from the `ModelOptions` request.
- `core/backend/options.go`: Modified the `grpcModelOpts` function to include `CacheTypeKey` and `CacheTypeValue` in the `ModelOptions`.
- `core/config/backend_config.go`: Added `CacheTypeK` and `CacheTypeV` fields to the `LLMConfig` struct (see the sketch after this list).
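To illustrate how the new fields fit together, here is a minimal, self-contained Go sketch of the config-to-options mapping. The `yaml` tag names and the stand-in `ModelOptions` struct are assumptions for illustration; the real code uses the generated protobuf type and the full `LLMConfig` struct.

```go
package main

import "fmt"

// Sketch of the relevant LLMConfig fields; the yaml tag names are assumptions,
// not confirmed by this PR description.
type LLMConfig struct {
	FlashAttention bool   `yaml:"flash_attention"`
	CacheTypeK     string `yaml:"cache_type_k"`
	CacheTypeV     string `yaml:"cache_type_v"`
}

// Stand-in for the generated pb.ModelOptions message with the new fields.
type ModelOptions struct {
	FlashAttention bool
	CacheTypeKey   string
	CacheTypeValue string
}

// grpcModelOpts-style mapping: copy the config values into the gRPC options
// that are sent to the llama.cpp backend.
func grpcModelOpts(c LLMConfig) *ModelOptions {
	return &ModelOptions{
		FlashAttention: c.FlashAttention,
		CacheTypeKey:   c.CacheTypeK,
		CacheTypeValue: c.CacheTypeV,
	}
}

func main() {
	opts := grpcModelOpts(LLMConfig{FlashAttention: true, CacheTypeK: "q8_0", CacheTypeV: "q8_0"})
	fmt.Printf("%+v\n", *opts)
}
```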
To use it, enable `flash_attention` and specify a quantized KV cache type, like this:
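A minimal model configuration sketch. The exact YAML keys (`cache_type_k`, `cache_type_v`) and the accepted quantization names (e.g. `q8_0`, `q4_0`, `f16`, as in llama.cpp's cache type flags) are assumptions, not values confirmed by this description; the model name and file are placeholders.

```yaml
name: my-model
parameters:
  model: my-model.gguf
flash_attention: true
cache_type_k: "q8_0"
cache_type_v: "q8_0"
```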
Notes for Reviewers

Signed commits