
feat(llama.cpp): expose cache_type_k and cache_type_v for quant of kv cache #4329

Merged
merged 2 commits into master from feat/llama.cpp-quantcache on Dec 6, 2024

Conversation

mudler (Owner) commented Dec 5, 2024

Description

This pull request exposes llama.cpp's KV-cache quantization types (cache_type_k and cache_type_v) in the backend. The changes span multiple files, adding new fields and parameters to support this functionality.

New cache type support:

To use it, enable flash_attention and specify a quantized cache type for both the K and V caches, like this:

flash_attention: true
cache_type_k: "q4_0"
cache_type_v: "q4_0"
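
For context, a minimal sketch of how these settings could sit inside a full LocalAI model definition; the model name, GGUF file, and context_size below are hypothetical placeholders, and only flash_attention, cache_type_k, and cache_type_v are the options discussed in this PR:

name: my-model                    # hypothetical model name
parameters:
  model: my-model.Q4_K_M.gguf     # hypothetical GGUF file
context_size: 4096                # placeholder value
flash_attention: true             # needed to use a quantized KV cache
cache_type_k: "q4_0"              # quantize the K cache
cache_type_v: "q4_0"              # quantize the V cache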

Notes for Reviewers

Signed commits

  • Yes, I signed my commits.

@mudler mudler force-pushed the feat/llama.cpp-quantcache branch from da7a65d to 83133b6 Compare December 5, 2024 20:37
netlify bot commented Dec 5, 2024

Deploy Preview for localai ready!

🔨 Latest commit: c8bc017
🔍 Latest deploy log: https://app.netlify.com/sites/localai/deploys/6752b2b9d17be90008f9e53f
😎 Deploy Preview: https://deploy-preview-4329--localai.netlify.app

@mudler mudler force-pushed the feat/llama.cpp-quantcache branch from 83133b6 to 9f8b297 Compare December 6, 2024 08:14
@mudler mudler added the enhancement New feature or request label Dec 6, 2024
@mudler mudler merged commit d4c1746 into master Dec 6, 2024
31 checks passed
@mudler mudler deleted the feat/llama.cpp-quantcache branch December 6, 2024 09:24
sozercan pushed a commit to sozercan/LocalAI that referenced this pull request Dec 8, 2024