
Have you tested the speed in Ollama? #15

Open
allenxml opened this issue Aug 23, 2024 · 1 comment

Comments

@allenxml

I imported the 4-bit quantized GGUF model from Hugging Face into Ollama and asked questions through Open WebUI, but the output speed is very slow.
On the Ollama host, the 4060 Ti 16G card only shows about 8 GB of VRAM in use, the GPU core clock is often stuck around 210 MHz and rarely reaches its maximum, and CPU usage on the 7950X is around 50%.
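For comparison, raw generation speed can be checked directly against the Ollama REST API, taking Open WebUI out of the picture. A minimal sketch in Python, where the tag `qwen2-4bit` is a placeholder for whatever name was used when the GGUF was imported and Ollama is assumed to be on its default port:

```python
import requests

# Generate once without streaming so the response includes timing fields.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen2-4bit",  # placeholder tag; use the name given at import
        "prompt": "Explain what 4-bit quantization does to a language model.",
        "stream": False,
        # Ask Ollama to offload as many layers as possible to the GPU.
        "options": {"num_gpu": 99},
    },
    timeout=600,
)
data = resp.json()

# eval_count = generated tokens, eval_duration = generation time in nanoseconds.
tok_per_s = data["eval_count"] / (data["eval_duration"] / 1e9)
print(f"{data['eval_count']} tokens at {tok_per_s:.1f} tok/s")
```

If `ollama ps` then reports a CPU/GPU split rather than 100% GPU, not all layers are being offloaded, which would be consistent with the low GPU clock and high CPU usage described above.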

@kyuumeitai

kyuumeitai commented Aug 30, 2024

I will provide a Google translation so you don't have to.

Have you tested the speed in Ollama? #15

I imported the 4-bit quantized GGUF model from Hugging Face into Ollama and asked questions in Open WebUI. The output speed is very slow.

On the Ollama host, the 4060 Ti 16G card uses only 8 GB of VRAM, the GPU core clock is often at 210 MHz and rarely reaches its maximum, and CPU usage on the 7950X is 50%.
