
Have you tested the speed in Ollama? #15

Open
allenxml opened this issue Aug 23, 2024 · 1 comment

Comments

@allenxml

I imported the 4-bit quantized GGUF model from Hugging Face into Ollama and asked questions through Open WebUI, but the output speed is very slow.
On the Ollama host, the 4060 Ti 16G card only shows about 8 GB of VRAM in use, the GPU core clock is often stuck around 210 MHz and rarely reaches its maximum, and CPU usage on the 7950X is around 50%.
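For comparison, raw generation speed can be checked directly against the Ollama REST API, taking Open WebUI out of the picture. A minimal sketch in Python, where the tag `qwen2-4bit` is a placeholder for whatever name was used when the GGUF was imported and Ollama is assumed to be on its default port:

```python
import requests

# Generate once without streaming so the response includes timing fields.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen2-4bit",  # placeholder tag; use the name given at import
        "prompt": "Explain what 4-bit quantization does to a language model.",
        "stream": False,
        # Ask Ollama to offload as many layers as possible to the GPU.
        "options": {"num_gpu": 99},
    },
    timeout=600,
)
data = resp.json()

# eval_count = generated tokens, eval_duration = generation time in nanoseconds.
tok_per_s = data["eval_count"] / (data["eval_duration"] / 1e9)
print(f"{data['eval_count']} tokens at {tok_per_s:.1f} tok/s")
```

If `ollama ps` then reports a CPU/GPU split rather than 100% GPU, not all layers are being offloaded, which would be consistent with the low GPU clock and high CPU usage described above.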

@kyuumeitai

kyuumeitai commented Aug 30, 2024

I will provide a Google translation so you don't have to.

Have you tested the speed in Ollama? #15

I imported the 4-bit quantized GGUF model from Hugging Face into Ollama and asked questions in Open WebUI. The output speed is very slow.

On the Ollama host, the 4060 Ti 16G card uses only 8 GB of VRAM, the GPU core clock is often at 210 MHz and rarely reaches its maximum, and CPU usage on the 7950X is 50%.
