I would like to use this library for in-browser web ML inference because, with the upcoming CPU support, it is better than:

- ggml (llama.cpp/whisper.cpp): Ratchet supports both CPU and GPU, and can use the GPU on devices where WebGPU is available, giving better performance.
- web-llm (which is WebGPU-only): Ratchet (will) have a CPU backend, allowing inference on devices where WebGPU is not supported (many Android browsers).
- onnx: Ratchet is lighter than ONNX Runtime Web.
However, all three of them support 4-bit quantization, whereas (apparently) Ratchet only supports 8-bit quantization. 4-bit quantization is essential: without it, whisper-v3-turbo and llama-3.2-1b cannot fit in the limited RAM available to a browser tab. So please support 4-bit quantization soon.
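For context on the RAM claim: llama-3.2-1b has roughly 1.2B parameters, so 8-bit weights alone are ~1.2 GB, close to the per-tab memory budget of many mobile browsers; 4-bit roughly halves that to ~0.6 GB. Below is a minimal sketch of what block-wise 4-bit quantization looks like (in the spirit of GGML's Q4_0 scheme, written in Rust since Ratchet is a Rust library). The `BlockQ4` type and both functions are illustrative only, not Ratchet's actual API:

```rust
/// Hypothetical sketch of GGML-style Q4_0 block quantization, for
/// illustration only (not Ratchet's API): 32 weights share one f32
/// scale, and each weight is stored in 4 bits.
const BLOCK: usize = 32;

struct BlockQ4 {
    scale: f32,       // per-block dequantization scale
    quants: [u8; 16], // 32 x 4-bit values, two packed per byte
}

fn quantize_block(xs: &[f32; BLOCK]) -> BlockQ4 {
    // Symmetric quantization: map the largest-magnitude value in the
    // block onto the edge of the signed 4-bit range [-8, 7].
    let amax = xs.iter().fold(0f32, |m, &x| m.max(x.abs()));
    let scale = if amax > 0.0 { amax / 7.0 } else { 1.0 };
    let q = |x: f32| ((x / scale).round().clamp(-8.0, 7.0) as i8 + 8) as u8;
    let mut quants = [0u8; 16];
    for i in 0..16 {
        // Shift quants into [0, 15] and pack two nibbles per byte.
        quants[i] = q(xs[i]) | (q(xs[i + 16]) << 4);
    }
    BlockQ4 { scale, quants }
}

fn dequantize_block(b: &BlockQ4) -> [f32; BLOCK] {
    let mut out = [0f32; BLOCK];
    for i in 0..16 {
        out[i] = ((b.quants[i] & 0x0F) as i32 - 8) as f32 * b.scale;
        out[i + 16] = ((b.quants[i] >> 4) as i32 - 8) as f32 * b.scale;
    }
    out
}
```

Each block costs 32 x 4 bits of quants plus one 32-bit scale, i.e. 5 bits per weight, versus 8 bits per weight for int8, which is where the memory saving comes from.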