Since AWQ-quantized models require an NVIDIA GPU, we can't run them on CPU. I want to convert the model from PyTorch to ONNX so it can run on CPU. One open issue is understanding the 4-bit weight packing. How are the 4-bit weights packed for NVIDIA CUDA inference?
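For reference, here is a minimal sketch of the packing as I currently understand it (this is my assumption, based on reading AutoAWQ's `WQLinear_GEMM` code, not confirmed for this repo): eight unsigned 4-bit values are packed into one `int32` along the input-channel dimension, but in the interleaved order `[0, 2, 4, 6, 1, 3, 5, 7]` rather than sequentially, presumably to match the CUDA kernel's dequantization layout. The function names below are mine, for illustration only.

```python
import torch

# Assumed interleaving used by the AWQ GEMM kernels (unverified for this repo).
AWQ_ORDER = [0, 2, 4, 6, 1, 3, 5, 7]
PACK_NUM = 8  # 32 bits / 4 bits per weight

def pack_awq(intweight: torch.Tensor) -> torch.Tensor:
    """Pack unsigned 4-bit weights (values 0..15) along the last dim into int32."""
    rows, cols = intweight.shape
    assert cols % PACK_NUM == 0
    qweight = torch.zeros(rows, cols // PACK_NUM, dtype=torch.int32)
    for col in range(cols // PACK_NUM):
        for i in range(PACK_NUM):
            # The i-th 4-bit slot holds the AWQ_ORDER[i]-th weight of the group.
            val = intweight[:, col * PACK_NUM + AWQ_ORDER[i]].to(torch.int32)
            qweight[:, col] |= val << (i * 4)
    return qweight

def unpack_awq(qweight: torch.Tensor) -> torch.Tensor:
    """Inverse of pack_awq: recover the original 4-bit integers."""
    rows, packed_cols = qweight.shape
    intweight = torch.zeros(rows, packed_cols * PACK_NUM, dtype=torch.int32)
    for col in range(packed_cols):
        for i in range(PACK_NUM):
            intweight[:, col * PACK_NUM + AWQ_ORDER[i]] = (
                (qweight[:, col] >> (i * 4)) & 0xF
            )
    return intweight

# Round-trip check on random 4-bit data.
w = torch.randint(0, 16, (4, 16), dtype=torch.int32)
assert torch.equal(unpack_awq(pack_awq(w)), w)
```

If this is right, then for ONNX/CPU I would unpack with something like the above and dequantize per group as `(int4 - zero) * scale`. Can someone confirm the actual pack order and whether `qzeros` uses the same interleaving?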