Since AWQ-quantized models require an NVIDIA GPU, we can't run them on CPU. I want to convert the model from PyTorch to ONNX so it can run on CPU. One open issue is understanding the 4-bit weight packing. How are the 4-bit weights packed for NVIDIA CUDA inference?
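For reference, here is a minimal sketch of the packing as I currently understand it (this is my assumption, based on reading AutoAWQ's `WQLinear_GEMM` code, not confirmed for this repo): eight unsigned 4-bit values are packed into one `int32` along the input-channel dimension, but in the interleaved order `[0, 2, 4, 6, 1, 3, 5, 7]` rather than sequentially, presumably to match the CUDA kernel's dequantization layout. The function names below are mine, for illustration only.

```python
import torch

# Assumed interleaving used by the AWQ GEMM kernels (unverified for this repo).
AWQ_ORDER = [0, 2, 4, 6, 1, 3, 5, 7]
PACK_NUM = 8  # 32 bits / 4 bits per weight

def pack_awq(intweight: torch.Tensor) -> torch.Tensor:
    """Pack unsigned 4-bit weights (values 0..15) along the last dim into int32."""
    rows, cols = intweight.shape
    assert cols % PACK_NUM == 0
    qweight = torch.zeros(rows, cols // PACK_NUM, dtype=torch.int32)
    for col in range(cols // PACK_NUM):
        for i in range(PACK_NUM):
            # The i-th 4-bit slot holds the AWQ_ORDER[i]-th weight of the group.
            val = intweight[:, col * PACK_NUM + AWQ_ORDER[i]].to(torch.int32)
            qweight[:, col] |= val << (i * 4)
    return qweight

def unpack_awq(qweight: torch.Tensor) -> torch.Tensor:
    """Inverse of pack_awq: recover the original 4-bit integers."""
    rows, packed_cols = qweight.shape
    intweight = torch.zeros(rows, packed_cols * PACK_NUM, dtype=torch.int32)
    for col in range(packed_cols):
        for i in range(PACK_NUM):
            intweight[:, col * PACK_NUM + AWQ_ORDER[i]] = (
                (qweight[:, col] >> (i * 4)) & 0xF
            )
    return intweight

# Round-trip check on random 4-bit data.
w = torch.randint(0, 16, (4, 16), dtype=torch.int32)
assert torch.equal(unpack_awq(pack_awq(w)), w)
```

If this is right, then for ONNX/CPU I would unpack with something like the above and dequantize per group as `(int4 - zero) * scale`. Can someone confirm the actual pack order and whether `qzeros` uses the same interleaving?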