Commit: Update gptq.md
Qubitium authored Dec 24, 2024
1 parent a40264b commit 971651f
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion docs/source/en/quantization/gptq.md
@@ -120,7 +120,7 @@ model = AutoModelForCausalLM.from_pretrained("{your_username}/opt-125m-gptq", de

## Marlin

- [Marlin](https://github.com/IST-DASLab/marlin)) is a CUDA GPTQ kernel, 4-bit only, that is highly optimized for the Nvidia A100 GPU (Ampere) architecture, where the loading, dequantization, and execution of post-dequantized weights are highly parallelized, offering a substantial inference improvement over the original CUDA GPTQ kernel. Marlin is available only for quantized inference and does not support model quantization.
+ [Marlin](https://github.com/IST-DASLab/marlin) is a CUDA GPTQ kernel, 4-bit only, that is highly optimized for the Nvidia A100 GPU (Ampere) architecture, where the loading, dequantization, and execution of post-dequantized weights are highly parallelized, offering a substantial inference improvement over the original CUDA GPTQ kernel. Marlin is available only for quantized inference and does not support model quantization.
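The paragraph above describes the Marlin kernel; a minimal sketch of requesting it at load time in Transformers follows. This is an illustration under assumptions, not part of this commit: the `backend="marlin"` parameter of `GPTQConfig` is assumed to be available in the installed Transformers/GPTQ backend, and `{your_username}/opt-125m-gptq` is the placeholder repo id used elsewhere in the doc.

```python
from transformers import AutoModelForCausalLM, GPTQConfig

# Sketch: request the Marlin kernel for an already-quantized 4-bit GPTQ
# checkpoint. `backend="marlin"` is an assumption here; Marlin targets
# NVIDIA Ampere GPUs such as the A100 and is inference-only -- it cannot
# be used to quantize a model.
gptq_config = GPTQConfig(bits=4, backend="marlin")

model = AutoModelForCausalLM.from_pretrained(
    "{your_username}/opt-125m-gptq",  # placeholder repo id from the doc
    device_map="auto",
    quantization_config=gptq_config,
)
```

Since Marlin only accelerates inference, the checkpoint must already be quantized with a standard GPTQ kernel before it can be loaded this way.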


## ExLlama