v0.1.0b4
# Highlights
- Update to TensorRT-LLM version 03-19-2024
- Installation via pip
- Float8 quantization workflow updated to be more robust
- Save and restore prebuilt engines from the Hugging Face Hub or locally on the machine
What's Changed
- Add ability to save local prebuilt engines by @mfuntowicz in #87
- Make float8 quantization back in the game. by @mfuntowicz in #92
- Fixed Repetition Penalty default value by @leopra in #66
- Update instructions for pip install by @mfuntowicz in #97
- Update to TensorRT-LLM v031224 by @mfuntowicz in #98
New Contributors
Full Changelog: v0.1.0b3...v0.1.0b4