Optimum-Nvidia 0.1.0b2 (bug-fix release)
This release focuses on improving the previous one with additional test coverage, bug fixes, and usability improvements.
TensorRT-LLM
- Updated TensorRT-LLM to version f7eca56161d496cbd28e8e7689dbd90003594bd2
Improvements
- Generally improved unit test coverage
- Initial documentation and updated build instructions
- The prebuilt container now supports the Volta and Turing (experimental) architectures, covering V100 and T4 GPUs
- More in-depth usage of the TensorRT-LLM runtime's Python bindings to the C++ runtime
Bug Fixes
- Fixed an issue where the pipeline returned only the first output when given a batch
- Fixed an issue with `bfloat16` conversion not loading weights in the right format for the TRT engine builder
- Fixed an issue with non-multi-head-attention setups (e.g. grouped-query attention) where the key/value heads were not replicated by the proper factor
Engine Builder changes
- The RMSNorm plugin is being deprecated by Nvidia for performance reasons, so we no longer attempt to enable it
Model Support
- The Mistral family of models should work in theory, but it is not yet extensively tested in our CI/CD. We plan to add official support in the next release
What's Changed
- bump trt llm version to 0.6.1 by @laikhtewari in #27
- Fix issue returning only the first batch item after pipeline call. by @mfuntowicz in #29
- Update README.md by @eltociear in #31
- Missing comma in setup.py by @IlyasMoutawwakil in #19
- Quality by @mfuntowicz in #30
- Fix typo by @mfuntowicz in #40
- Update to latest trtllm f7eca56161d496cbd28e8e7689dbd90003594bd2 by @mfuntowicz in #41
- Enable more SM architectures in the prebuild docker by @mfuntowicz in #35
- Add initial set of documentation to build the `optimum-nvidia` container by @mfuntowicz in #39
- Fix caching for docker by @mfuntowicz in #15
- Initial set of unittest in CI by @mfuntowicz in #43
- Build from source instructions by @laikhtewari in #38
- Enable testing on GPUs by @mfuntowicz in #45
- Enable HF Transfer in tests by @mfuntowicz in #51
- Let's make sure to use the repeated heads tensor when in a non-mha scenario by @mfuntowicz in #48
- Bump version to 0.1.0b2 by @mfuntowicz in #53
- Add more unittest by @mfuntowicz in #52
- Disable RMSNorm plugin as deprecated for performance reasons by @mfuntowicz in #55
- Rename LLamaForCausalLM to LlamaForCausalLM to match transformers by @mfuntowicz in #54
- AutoModelForCausalLM instead of LlamaForCausalLM by @laikhtewari in #24
- Use the new runtime handled allocation by @mfuntowicz in #46
New Contributors
- @eltociear made their first contribution in #31
- @IlyasMoutawwakil made their first contribution in #19
Full Changelog: v0.1.0b1...v0.1.0b2