Optimum-Nvidia 0.1.0b2 Release, bug fix release

@mfuntowicz mfuntowicz released this 21 Dec 13:11
· 91 commits to main since this release
938878e

This release focuses on improving the previous one with additional test coverage, bug fixes, and usability improvements.

TensorRT-LLM

Improvements

  • Improved unit test coverage across the board
  • Initial documentation and updated build instructions
  • The prebuilt container now supports Volta and Turing (experimental) architectures for V100 and T4 GPUs
  • More in-depth usage of the TensorRT-LLM Runtime Python/C++ bindings

Bug Fixes

  • Fixed an issue where the pipeline returned only the first output when provided with a batch
  • Fixed an issue where bfloat16 conversion did not load weights in the right format for the TRT Engine builder
  • Fixed an issue with setups that do not use Multi-Head Attention, where the heads were not replicated with the proper factor
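
To illustrate the head replication fix, here is a minimal sketch of what a replication factor means, assuming the affected setups are grouped-query or multi-query attention, where several query heads share one key/value head. The function names `replication_factor` and `replicate_kv_heads` are hypothetical and are not part of Optimum-Nvidia or TensorRT-LLM; heads are plain Python values here rather than weight tensors.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def replication_factor(num_attention_heads: int, num_kv_heads: int) -> int:
    """Return how many query heads share each key/value head (hypothetical helper)."""
    if num_kv_heads <= 0 or num_attention_heads % num_kv_heads != 0:
        raise ValueError(
            f"num_attention_heads ({num_attention_heads}) must be a positive "
            f"multiple of num_kv_heads ({num_kv_heads})"
        )
    return num_attention_heads // num_kv_heads


def replicate_kv_heads(kv_heads: Sequence[T], num_attention_heads: int) -> List[T]:
    """Expand key/value heads so there is one entry per query head.

    Each head is repeated `factor` times and the copies stay adjacent,
    so query head i maps to original key/value head i // factor.
    """
    factor = replication_factor(num_attention_heads, len(kv_heads))
    return [head for head in kv_heads for _ in range(factor)]


if __name__ == "__main__":
    # Assumed example shape: 8 query heads sharing 2 key/value heads.
    print(replicate_kv_heads(["kv0", "kv1"], num_attention_heads=8))
```

With 8 query heads and 2 key/value heads the factor is 4, so each key/value head appears four times in a row. Using the wrong factor, or interleaving the copies instead of keeping them adjacent, would pair query heads with the wrong key/value head, which is the kind of mismatch this sketch is meant to make visible.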

Engine Builder changes

  • NVIDIA is deprecating the RMSNorm plugin for performance reasons, so we no longer attempt to enable it

Model Support

  • The Mistral family of models should theoretically work, but it is not yet extensively tested in our CI/CD. We plan to add official support in the next release

What's Changed

New Contributors

Full Changelog: v0.1.0b1...v0.1.0b2