0.45.0: LLM.int8() support for H100; faster 4-bit/8-bit inference #951

build-shared-libs-cuda (windows-latest, x86_64, 12.4.1)

succeeded Dec 5, 2024 in 8m 17s