0.45.0: LLM.int8() support for H100; faster 4-bit/8-bit inference #951

Annotations

1 warning

build-shared-libs-cuda (ubuntu-latest, x86_64, 12.4.1)

succeeded Dec 5, 2024 in 5m 13s