Release v0.4.0 · neuralmagic/nm-vllm

Key Features

This release is based on vllm==0.4.3

What's Changed

turn off single gpu scenario by @andy-neuma in #88
Benchmarking : Absolute -> Relative imports by @varun-sundar-rabindranath in #85
Benchmarking : update Gi_per_thread by @varun-sundar-rabindranath in #90
Update README.md with sparsity and quantization explainers by @mgoin in #91
Add notebooks for sparsegpt and marlin compression with nm-vllm by @mgoin in #94
upstream sync 2024-03-04 by @andy-neuma in #89
Update README.md by @robertgshaw2-neuralmagic in #96
Formatting : Fix yapf by @varun-sundar-rabindranath in #101
Lower unstructured sparsity threshold to 40% by @mgoin in #100
Benchmarking : Misc updates by @varun-sundar-rabindranath in #95
upstream merge sync 2024-03-11 by @andy-neuma in #108
Add lm-eval comparison script by @mgoin in #99
Benchmarks : Standardize benchmark result store by @varun-sundar-rabindranath in #87
seed whl centric workflows by @andy-neuma in #116
Benchmarking : Remote push job by @varun-sundar-rabindranath in #92
reverted accidental commit to main by @robertgshaw2-neuralmagic in #119
skipped test for nightly failure by @robertgshaw2-neuralmagic in #120
Turned back on the Marlin tests by @robertgshaw2-neuralmagic in #121
Benchmarking : Prepare for GHA benchmark UI by @varun-sundar-rabindranath in #122
Upstream sync 2024 03 14 by @robertgshaw2-neuralmagic in #127
Benchmark : Update benchmark configs for Nightly by @varun-sundar-rabindranath in #126
Benchmark : Modify/Add workflows/actions for github-action-benchmark by @varun-sundar-rabindranath in #123
Benchmark: fix nightly by @varun-sundar-rabindranath in #131
Fix nightly - 03/18/2024 by @varun-sundar-rabindranath in #136
Upstream sync 2024 03 18 by @robertgshaw2-neuralmagic in #134
Update Dockerfile with extensions support by @mgoin in #107
Benchmark : Turn-off nightly multi-gpu benchmarks temporarily by @varun-sundar-rabindranath in #130
Benchmark Fix : Remove special tokens from warmup prompts by @varun-sundar-rabindranath in #140
Delete .github/pull_request_template.md by @mgoin in #145
Benchmarking : Update readme by @varun-sundar-rabindranath in #144
Initial Layerwise Profiler by @LucasWilkinson in #124
Benchmark Fix : Fix JSON decode error by @varun-sundar-rabindranath in #142
Upstream sync 2024 03 24 by @robertgshaw2-neuralmagic in #143
Benchmark : Fix remote push job by @varun-sundar-rabindranath in #129
Benchmarks : Prune nightly benchmarks by @varun-sundar-rabindranath in #150
Lock lm-evaluation-harness to commit 262f879 by @mgoin in #151
Benchmarks : Copy benchmark results to EFS by @varun-sundar-rabindranath in #148
update readme with nvcc threads option by @varun-sundar-rabindranath in #153
Generate tarball along with wheel build, and upload both in a package to GH by @dhuangnm in #138
switch to nightly whl's by @andy-neuma in #154
whl centric workflow for "remote push" by @andy-neuma in #117
remove low-workload benchmarks that are flaky by @varun-sundar-rabindranath in #156
nightly patches by @andy-neuma in #160
Upstream sync v0.4.0.post1 (merged with upstream-v0.4.0.post1) by @mgoin in #157
Bump version to 0.2 by @mgoin in #165
rename wheels to manylinux and remove unused action by @dhuangnm in #167
Update collect_env.py package list by @mgoin in #169
Add lm-eval full accuracy sweep using GSM8k by @mgoin in #166
Upstream sync 2024 04 08 by @SageMoore in #173
Updated logo in README by @rgreenberg1 in #178
Fix sparsity arg in Engine/ModelArgs by @mgoin in #179
rm model_executor/layers/attention directory since it's been moved by @tlrmchlsmth in #181
Upstream sync 2024 04 12 by @andy-neuma in #183
mm publish workflow by @andy-neuma in #193
GCP related build workflow updates by @andy-neuma in #196
switch to GCP based build VM by @andy-neuma in #201
cleanup venv by @andy-neuma in #217
Upstream sync 2024 04 26 by @robertgshaw2-neuralmagic in #211
update workflows to use generated whls by @andy-neuma in #204
Fix nightly benchmark scripts by @dbarbuzzi in #229
Add lm-eval correctness test by @dbarbuzzi in #210
switch to k8s runners by @andy-neuma in #231
Upstream sync 2024 05 05 by @robertgshaw2-neuralmagic in #224
Marlin 2:4 Downstream (for v0.3 release) by @robertgshaw2-neuralmagic in #239
Misc CI/CD updates by @dbarbuzzi in #240
bump version to 0.3.0 by @dhuangnm in #241
[Bugfix] Fix marlin 2:4 kernel crash on H100 by @mgoin in #243
switch runner from aws to gcp for generate whl workflow by @dhuangnm in #242
Add FP8 and marlin 2:4 tests for lm-eval by @mgoin in #244
updates for nm-magic-wand, nightly or release by @andy-neuma in #247
version check patch by @andy-neuma in #251
increase timeouts by @andy-neuma in #253
requirements-dev.txt and workflow patches by @andy-neuma in #255
updates for automation (and release) by @andy-neuma in #265
update install commands by @dhuangnm in #264
Address py38/39 incompatibilities by @dbarbuzzi in #261
[CI/Build] Basic server correctness test by @derekk-nm in #237
bump up version and gate magic-wand version by @dhuangnm in #267
remove release workflow concurrency limit by @andy-neuma in #270
[CI/Build] include NOTICE in package dist-info by @derekk-nm in #271
switch benchmarking and testing jobs to run using "test" label by @andy-neuma in #273
Handle server startup failure in __enter__ by @dbarbuzzi in #274
Upstream sync 2024 05 19 by @robertgshaw2-neuralmagic in #249
Docker image improvements by @dhuangnm in #276
add latest tag for release docker image by @dhuangnm in #279

New Contributors

@SageMoore made their first contribution in #173
@rgreenberg1 made their first contribution in #178
@derekk-nm made their first contribution in #237

Full Changelog: 0.1.0...v0.4.0
