This repository has been archived by the owner on Oct 11, 2024. It is now read-only.
v0.4.0
Key Features
This release is based on vllm==0.4.3
What's Changed
- turn off single gpu scenario by @andy-neuma in #88
- Benchmarking : Absolute -> Relative imports by @varun-sundar-rabindranath in #85
- Benchmarking : update Gi_per_thread by @varun-sundar-rabindranath in #90
- Update README.md with sparsity and quantization explainers by @mgoin in #91
- Add notebooks for sparsegpt and marlin compression with nm-vllm by @mgoin in #94
- upstream sync 2024-03-04 by @andy-neuma in #89
- Update README.md by @robertgshaw2-neuralmagic in #96
- Formatting : Fix yapf by @varun-sundar-rabindranath in #101
- Lower unstructured sparsity threshold to 40% by @mgoin in #100
- Benchmarking : Misc updates by @varun-sundar-rabindranath in #95
- upstream merge sync 2024-03-11 by @andy-neuma in #108
- Add lm-eval comparison script by @mgoin in #99
- Benchmarks : Standardize benchmark result store by @varun-sundar-rabindranath in #87
- seed whl centric workflows by @andy-neuma in #116
- Benchmarking : Remote push job by @varun-sundar-rabindranath in #92
- reverted accidental commit to main by @robertgshaw2-neuralmagic in #119
- skipped test for nightly failure by @robertgshaw2-neuralmagic in #120
- Turned back on the Marlin tests by @robertgshaw2-neuralmagic in #121
- Benchmarking : Prepare for GHA benchmark UI by @varun-sundar-rabindranath in #122
- Upstream sync 2024 03 14 by @robertgshaw2-neuralmagic in #127
- Benchmark : Update benchmark configs for Nightly by @varun-sundar-rabindranath in #126
- Benchmark : Modify/Add workflows/actions for github-action-benchmark by @varun-sundar-rabindranath in #123
- Benchmark: fix nightly by @varun-sundar-rabindranath in #131
- Fix nightly - 03/18/2024 by @varun-sundar-rabindranath in #136
- Upstream sync 2024 03 18 by @robertgshaw2-neuralmagic in #134
- Update Dockerfile with extensions support by @mgoin in #107
- Benchmark : Turn-off nightly multi-gpu benchmarks temporarily by @varun-sundar-rabindranath in #130
- Benchmark Fix : Remove special tokens from warmup prompts by @varun-sundar-rabindranath in #140
- Delete .github/pull_request_template.md by @mgoin in #145
- Benchmarking : Update readme by @varun-sundar-rabindranath in #144
- Initial Layerwise Profiler by @LucasWilkinson in #124
- Benchmark Fix : Fix JSON decode error by @varun-sundar-rabindranath in #142
- Upstream sync 2024 03 24 by @robertgshaw2-neuralmagic in #143
- Benchmark : Fix remote push job by @varun-sundar-rabindranath in #129
- Benchmarks : Prune nightly benchmarks by @varun-sundar-rabindranath in #150
- Lock lm-evaluation-harness to commit 262f879 by @mgoin in #151
- Benchmarks : Copy benchmark results to EFS by @varun-sundar-rabindranath in #148
- update readme with nvcc threads option by @varun-sundar-rabindranath in #153
- Generate tarball along with wheel build, and upload both in a package to GH by @dhuangnm in #138
- switch to nightly whl's by @andy-neuma in #154
- whl centric workflow for "remote push" by @andy-neuma in #117
- remove low-workload benchmarks that are flaky by @varun-sundar-rabindranath in #156
- nightly patches by @andy-neuma in #160
- Upstream sync v0.4.0.post1 (merged with upstream-v0.4.0.post1) by @mgoin in #157
- Bump version to 0.2 by @mgoin in #165
- rename wheels to manylinux and remove unused action by @dhuangnm in #167
- Update collect_env.py package list by @mgoin in #169
- Add lm-eval full accuracy sweep using GSM8k by @mgoin in #166
- Upstream sync 2024 04 08 by @SageMoore in #173
- Updated logo in README by @rgreenberg1 in #178
- Fix sparsity arg in Engine/ModelArgs by @mgoin in #179
- rm model_executor/layers/attention directory since it's been moved by @tlrmchlsmth in #181
- Upstream sync 2024 04 12 by @andy-neuma in #183
- mm publish workflow by @andy-neuma in #193
- GCP related build workflow updates by @andy-neuma in #196
- switch to GCP based build VM by @andy-neuma in #201
- cleanup venv by @andy-neuma in #217
- Upstream sync 2024 04 26 by @robertgshaw2-neuralmagic in #211
- update workflows to use generated whls by @andy-neuma in #204
- Fix nightly benchmark scripts by @dbarbuzzi in #229
- Add lm-eval correctness test by @dbarbuzzi in #210
- switch to k8s runners by @andy-neuma in #231
- Upstream sync 2024 05 05 by @robertgshaw2-neuralmagic in #224
- Marlin 2:4 Downstream (for v0.3 release) by @robertgshaw2-neuralmagic in #239
- Misc CI/CD updates by @dbarbuzzi in #240
- bump version to 0.3.0 by @dhuangnm in #241
- [Bugfix] Fix marlin 2:4 kernel crash on H100 by @mgoin in #243
- switch runner from aws to gcp for generate whl workflow by @dhuangnm in #242
- Add FP8 and marlin 2:4 tests for lm-eval by @mgoin in #244
- updates for nm-magic-wand, nightly or release by @andy-neuma in #247
- version check patch by @andy-neuma in #251
- increase timeouts by @andy-neuma in #253
- requirements-dev.txt and workflow patches by @andy-neuma in #255
- updates for automation (and release) by @andy-neuma in #265
- update install commands by @dhuangnm in #264
- Address py38/39 incompatibilities by @dbarbuzzi in #261
- [CI/Build] Basic server correctness test by @derekk-nm in #237
- bump up version and gate magic-wand version by @dhuangnm in #267
- remove release workflow concurrency limit by @andy-neuma in #270
- [CI/Build] include NOTICE in package dist-info by @derekk-nm in #271
- switch benchmarking and testing jobs to run using "test" label by @andy-neuma in #273
- Handle server startup failure in enter by @dbarbuzzi in #274
- Upstream sync 2024 05 19 by @robertgshaw2-neuralmagic in #249
- Docker image improvements by @dhuangnm in #276
- add latest tag for release docker image by @dhuangnm in #279
New Contributors
- @SageMoore made their first contribution in #173
- @rgreenberg1 made their first contribution in #178
- @derekk-nm made their first contribution in #237
Full Changelog: 0.1.0...v0.4.0