[Tracker] autoquant v2 tracker #1215

jerryzh168 · 2024-11-01T23:38:18Z

This issue tracks further development of the autoquant tool.

Goal

The goal for autoquant is to deliver performance speedups across a broad set of models we care about, with minimal accuracy degradation (configurable by the user), by reliably selecting the most performant quantization method and kernel implementation for the given input shape of each quantizable layer in the model. It could also be used to select hand-written kernels that are optimized for the best performance on a specific model, runtime, and device.
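The selection rule described above can be sketched as a small decision function. This is a minimal, framework-free illustration, not the autoquant implementation: the `Candidate` record, the `select_method` helper, the layer name, and all numbers are hypothetical, and it assumes latency (ms) and SQNR (dB) have already been measured for each candidate method on a given layer and input shape.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical benchmark record for one quantization method on one layer/shape."""
    method: str         # illustrative label, e.g. "int8_weight_only"
    latency_ms: float   # measured latency for this input shape
    sqnr_db: float      # accuracy proxy: higher means closer to the float output


def select_method(
    candidates: Sequence[Candidate],
    baseline: Candidate,
    min_sqnr: Optional[float] = None,
) -> Candidate:
    """Return the fastest candidate that meets min_sqnr and beats the baseline.

    Falls back to the unquantized baseline when nothing qualifies.
    """
    eligible = [
        c for c in candidates
        if (min_sqnr is None or c.sqnr_db >= min_sqnr)
        and c.latency_ms < baseline.latency_ms
    ]
    return min(eligible, key=lambda c: c.latency_ms) if eligible else baseline


# Made-up numbers, keyed by (layer name, input shape), for illustration only.
bf16 = Candidate("bf16", latency_ms=1.00, sqnr_db=float("inf"))
results: Dict[Tuple[str, Tuple[int, ...]], List[Candidate]] = {
    ("layers.0.attention.wqkv", (1, 4096)): [
        Candidate("int8_weight_only", latency_ms=0.60, sqnr_db=42.0),
        Candidate("int4_weight_only", latency_ms=0.45, sqnr_db=18.0),
    ],
}
for (layer, shape), cands in results.items():
    print(layer, shape, select_method(cands, bf16, min_sqnr=30.0).method)
```

With `min_sqnr=30.0` the int4 candidate is rejected on accuracy and the int8 candidate is chosen; with no threshold the faster int4 candidate would win. Keying results by input shape reflects the point that the best choice can differ per shape for the same layer.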

Performance

Add a new autoquant API to extract a "pre ops -> linear -> post ops" subgraph, quantize the linear op, and benchmark the subgraph
Test on torchao models
Integrate into https://github.com/pytorch/pytorch/tree/main/benchmarks/gpt_fast for the dashboard (https://hud.pytorch.org/benchmark/llms?repoName=pytorch%2Fpytorch) and into https://github.com/huggingface/optimum-benchmark for more models
Include int4, fp8, bf16, and fp16 in the search and test again
Add API for custom kernel integration
Test on some internal models, compare with manual benchmarking results and improve tooling
(long term) Resolve the shape problem in extract_subgraph and autoquant v2
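The subgraph benchmarking idea in the first item can be illustrated with a toy, framework-free sketch. It assumes a subgraph is simply a composition of callables (pre ops, linear, post ops) and times it with `time.perf_counter` after a warmup, taking the median. The helper names, the list-based linear layer, and the per-row int8 weight-only scheme are hypothetical stand-ins for real PyTorch modules, torchao kernels, and device-synchronized timing; in pure Python the quantized version will not be faster, so only the shape of the harness is meaningful.

```python
import statistics
import time
from typing import Callable, List, Sequence

Vector = List[float]


def make_linear(weight: Sequence[Sequence[float]]) -> Callable[[Vector], Vector]:
    """Toy float linear layer: y = W x (no bias)."""
    def linear(x: Vector) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) for row in weight]
    return linear


def make_int8_weight_only_linear(weight: Sequence[Sequence[float]]) -> Callable[[Vector], Vector]:
    """Toy weight-only int8 linear: per-row symmetric scales, dequantized in the dot product."""
    scales = [max(abs(w) for w in row) / 127 or 1.0 for row in weight]
    qweight = [[max(-128, min(127, round(w / s))) for w in row] for row, s in zip(weight, scales)]

    def linear(x: Vector) -> Vector:
        return [s * sum(q * xi for q, xi in zip(qrow, x)) for qrow, s in zip(qweight, scales)]
    return linear


def make_subgraph(pre, linear, post) -> Callable[[Vector], Vector]:
    """Compose pre ops -> linear -> post ops into a single callable."""
    return lambda x: post(linear(pre(x)))


def benchmark(fn: Callable[[Vector], Vector], x: Vector, warmup: int = 3, iters: int = 20) -> float:
    """Return the median wall-clock time of fn(x) in seconds."""
    for _ in range(warmup):
        fn(x)
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        fn(x)
        times.append(time.perf_counter() - start)
    return statistics.median(times)


def pre(x: Vector) -> Vector:   # stand-in for a pre op such as a norm
    return [v * 0.5 for v in x]


def post(y: Vector) -> Vector:  # stand-in for a post op such as an activation
    return [max(0.0, v) for v in y]


W = [[0.5, 1.0, 0.25], [2.0, 0.0, -0.75]]
x = [1.0, 2.0, 3.0]
ref = make_subgraph(pre, make_linear(W), post)
quant = make_subgraph(pre, make_int8_weight_only_linear(W), post)
for name, fn in [("float", ref), ("int8 weight-only", quant)]:
    print(name, fn(x), benchmark(fn, x))
```

Benchmarking the whole subgraph rather than the bare linear op matters because pre and post ops can change which kernel or fusion is fastest.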

Accuracy

Add a configurable min_sqnr to autoquant v2 and include all variants of the quantization methods
Test on torchao models
Test on some internal models, compare with manual accuracy eval results and improve tooling
Create report summaries of performance vs. accuracy trade-offs for certain layers
Use tooling to identify sensitive layers
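The min_sqnr option and the sensitive-layer item both rest on an accuracy metric such as signal-to-quantization-noise ratio, SQNR = 10 * log10(sum(x^2) / sum((x - x_q)^2)) in dB. The sketch below computes it in plain Python for a toy per-tensor symmetric int8 round trip on weights and flags layers that fall below a threshold. The layer names, weight values, threshold, and helper functions are hypothetical, and autoquant v2 may measure SQNR on layer outputs rather than on weights.

```python
import math
from typing import Dict, List, Sequence


def fake_quant_int8(x: Sequence[float]) -> List[float]:
    """Per-tensor symmetric int8 quantize -> dequantize round trip."""
    scale = max(abs(v) for v in x) / 127 or 1.0
    return [max(-128, min(127, round(v / scale))) * scale for v in x]


def sqnr_db(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """SQNR in dB between a reference and its reconstruction; inf if exact."""
    signal = sum(v * v for v in x)
    noise = sum((v - w) ** 2 for v, w in zip(x, x_hat))
    return math.inf if noise == 0 else 10 * math.log10(signal / noise)


def sensitive_layers(weights: Dict[str, Sequence[float]], min_sqnr: float) -> List[str]:
    """Return layers whose int8 SQNR falls below min_sqnr, worst first."""
    scores = {name: sqnr_db(w, fake_quant_int8(w)) for name, w in weights.items()}
    return sorted((n for n, s in scores.items() if s < min_sqnr), key=scores.get)


# Hypothetical layers: one evenly spread, one with a large outlier that
# forces a coarse scale and rounds the small values to zero.
weights = {
    "smooth": [k / 100 for k in range(-100, 101)],
    "outlier": [0.3] * 99 + [100.0],
}
print(sensitive_layers(weights, min_sqnr=40.0))
```

Here the evenly spread layer lands well above 40 dB, while the outlier layer drops to roughly 30 dB, so only "outlier" is flagged as sensitive; such layers are natural candidates to keep in higher precision.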

jerryzh168 changed the title from "[Tracker] autoquant tracker" to "[Tracker] autoquant v2 tracker" on Nov 1, 2024
