
Add smooth quant #1398

Draft · mht-sharma wants to merge 5 commits into base: main
Conversation

mht-sharma (Contributor) commented:

What does this PR do?

Integrates SmoothQuant into the Optimum ONNXRuntime Quantizer.

  1. The implementation uses Intel Neural Compressor (INC) behind the scenes for the quantization.
  2. To apply SmoothQuant, call quantizer.apply_smooth_quant before the calibration step of the regular quantization process.

Usage

# Load calibration dataset
calibration_dataset = quantizer.get_calibration_dataset(...)

# Apply smooth quantization to the model
quantizer.apply_smooth_quant(
    dataset=calibration_dataset,
    save_dir=save_dir,
    quantization_config=qconfig,
)

# Perform calibration
quantizer.fit(...)
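
For orientation, here is a fuller end-to-end sketch of where apply_smooth_quant would sit in the static quantization workflow. Only the apply_smooth_quant call follows the snippet above; the model name, dataset, preprocessing, and the surrounding ORTQuantizer calls (get_calibration_dataset, fit, quantize) are illustrative assumptions based on the existing Optimum static quantization flow and may differ from the final API:

from functools import partial

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTQuantizer
from optimum.onnxruntime.configuration import AutoCalibrationConfig, AutoQuantizationConfig

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # illustrative model
save_dir = "smooth_quantized_model"

# Export the model to ONNX and build the quantizer
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
quantizer = ORTQuantizer.from_pretrained(model)

# Static quantization config
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=True, per_channel=False)

# Load calibration dataset
def preprocess_fn(examples, tokenizer):
    return tokenizer(examples["sentence"], padding="max_length", truncation=True)

calibration_dataset = quantizer.get_calibration_dataset(
    "glue",
    dataset_config_name="sst2",
    preprocess_function=partial(preprocess_fn, tokenizer=tokenizer),
    num_samples=64,
    dataset_split="train",
)

# Apply SmoothQuant to the model before calibration (the method added by this PR)
quantizer.apply_smooth_quant(
    dataset=calibration_dataset,
    save_dir=save_dir,
    quantization_config=qconfig,
)

# Perform calibration, then quantize as usual
calibration_config = AutoCalibrationConfig.minmax(calibration_dataset)
ranges = quantizer.fit(
    dataset=calibration_dataset,
    calibration_config=calibration_config,
    operators_to_quantize=qconfig.operators_to_quantize,
)
quantizer.quantize(
    save_dir=save_dir,
    calibration_tensors_range=ranges,
    quantization_config=qconfig,
)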

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Comment on lines +229 to +240
import importlib

try:
    importlib.import_module("neural_compressor.adaptor.ox_utils.smooth_quant")
except Exception as e:
    logging.error(f"{e}.")
    raise RuntimeError("Neural-compressor is required for SmoothQuant. Please install the library") from e

import copy

import onnx
from neural_compressor.adaptor.ox_utils.smooth_quant import ORTSmoothQuant
Contributor (reviewer) commented:
This is not recommended, reference: https://peps.python.org/pep-0008/#imports

mht-sharma (Contributor, Author) replied:

This is something done by the ONNXRuntime source code too! Are you suggesting importing it at the top level? https://github.com/microsoft/onnxruntime/blob/0f72739b6db129373d221483d61d6637ec11fb28/onnxruntime/python/tools/quantization/quantize.py#L421
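
(For illustration only, not this PR's code: a common middle ground between PEP 8 top-level imports and the in-function import shown above is to probe for the optional dependency once at module import time and defer both the hard failure and the heavy import to the call site. apply_smooth_quant here is a hypothetical stand-in.)

import importlib.util

# Cheap availability probe at module import time; does not import the heavy package
_NEURAL_COMPRESSOR_AVAILABLE = importlib.util.find_spec("neural_compressor") is not None


def apply_smooth_quant(*args, **kwargs):
    if not _NEURAL_COMPRESSOR_AVAILABLE:
        raise ImportError(
            "SmoothQuant requires neural-compressor. Install it with `pip install neural-compressor`."
        )
    # Heavy import stays local so the rest of the module loads without INC installed
    from neural_compressor.adaptor.ox_utils.smooth_quant import ORTSmoothQuant
    ...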

Labels: none yet
Projects: none yet
2 participants