Debugging an issue with quantizing a Qwen-72B using QoQ g128. Any thoughts/advice on fixing would be appreciated!
Here's the command:

```
python -m lmquant.llm.run projects/llm/configs/llm.yaml projects/llm/configs/qoq/g128.yaml --model-name <model name> --model-path <model path> --smooth-xw-alpha 0.3 --smooth-xw-beta 0.7
```
The output:
```
24-06-03 01:34:27 | D | - Smooth Quantizing Decoder Layer model.layers.2
24-06-03 01:34:27 | D | - model.layers.2.self_attn.attn_qk
24-06-03 01:34:27 | D | + ipts - AbsMax
24-06-03 01:34:27 | D | + ipts = [min=2.3145, max=12.2266]
24-06-03 01:34:27 | D | + opts - AbsMax
24-06-03 01:34:27 | D | + opts = [min=1.4668, max=12.6719]
24-06-03 01:34:28 | D | - x / w range = AbsMax / AbsMax
24-06-03 01:34:28 | D | - alpha = [ 0.5000]
24-06-03 01:34:28 | D | - beta = [ 0.0000]
24-06-03 01:34:28 | D | - sum error = [ 2118.1449]
24-06-03 01:34:28 | D | - best error = [ 2118.1449]
24-06-03 01:34:28 | D | + error = 2118.1449
24-06-03 01:34:28 | D | + scale = [min=1.2111, max=3.5598]
24-06-03 01:34:28 | D | - model.layers.2.self_attn.proj_out
24-06-03 01:34:29 | D | + ipts - AbsMax
24-06-03 01:34:29 | D | + ipts = [min=0.0388, max=1.3320]
24-06-03 01:34:29 | D | + wgts - AbsMax
24-06-03 01:34:29 | D | + wgts = [min=0.0325, max=0.0770]
24-06-03 01:34:30 | D | - x / w range = AbsMax / AbsMax
24-06-03 01:34:30 | D | - alpha = [ 0.3000]
24-06-03 01:34:30 | D | - beta = [ 0.7000]
24-06-03 01:34:30 | D | - sum error = [ 33.6991]
24-06-03 01:34:30 | D | - best error = [ 33.6991]
24-06-03 01:34:30 | D | + error = 33.6991
24-06-03 01:34:30 | D | + scale = [min=3.6428, max=7.8928]
24-06-03 01:34:30 | D | - model.layers.2.mlp.proj_2nd
24-06-03 01:34:31 | D | + ipts - AbsMax
24-06-03 01:34:31 | D | + ipts = [min=0.0000, max=7.6953]
24-06-03 01:34:31 | D | + wgts - AbsMax
24-06-03 01:34:31 | D | + wgts = [min=0.0000, max=0.0836]
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/workspace/lmquant/lmquant/llm/run.py", line 276, in <module>
    run(config)
  File "/workspace/lmquant/lmquant/llm/run.py", line 158, in run
    smooth_cache = smooth_llm(model, config.quant, tokenizer=tokenizer, calib_config=config.calib)
  File "/root/miniconda3/envs/lmquant/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/lmquant/lmquant/llm/quant/smooth.py", line 204, in smooth_llm
    smooth_llm_decoder_layer(
  File "/root/miniconda3/envs/lmquant/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/lmquant/lmquant/llm/quant/smooth.py", line 149, in smooth_llm_decoder_layer
    smooth_cache[cache_key] = smooth_linear_modules(
  File "/root/miniconda3/envs/lmquant/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/lmquant/lmquant/quant/calib/smooth.py", line 112, in smooth_linear_modules
    ).calibrate(
  File "/workspace/lmquant/lmquant/quant/calib/calibrator/base/search.py", line 569, in calibrate
    return self._calibrate_wgts(ipt_wgts, eval_ipt, eval_mod, ipt_mods, orig_ipt_wgts, eval_kwargs, **kwargs)
  File "/workspace/lmquant/lmquant/quant/calib/calibrator/base/search.py", line 690, in _calibrate_wgts
    self.ask()
  File "/workspace/lmquant/lmquant/quant/calib/calibrator/base/search.py", line 216, in ask
    self.candidate = self._ask()
  File "/workspace/lmquant/lmquant/quant/calib/calibrator/smooth.py", line 337, in _ask
    scale = get_smooth_scale(ipts_range=x_range, wgts_range=w_range, alpha=alpha, beta=beta)
  File "/root/miniconda3/envs/lmquant/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/lmquant/lmquant/quant/calib/calibrator/smooth.py", line 48, in get_smooth_scale
    assert not scale.isnan().any(), "The smooth scale contains NaN."
AssertionError: The smooth scale contains NaN.
```
Looks like `wgts_range` contains zeros and is introducing NaNs in the division step of `get_smooth_scale`:
```python
@torch.inference_mode()
def get_smooth_scale(*, ipts_range: torch.Tensor, wgts_range: torch.Tensor, alpha: float, beta: float) -> torch.Tensor:
    """Calculate the smooth scale for quantization.

    Args:
        ipts_range (torch.Tensor): Input range.
        wgts_range (torch.Tensor): Weight range.
        alpha (float): Smooth factor for input.
        beta (float): Smooth factor for weight.

    Returns:
        torch.Tensor: Smooth scale.
    """
    assert 0 <= alpha <= 1 and 0 <= beta <= 1, "The smooth factors should be in [0, 1]."
    if alpha > 0:
        scale = ipts_range.pow(alpha)
        if beta > 0:
            scale = scale.div_(wgts_range.pow(beta))
    else:
        scale = wgts_range.pow(-beta)
    scale[scale == 0] = 1
    assert not scale.isnan().any(), "The smooth scale contains NaN."
    assert not scale.isinf().any(), "The smooth scale contains Inf."
    return scale
```
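As a quick sanity check, here's a minimal standalone repro of that failure mode. The two-element tensors are just illustrative stand-ins using the min/max values from the `mlp.proj_2nd` log entry above, not the actual per-channel ranges:

```python
import torch

# Illustrative ranges: min/max taken from the mlp.proj_2nd log entry.
ipts_range = torch.tensor([0.0000, 7.6953])
wgts_range = torch.tensor([0.0000, 0.0836])

alpha, beta = 0.3, 0.7
scale = ipts_range.pow(alpha) / wgts_range.pow(beta)
print(scale)  # first element is 0 / 0 = NaN, which is what trips the assertion
```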
I'll add

```python
ipts_range = torch.clamp(ipts_range, min=1e-6)
wgts_range = torch.clamp(wgts_range, min=1e-6)
```

at the beginning of the function and see if that is enough to nudge things along without introducing error.
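For clarity, this is roughly what the patched function would look like with the clamp applied up front. It's a sketch of the proposed change, not a tested fix, and 1e-6 is an arbitrary floor that may need tuning:

```python
@torch.inference_mode()
def get_smooth_scale(*, ipts_range: torch.Tensor, wgts_range: torch.Tensor, alpha: float, beta: float) -> torch.Tensor:
    # Proposed change: floor both ranges so zero entries can't produce 0/0 (NaN) or x/0 (Inf) below.
    ipts_range = torch.clamp(ipts_range, min=1e-6)
    wgts_range = torch.clamp(wgts_range, min=1e-6)
    assert 0 <= alpha <= 1 and 0 <= beta <= 1, "The smooth factors should be in [0, 1]."
    if alpha > 0:
        scale = ipts_range.pow(alpha)
        if beta > 0:
            scale = scale.div_(wgts_range.pow(beta))
    else:
        scale = wgts_range.pow(-beta)
    scale[scale == 0] = 1
    assert not scale.isnan().any(), "The smooth scale contains NaN."
    assert not scale.isinf().any(), "The smooth scale contains Inf."
    return scale
```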