AssertionError: The smooth scale contains NaN. #5

Open
ethxnp opened this issue Jun 3, 2024 · 1 comment

ethxnp commented Jun 3, 2024

Debugging an issue that comes up when quantizing Qwen-72B with QoQ g128. Any thoughts/advice on fixing it would be appreciated!

Here's the command: python -m lmquant.llm.run projects/llm/configs/llm.yaml projects/llm/configs/qoq/g128.yaml --model-name <model name> --model-path <model path> --smooth-xw-alpha 0.3 --smooth-xw-beta 0.7

The output:

24-06-03 01:34:27 | D |   - Smooth Quantizing Decoder Layer model.layers.2
24-06-03 01:34:27 | D |     - model.layers.2.self_attn.attn_qk
24-06-03 01:34:27 | D |       + ipts - AbsMax
24-06-03 01:34:27 | D |       + ipts  = [min=2.3145, max=12.2266]
24-06-03 01:34:27 | D |       + opts - AbsMax
24-06-03 01:34:27 | D |       + opts  = [min=1.4668, max=12.6719]
24-06-03 01:34:28 | D |         - x / w range = AbsMax / AbsMax
24-06-03 01:34:28 | D |         - alpha       = [    0.5000]
24-06-03 01:34:28 | D |         - beta        = [    0.0000]
24-06-03 01:34:28 | D |         - sum  error  = [ 2118.1449]
24-06-03 01:34:28 | D |         - best error  = [ 2118.1449]
24-06-03 01:34:28 | D |       + error = 2118.1449
24-06-03 01:34:28 | D |       + scale = [min=1.2111, max=3.5598]
24-06-03 01:34:28 | D |     - model.layers.2.self_attn.proj_out
24-06-03 01:34:29 | D |       + ipts - AbsMax
24-06-03 01:34:29 | D |       + ipts  = [min=0.0388, max=1.3320]
24-06-03 01:34:29 | D |       + wgts - AbsMax
24-06-03 01:34:29 | D |       + wgts  = [min=0.0325, max=0.0770]
24-06-03 01:34:30 | D |         - x / w range = AbsMax / AbsMax
24-06-03 01:34:30 | D |         - alpha       = [    0.3000]
24-06-03 01:34:30 | D |         - beta        = [    0.7000]
24-06-03 01:34:30 | D |         - sum  error  = [   33.6991]
24-06-03 01:34:30 | D |         - best error  = [   33.6991]
24-06-03 01:34:30 | D |       + error = 33.6991
24-06-03 01:34:30 | D |       + scale = [min=3.6428, max=7.8928]
24-06-03 01:34:30 | D |     - model.layers.2.mlp.proj_2nd
24-06-03 01:34:31 | D |       + ipts - AbsMax
24-06-03 01:34:31 | D |       + ipts  = [min=0.0000, max=7.6953]
24-06-03 01:34:31 | D |       + wgts - AbsMax
24-06-03 01:34:31 | D |       + wgts  = [min=0.0000, max=0.0836]
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/workspace/lmquant/lmquant/llm/run.py", line 276, in <module>
    run(config)
  File "/workspace/lmquant/lmquant/llm/run.py", line 158, in run
    smooth_cache = smooth_llm(model, config.quant, tokenizer=tokenizer, calib_config=config.calib)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/lmquant/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/lmquant/lmquant/llm/quant/smooth.py", line 204, in smooth_llm
    smooth_llm_decoder_layer(
  File "/root/miniconda3/envs/lmquant/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/lmquant/lmquant/llm/quant/smooth.py", line 149, in smooth_llm_decoder_layer
    smooth_cache[cache_key] = smooth_linear_modules(
                              ^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/lmquant/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/lmquant/lmquant/quant/calib/smooth.py", line 112, in smooth_linear_modules
    ).calibrate(
      ^^^^^^^^^^
  File "/workspace/lmquant/lmquant/quant/calib/calibrator/base/search.py", line 569, in calibrate
    return self._calibrate_wgts(ipt_wgts, eval_ipt, eval_mod, ipt_mods, orig_ipt_wgts, eval_kwargs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/lmquant/lmquant/quant/calib/calibrator/base/search.py", line 690, in _calibrate_wgts
    self.ask()
  File "/workspace/lmquant/lmquant/quant/calib/calibrator/base/search.py", line 216, in ask
    self.candidate = self._ask()
                     ^^^^^^^^^^^
  File "/workspace/lmquant/lmquant/quant/calib/calibrator/smooth.py", line 337, in _ask
    scale = get_smooth_scale(ipts_range=x_range, wgts_range=w_range, alpha=alpha, beta=beta)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/lmquant/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/lmquant/lmquant/quant/calib/calibrator/smooth.py", line 48, in get_smooth_scale
    assert not scale.isnan().any(), "The smooth scale contains NaN."
           ^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: The smooth scale contains NaN.

ethxnp commented Jun 3, 2024

Looks like wgts_range contains zeros; wherever ipts_range is also zero, the division step works out to 0 / 0 and introduces the NaNs in:

@torch.inference_mode()
def get_smooth_scale(*, ipts_range: torch.Tensor, wgts_range: torch.Tensor, alpha: float, beta: float) -> torch.Tensor:
    """Calculate the smooth scale for quantization.

    Args:
        ipts_range (torch.Tensor): Input range.
        wgts_range (torch.Tensor): Weight range.
        alpha (float): Smooth factor for input.
        beta (float): Smooth factor for weight.

    Returns:
        torch.Tensor: Smooth scale.
    """
    assert 0 <= alpha <= 1 and 0 <= beta <= 1, "The smooth factors should be in [0, 1]."
    if alpha > 0:
        scale = ipts_range.pow(alpha)
        if beta > 0:
            scale = scale.div_(wgts_range.pow(beta))
    else:
        scale = wgts_range.pow(-beta)
    scale[scale == 0] = 1
    assert not scale.isnan().any(), "The smooth scale contains NaN."
    assert not scale.isinf().any(), "The smooth scale contains Inf."
    return scale
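
For model.layers.2.mlp.proj_2nd the log above shows min=0.0000 for both ipts and wgts, so any channel where both ranges are zero ends up computing 0 / 0. A quick standalone repro (the tensor values are made up, just mirroring the logged mins/maxes):

import torch

ipts_range = torch.tensor([0.0, 7.6953])  # mins/maxes from the proj_2nd log
wgts_range = torch.tensor([0.0, 0.0836])
scale = ipts_range.pow(0.3) / wgts_range.pow(0.7)  # alpha=0.3, beta=0.7
print(scale)  # first element is nan: 0**0.3 / 0**0.7 == 0 / 0

Note that the scale[scale == 0] = 1 line doesn't catch this, because nan == 0 is False.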

I'll add

ipts_range = torch.clamp(ipts_range, min=1e-6)
wgts_range = torch.clamp(wgts_range, min=1e-6)

at the beginning of the function, and see if that's enough to nudge things along without introducing too much error.
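
Continuing the repro above, the clamp keeps both asserts happy (the 1e-6 floor is just my guess at a reasonable epsilon):

ipts_range = torch.clamp(ipts_range, min=1e-6)  # floor zero channels at a tiny epsilon
wgts_range = torch.clamp(wgts_range, min=1e-6)
scale = ipts_range.pow(0.3) / wgts_range.pow(0.7)
print(scale.isnan().any(), scale.isinf().any())  # tensor(False) tensor(False)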
