Debugging an issue with quantizing a Qwen-72B using QoQ g128. Any thoughts/advice on fixing would be appreciated!
Here's the command:

```
python -m lmquant.llm.run projects/llm/configs/llm.yaml projects/llm/configs/qoq/g128.yaml --model-name <model name> --model-path <model path> --smooth-xw-alpha 0.3 --smooth-xw-beta 0.7
```
The output:
```
24-06-03 01:34:27 | D | - Smooth Quantizing Decoder Layer model.layers.2
24-06-03 01:34:27 | D | - model.layers.2.self_attn.attn_qk
24-06-03 01:34:27 | D | + ipts - AbsMax
24-06-03 01:34:27 | D | + ipts = [min=2.3145, max=12.2266]
24-06-03 01:34:27 | D | + opts - AbsMax
24-06-03 01:34:27 | D | + opts = [min=1.4668, max=12.6719]
24-06-03 01:34:28 | D | - x / w range = AbsMax / AbsMax
24-06-03 01:34:28 | D | - alpha = [ 0.5000]
24-06-03 01:34:28 | D | - beta = [ 0.0000]
24-06-03 01:34:28 | D | - sum error = [ 2118.1449]
24-06-03 01:34:28 | D | - best error = [ 2118.1449]
24-06-03 01:34:28 | D | + error = 2118.1449
24-06-03 01:34:28 | D | + scale = [min=1.2111, max=3.5598]
24-06-03 01:34:28 | D | - model.layers.2.self_attn.proj_out
24-06-03 01:34:29 | D | + ipts - AbsMax
24-06-03 01:34:29 | D | + ipts = [min=0.0388, max=1.3320]
24-06-03 01:34:29 | D | + wgts - AbsMax
24-06-03 01:34:29 | D | + wgts = [min=0.0325, max=0.0770]
24-06-03 01:34:30 | D | - x / w range = AbsMax / AbsMax
24-06-03 01:34:30 | D | - alpha = [ 0.3000]
24-06-03 01:34:30 | D | - beta = [ 0.7000]
24-06-03 01:34:30 | D | - sum error = [ 33.6991]
24-06-03 01:34:30 | D | - best error = [ 33.6991]
24-06-03 01:34:30 | D | + error = 33.6991
24-06-03 01:34:30 | D | + scale = [min=3.6428, max=7.8928]
24-06-03 01:34:30 | D | - model.layers.2.mlp.proj_2nd
24-06-03 01:34:31 | D | + ipts - AbsMax
24-06-03 01:34:31 | D | + ipts = [min=0.0000, max=7.6953]
24-06-03 01:34:31 | D | + wgts - AbsMax
24-06-03 01:34:31 | D | + wgts = [min=0.0000, max=0.0836]
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/workspace/lmquant/lmquant/llm/run.py", line 276, in <module>
    run(config)
  File "/workspace/lmquant/lmquant/llm/run.py", line 158, in run
    smooth_cache = smooth_llm(model, config.quant, tokenizer=tokenizer, calib_config=config.calib)
  File "/root/miniconda3/envs/lmquant/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/lmquant/lmquant/llm/quant/smooth.py", line 204, in smooth_llm
    smooth_llm_decoder_layer(
  File "/root/miniconda3/envs/lmquant/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/lmquant/lmquant/llm/quant/smooth.py", line 149, in smooth_llm_decoder_layer
    smooth_cache[cache_key] = smooth_linear_modules(
  File "/root/miniconda3/envs/lmquant/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/lmquant/lmquant/quant/calib/smooth.py", line 112, in smooth_linear_modules
    ).calibrate(
  File "/workspace/lmquant/lmquant/quant/calib/calibrator/base/search.py", line 569, in calibrate
    return self._calibrate_wgts(ipt_wgts, eval_ipt, eval_mod, ipt_mods, orig_ipt_wgts, eval_kwargs, **kwargs)
  File "/workspace/lmquant/lmquant/quant/calib/calibrator/base/search.py", line 690, in _calibrate_wgts
    self.ask()
  File "/workspace/lmquant/lmquant/quant/calib/calibrator/base/search.py", line 216, in ask
    self.candidate = self._ask()
  File "/workspace/lmquant/lmquant/quant/calib/calibrator/smooth.py", line 337, in _ask
    scale = get_smooth_scale(ipts_range=x_range, wgts_range=w_range, alpha=alpha, beta=beta)
  File "/root/miniconda3/envs/lmquant/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/lmquant/lmquant/quant/calib/calibrator/smooth.py", line 48, in get_smooth_scale
    assert not scale.isnan().any(), "The smooth scale contains NaN."
AssertionError: The smooth scale contains NaN.
```
Looks like `wgts_range` contains zeros and is introducing NaNs in the division step of `get_smooth_scale`:
```python
@torch.inference_mode()
def get_smooth_scale(*, ipts_range: torch.Tensor, wgts_range: torch.Tensor, alpha: float, beta: float) -> torch.Tensor:
    """Calculate the smooth scale for quantization.

    Args:
        ipts_range (torch.Tensor): Input range.
        wgts_range (torch.Tensor): Weight range.
        alpha (float): Smooth factor for input.
        beta (float): Smooth factor for weight.

    Returns:
        torch.Tensor: Smooth scale.
    """
    assert 0 <= alpha <= 1 and 0 <= beta <= 1, "The smooth factors should be in [0, 1]."
    if alpha > 0:
        scale = ipts_range.pow(alpha)
        if beta > 0:
            scale = scale.div_(wgts_range.pow(beta))
    else:
        scale = wgts_range.pow(-beta)
    scale[scale == 0] = 1
    assert not scale.isnan().any(), "The smooth scale contains NaN."
    assert not scale.isinf().any(), "The smooth scale contains Inf."
    return scale
```
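As a quick sanity check, here's a minimal standalone repro of that failure mode. The two-element tensors are just illustrative stand-ins using the min/max values from the `mlp.proj_2nd` log entry above, not the actual per-channel ranges:

```python
import torch

# Illustrative ranges: min/max taken from the mlp.proj_2nd log entry.
ipts_range = torch.tensor([0.0000, 7.6953])
wgts_range = torch.tensor([0.0000, 0.0836])

alpha, beta = 0.3, 0.7
scale = ipts_range.pow(alpha) / wgts_range.pow(beta)
print(scale)  # first element is 0 / 0 = NaN, which is what trips the assertion
```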
I'll add

```python
ipts_range = torch.clamp(ipts_range, min=1e-6)
wgts_range = torch.clamp(wgts_range, min=1e-6)
```

at the beginning of the function and see if that is enough to nudge things along without introducing error.
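For clarity, this is roughly what the patched function would look like with the clamp applied up front. It's a sketch of the proposed change, not a tested fix, and 1e-6 is an arbitrary floor that may need tuning:

```python
@torch.inference_mode()
def get_smooth_scale(*, ipts_range: torch.Tensor, wgts_range: torch.Tensor, alpha: float, beta: float) -> torch.Tensor:
    # Proposed change: floor both ranges so zero entries can't produce 0/0 (NaN) or x/0 (Inf) below.
    ipts_range = torch.clamp(ipts_range, min=1e-6)
    wgts_range = torch.clamp(wgts_range, min=1e-6)
    assert 0 <= alpha <= 1 and 0 <= beta <= 1, "The smooth factors should be in [0, 1]."
    if alpha > 0:
        scale = ipts_range.pow(alpha)
        if beta > 0:
            scale = scale.div_(wgts_range.pow(beta))
    else:
        scale = wgts_range.pow(-beta)
    scale[scale == 0] = 1
    assert not scale.isnan().any(), "The smooth scale contains NaN."
    assert not scale.isinf().any(), "The smooth scale contains Inf."
    return scale
```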