
Added DeepSeek V3 support. #688

Open
wants to merge 3 commits into main

Conversation

LagPixelLOL

#686

I only tested using randomly initialized weights on a 1B version of the model, so this needs further testing for the big 671B model.
Also, due to the group size limitation in the GEMM CUDA kernel, the group size can only be set to <= 64, or not set at all.
The testing models are at https://huggingface.co/v2ray/DeepSeek-V3-1B-Test and https://huggingface.co/v2ray/DeepSeek-V3-1B-Test-AWQ.

If anyone can test on the big 671B model thank you so much!!!!!🥺

@casper-hansen @ehartford

@ehartford

I will test it immediately after dinner 😊 thank you!

@ehartford

ehartford commented Jan 1, 2025

I still get this error when I try it on https://huggingface.co/opensourcerelease/DeepSeek-V3-bf16

Traceback (most recent call last):
  File "/home/ubuntu/AutoAWQDeepseek/quant.py", line 9, in <module>
    model = AutoAWQForCausalLM.from_pretrained(model_path)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/AutoAWQDeepseek/awq/models/auto.py", line 78, in from_pretrained
    return AWQ_CAUSAL_LM_MODEL_MAP[model_type].from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/AutoAWQDeepseek/awq/models/base.py", line 386, in from_pretrained
    model = target_cls.from_pretrained(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/envs/awq/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 559, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/envs/awq/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4264, in from_pretrained
    ) = cls._load_pretrained_model(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/envs/awq/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4755, in _load_pretrained_model
    state_dict = load_state_dict(
                 ^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/envs/awq/lib/python3.11/site-packages/transformers/modeling_utils.py", line 506, in load_state_dict
    if metadata.get("format") not in ["pt", "tf", "flax", "mlx"]:
       ^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'get'

My script:

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = '/home/ubuntu/datasets/models/DeepSeek-V3-bf16-2'
quant_path = '/home/ubuntu/datasets/models/DeepSeek-V3-awq'
quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM" }

# Load model
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Quantize
model.quantize(tokenizer, quant_config=quant_config)

# Save quantized model
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)

print(f'Model is quantized and saved at "{quant_path}"')

@LagPixelLOL
Author

LagPixelLOL commented Jan 1, 2025

Seems like the error is from Transformers, not AutoAWQ.🤔

Can you try updating Transformers to a development version by installing from source? When I check the errored line, it's if metadata is not None and metadata.get("format") not in ["pt", "tf", "flax", "mlx"]: instead of if metadata.get("format") not in ["pt", "tf", "flax", "mlx"]:, so it seems like the error has already been fixed in huggingface/transformers@5fcf628.
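
For reference, installing Transformers from source is usually just (standard command, included here for convenience):

pip install git+https://github.com/huggingface/transformers.git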

Also, in your script quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM" } sets the group size to 128. This will make the GEMM CUDA kernel error out (at this line), because one of the linear layers in DeepSeek V3 has an output size that isn't divisible by 128 (the set group size). You either need to set the group size to 64 or go without a group size (set it to 0). If you don't use the CUDA kernel and use the Triton kernel instead, it won't have this issue, but it will also be incompatible with vLLM (at this line), so I suggest not using group size 128.
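
For example, either of these configs should avoid the divisibility problem (just a sketch of the two options above, not something tested on the full 671B model):

# Group size 64 avoids the GEMM kernel's divisibility issue described above.
quant_config = { "zero_point": True, "q_group_size": 64, "w_bit": 4, "version": "GEMM" }

# Or without a group size at all:
# quant_config = { "zero_point": True, "q_group_size": 0, "w_bit": 4, "version": "GEMM" }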

@ehartford

[image]

Might be working - will know in an hour

@ehartford

I had to make this change to modeling.py

https://huggingface.co/deepseek-ai/DeepSeek-V3/discussions/23/files

@ehartford

[image]
It's just sitting here for an hour at 5% and not even using my GPUs.
I'll let it work overnight.

@ehartford

ehartford commented Jan 1, 2025

It seems to be working. 22 hours remaining

I'll update here when I get it published and tested

@LagPixelLOL
Author

Are you sure all the uninitialized weights in the logs above are OK?

@ehartford

No I'm not sure.

What do you think?

@ehartford

I have an extra node; if you want, we could do a call and look at it together.

@LagPixelLOL
Author

LagPixelLOL commented Jan 1, 2025

Can you try loading it in just Transformers with CPU offload and running a few tokens, to see if it's actually a working model? Also check if you get the same uninitialized-weights warning in just Transformers too.
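
Something roughly like this should work for that check (the path is the one from your script; this exact snippet is untested on my side):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/home/ubuntu/datasets/models/DeepSeek-V3-bf16-2"  # path from your script above

# device_map="auto" lets accelerate spread layers across the GPUs and offload the rest to CPU/disk.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    offload_folder="offload",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Generate a few tokens to see if the output is coherent; also watch the load logs
# for the same uninitialized-weights warning.
inputs = tokenizer("The capital of France is", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(out[0], skip_special_tokens=True))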

I can't do a voice call, but you can add me on Discord @v2ray if you want, so we can text and get updates quicker.

@ehartford

My machine crashed today. Started over.

I'll try what you say

@ehartford

Tested on vLLM, success, coherent. 5 TPS with a single query, 80-100 TPS with 100 simultaneous queries
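
For anyone wanting to reproduce, the setup looked roughly like this (model path, tensor parallel size, and prompt are illustrative, not the exact invocation):

from vllm import LLM, SamplingParams

# quantization="awq" tells vLLM to use its AWQ kernels; tensor_parallel_size depends on the node.
llm = LLM(
    model="/home/ubuntu/datasets/models/DeepSeek-V3-awq",
    quantization="awq",
    tensor_parallel_size=8,
    trust_remote_code=True,
)
outputs = llm.generate(["Write a haiku about quantization."], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)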

@ehartford

@casper-hansen can this be merged?
