
Added DeepSeek V3 support. #688

Open
wants to merge 3 commits into main

Conversation

LagPixelLOL

#686

I only tested using randomly initialized weights on a 1B version of the model, so this needs further testing for the big 671B model.
Also, due to the group size limitation in the GEMM CUDA kernel, the group size can only be set to <= 64, or not set at all.
The testing models are at https://huggingface.co/v2ray/DeepSeek-V3-1B-Test and https://huggingface.co/v2ray/DeepSeek-V3-1B-Test-AWQ.

If anyone can test on the big 671B model thank you so much!!!!!🥺

@casper-hansen @ehartford

@ehartford

I will test it immediately after dinner 😊 thank you!

@ehartford

ehartford commented Jan 1, 2025

I still get this error when I try it on https://huggingface.co/opensourcerelease/DeepSeek-V3-bf16

Traceback (most recent call last):
  File "/home/ubuntu/AutoAWQDeepseek/quant.py", line 9, in <module>
    model = AutoAWQForCausalLM.from_pretrained(model_path)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/AutoAWQDeepseek/awq/models/auto.py", line 78, in from_pretrained
    return AWQ_CAUSAL_LM_MODEL_MAP[model_type].from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/AutoAWQDeepseek/awq/models/base.py", line 386, in from_pretrained
    model = target_cls.from_pretrained(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/envs/awq/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 559, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/envs/awq/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4264, in from_pretrained
    ) = cls._load_pretrained_model(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/envs/awq/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4755, in _load_pretrained_model
    state_dict = load_state_dict(
                 ^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/envs/awq/lib/python3.11/site-packages/transformers/modeling_utils.py", line 506, in load_state_dict
    if metadata.get("format") not in ["pt", "tf", "flax", "mlx"]:
       ^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'get'

My script:

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = '/home/ubuntu/datasets/models/DeepSeek-V3-bf16-2'
quant_path = '/home/ubuntu/datasets/models/DeepSeek-V3-awq'
quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM" }

# Load model
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Quantize
model.quantize(tokenizer, quant_config=quant_config)

# Save quantized model
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)

print(f'Model is quantized and saved at "{quant_path}"')

@LagPixelLOL
Author

LagPixelLOL commented Jan 1, 2025

Seems like the error is from Transformers, not AutoAWQ.🤔

Can you try updating Transformers to a development version by installing from source? When I check the errored line, it's if metadata is not None and metadata.get("format") not in ["pt", "tf", "flax", "mlx"]: instead of if metadata.get("format") not in ["pt", "tf", "flax", "mlx"]:, so it seems like the error has already been fixed in huggingface/transformers@5fcf628.
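
For reference, installing Transformers from source is usually just (standard command, included here for convenience):

pip install git+https://github.com/huggingface/transformers.git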

Also, in your script quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM" } sets the group size to 128. This will make the GEMM CUDA kernel error out (at this line), because one of the linear layers in DeepSeek V3 has an output size that isn't divisible by 128 (the set group size). You either need to set the group size to 64 or go without a group size (set it to 0). If you don't use the CUDA kernel and use the Triton kernel instead, it won't have this issue, but it will also be incompatible with vLLM (at this line), so I suggest not using group size 128.
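
For example, either of these configs should avoid the divisibility problem (just a sketch of the two options above, not something tested on the full 671B model):

# Group size 64 avoids the GEMM kernel's divisibility issue described above.
quant_config = { "zero_point": True, "q_group_size": 64, "w_bit": 4, "version": "GEMM" }

# Or without a group size at all:
# quant_config = { "zero_point": True, "q_group_size": 0, "w_bit": 4, "version": "GEMM" }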

@ehartford

[image]

Might be working - will know in an hour

@ehartford

I had to make this change to modeling.py

https://huggingface.co/deepseek-ai/DeepSeek-V3/discussions/23/files

@ehartford

[image]
It's just sitting here for an hour at 5% and not even using my GPUs.
I'll let it work overnight.

@ehartford

ehartford commented Jan 1, 2025

It seems to be working. 22 hours remaining

I'll update here when I get it published and tested

@LagPixelLOL
Author

Are you sure all the uninitialized weights in the logs above are OK?

@ehartford

No I'm not sure.

What do you think?

@ehartford

I have an extra node; if you want, we could do a call and look at it together.

@LagPixelLOL
Author

LagPixelLOL commented Jan 1, 2025

Can you try loading it in just Transformers with CPU offload and running a few tokens, to see if it's actually a working model? Also check if you get the same uninitialized-weights warning in just Transformers too.
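
Something roughly like this should work for that check (the path is the one from your script; this exact snippet is untested on my side):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/home/ubuntu/datasets/models/DeepSeek-V3-bf16-2"  # path from your script above

# device_map="auto" lets accelerate spread layers across the GPUs and offload the rest to CPU/disk.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    offload_folder="offload",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Generate a few tokens to see if the output is coherent; also watch the load logs
# for the same uninitialized-weights warning.
inputs = tokenizer("The capital of France is", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(out[0], skip_special_tokens=True))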

I can't do a voice call, but you can add me on Discord @v2ray if you want, so we can text and get updates quicker.

@ehartford

My machine crashed today. Started over.

I'll try what you say

@ehartford

Tested on vLLM, success, coherent. 5 TPS with a single query, 80-100 TPS with 100 simultaneous queries
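
For anyone wanting to reproduce, the setup looked roughly like this (model path, tensor parallel size, and prompt are illustrative, not the exact invocation):

from vllm import LLM, SamplingParams

# quantization="awq" tells vLLM to use its AWQ kernels; tensor_parallel_size depends on the node.
llm = LLM(
    model="/home/ubuntu/datasets/models/DeepSeek-V3-awq",
    quantization="awq",
    tensor_parallel_size=8,
    trust_remote_code=True,
)
outputs = llm.generate(["Write a haiku about quantization."], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)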

@ehartford

@casper-hansen can this be merged?
