
Different eval_score and creativity_score when model is loaded in 4 bit #117

Open

itorgov opened this issue Nov 23, 2024 · 0 comments

itorgov commented Nov 23, 2024

In the scoring.get_eval_score.get_eval_score function, the model is loaded with the following code when flash_attn is installed:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,  # This does not hurt performance much
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    f"{request.repo_namespace}/{request.repo_name}",
    revision=request.revision,
    quantization_config=quant_config,
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
    device_map="sequential",
    cache_dir=f"model_cache_dir/{cache_path}",
)

In this case I get the following result: {'eval_score': 0.5030202252204222, 'creativity_score': 0.4620578276830795}.

If the flash_attn package is not installed, the model is loaded with this code instead:

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    f"{request.repo_namespace}/{request.repo_name}",
    revision=request.revision,
    device_map="auto",
    cache_dir=f"model_cache_dir/{cache_path}",
    # force_download=True
)

And in this case I get a completely different result: {'eval_score': 0.8671109773311513, 'creativity_score': 0.18112541666424237}.
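A quick runtime check can confirm which of the two paths was actually taken (just a sanity-check sketch; _attn_implementation is a private transformers attribute and is_loaded_in_4bit is only set by the bitsandbytes integration, so both may change between versions):

# Inspect the loaded model to see which configuration is in effect.
print(model.config._attn_implementation)           # e.g. "flash_attention_2" vs "sdpa"/"eager"
print(getattr(model, "is_loaded_in_4bit", False))  # True only for the 4-bit quantized path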

Two questions:

  1. Which path is used in production? I assume the second one (without quantization).
  2. Can you fix the code so that there is only one way of loading the model? (A possible unified loader is sketched below.)

P.S.: I can prepare a PR if you wish.
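For what it's worth, here is a minimal sketch of what a single loading path could look like, assuming the 4-bit configuration is the intended one and only the attention backend should depend on whether flash_attn is importable (the load_model wrapper and its request/cache_path parameters are placeholders taken from the snippets above, not the repo's actual API):

import importlib.util

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

def load_model(request, cache_path):
    # Always quantize to 4 bit so scores are comparable across machines.
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    # Use flash attention only when the package is actually importable,
    # falling back to the default "sdpa" implementation otherwise.
    attn = (
        "flash_attention_2"
        if importlib.util.find_spec("flash_attn") is not None
        else "sdpa"
    )
    return AutoModelForCausalLM.from_pretrained(
        f"{request.repo_namespace}/{request.repo_name}",
        revision=request.revision,
        quantization_config=quant_config,
        attn_implementation=attn,
        torch_dtype=torch.bfloat16,
        device_map="sequential",
        cache_dir=f"model_cache_dir/{cache_path}",
    )

With a single path like this, the only remaining source of divergence between environments would be the attention kernel's floating-point behavior, which should be far smaller than the quantized-vs-unquantized gap reported above.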
