In the `scoring.get_eval_score.get_eval_score` function, the model is loaded with the following code when `flash_attn` is installed:
```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,  # this does not hurt performance much
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    f"{request.repo_namespace}/{request.repo_name}",
    revision=request.revision,
    quantization_config=quant_config,
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
    device_map="sequential",
    cache_dir=f"model_cache_dir/{cache_path}",
)
```
In this case I get the following result: `{'eval_score': 0.5030202252204222, 'creativity_score': 0.4620578276830795}`.
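For context, the branch between the two loading paths presumably depends on whether `flash_attn` is importable. A minimal sketch of such a check (the helper name is illustrative, not from the repo):

```python
import importlib.util


def pick_attn_implementation(package: str = "flash_attn") -> str:
    """Choose the attn_implementation argument for from_pretrained,
    depending on whether `package` is importable in this environment."""
    if importlib.util.find_spec(package) is not None:
        return "flash_attention_2"
    return "eager"  # fall back to the default PyTorch attention
```

The value returned here would then be passed as `attn_implementation=...`; any difference in scores between the two branches should ideally come only from this argument, not from a different quantization setup.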
If the `flash_attn` package is not installed, then the model is loaded with this code:
And in this case I get a completely different result: `{'eval_score': 0.8671109773311513, 'creativity_score': 0.18112541666424237}`.

Two questions:
P.S.: I can prepare a PR if you wish.