Output differs from HF inference #280
-
I set temperature=0.1, top-k=10, top-p=0.75, and I expected that running inference on the same prompt would give the same output. Testing both HF and vLLM inference, HF gives a stable output, while vLLM sometimes gives a different one. Is this normal? The parameters of the two inference runs are the same, yet they do not produce the same results. Looking forward to your reply!
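A minimal sketch of this kind of side-by-side comparison (the model name, prompt, and max_new_tokens below are placeholders, not my exact script):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "meta-llama/Llama-2-13b-hf"  # placeholder checkpoint
prompt = "Example prompt"

# HF inference with sampling enabled and the parameters above
tokenizer = AutoTokenizer.from_pretrained(model_name)
hf_model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
inputs = tokenizer(prompt, return_tensors="pt").to(hf_model.device)
hf_out = hf_model.generate(
    **inputs,
    do_sample=True,
    temperature=0.1,
    top_k=10,
    top_p=0.75,
    max_new_tokens=256,
)
print(tokenizer.decode(hf_out[0], skip_special_tokens=True))

# vLLM inference with the same sampling parameters
llm = LLM(model=model_name)
params = SamplingParams(temperature=0.1, top_k=10, top_p=0.75, max_tokens=256)
vllm_out = llm.generate([prompt], params)
print(vllm_out[0].outputs[0].text)
```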
Replies: 6 comments 3 replies
-
I have encountered the same problem. Using the LLaMA 13B model with max_tokens=256, frequency_penalty=0.1, temperature=0.1, top-k=50, top-p=0.75, I tested a set of 40 questions and found that the outputs for 15 of them were different from the outputs obtained with Hugging Face inference.
-
The LLM inference process includes sampling, which is a random process. Because the implementations of HF and vLLM are different, it is normal to get different samples. However, if you perform argmax sampling (e.g., temperature=0), then you should be able to see the same results.
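For example, a minimal sketch of greedy (argmax) decoding in both frameworks, assuming a LLaMA-style checkpoint; the model name, prompt, and token limit are placeholders:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "meta-llama/Llama-2-13b-hf"  # placeholder checkpoint
prompt = "Example prompt"

# HF: do_sample=False makes generate() use greedy (argmax) decoding
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, do_sample=False, max_new_tokens=256)
print(tokenizer.decode(out[0], skip_special_tokens=True))

# vLLM: temperature=0 switches SamplingParams to greedy (argmax) sampling
llm = LLM(model=model_name)
greedy = SamplingParams(temperature=0, max_tokens=256)
result = llm.generate([prompt], greedy)[0]
print(result.outputs[0].text)
```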
-
I am using greedy search for decoding.
-
I've met the same problem.
-
Has this issue been resolved?
-
I have encountered the same issue for