-
In my case, ngram speculative decoding gives a different result than running without speculative decoding, and the result even differs between `ngram-prompt-lookup-min` values of 1 and 2. I just run the same model with different vLLM args. Shouldn't I get the same result for the same query when `temperature == 0`?
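Roughly, this is the comparison I mean (a minimal sketch assuming the offline `LLM` API from these versions; the model name, prompt, and the other speculative args like `num_speculative_tokens=5` and `ngram_prompt_lookup_max=4` are just illustrative):

```python
# Minimal sketch of the two runs being compared (offline API; the model name
# is illustrative, and in practice each config would run in its own process).
from vllm import LLM, SamplingParams

prompt = "Explain speculative decoding in one sentence."
greedy = SamplingParams(temperature=0, max_tokens=64)

# Run 1: plain greedy decoding, no speculative decoding.
baseline = LLM(model="facebook/opt-125m")
baseline_text = baseline.generate([prompt], greedy)[0].outputs[0].text

# Run 2: ngram speculative decoding; ngram_prompt_lookup_min is the arg I vary (1 vs 2).
spec = LLM(
    model="facebook/opt-125m",
    speculative_model="[ngram]",
    num_speculative_tokens=5,
    ngram_prompt_lookup_max=4,
    ngram_prompt_lookup_min=1,
    use_v2_block_manager=True,  # spec decode in these versions expects the v2 block manager
)
spec_text = spec.generate([prompt], greedy)[0].outputs[0].text

# With temperature == 0 I would expect these two outputs to match.
print(baseline_text == spec_text)
```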
-
I tried v0.4.2 through v0.6.2; all of them have the same problem.
-
I changed `self.disable_logprobs = True` to `self.disable_logprobs = False` in `TargetModelRunner` and got: `[[SamplerOutput(outputs=[CompletionSequenceGroupOutput(samples=[SequenceOutput(parent_seq_id=0, output_token=50006, logprobs={50006: Logprob(logprob=0.0, rank=1, decoded_token=None)})], prompt_logprobs=None)], sampled_token_probs=torch.Size([1, 115584]), sampled_token_ids=[[26888]], spec_decode_worker_metrics=None)]]`. Note that the logprobs entry reports `output_token=50006` while `sampled_token_ids` is `[[26888]]`. Maybe there is something wrong in `_sample_with_torch` in `sampler.py`.
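One way to dig into this (a sketch assuming the standard offline API; model name and prompt are illustrative) is to request per-token logprobs and check whether each emitted token is actually the rank-1 token at that step:

```python
# Sketch: compare each emitted token against the rank-1 token reported in the
# returned logprobs (model and prompt are illustrative placeholders).
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0, max_tokens=16, logprobs=5)

completion = llm.generate(["Hello, my name is"], params)[0].outputs[0]
for emitted_id, step in zip(completion.token_ids, completion.logprobs):
    # step maps token_id -> Logprob(logprob=..., rank=..., decoded_token=...)
    rank1_id = min(step, key=lambda tid: step[tid].rank)
    if emitted_id != rank1_id:
        print(f"mismatch: emitted {emitted_id}, but rank-1 token is {rank1_id}")
```

If I understand it right, `disable_logprobs` only controls whether logprobs are computed and returned while the speculative worker is active, which would explain why flipping it changes what shows up in the `SamplerOutput` above.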
-
But I still have another case that is not caused by the sop_token_id.
I'm sorry, this is an issue with our own version of vLLM, not the public one.