During assisted greedy generation with the verifier loaded in 8-bit, I noticed that the output diverges from vanilla greedy generation. Investigating this, I found what looks like lookahead during the forward pass on a particular sequence: changing a later token changes the logits of earlier positions. Here's a snippet of the notebook with the unnecessary code stripped out. You can run it and see the problem, but you need a Hugging Face token with access to the gated Llama 2 chat model, because that's the model I was using.
Thanks for the report. I was unable to reproduce this with Llama 3.1 8B in an environment with torch 2.5.1+cu124, transformers 4.47.0, bitsandbytes 0.45.0, on an RTX 4090.
I'm downloading Llama 2 now and will give that a try.
I think this behaviour with Llama 3.1 is expected; the bug is very model- and sequence-dependent. I saw different predictions across models too. So to reproduce it, the model must be "meta-llama/Llama-2-7b-chat-hf".
System Info
transformers 4.47.0, bitsandbytes 0.45.0, torch 2.5.1+cu124, NVIDIA RTX A4000
Reproduction
https://gist.github.com/Dionysour/24b352bb685f7d4a8ffd18896455700d
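For reference, here is a condensed sketch of the comparison the notebook performs (an illustration, not the exact gist code): load the verifier in 8-bit, run vanilla greedy decoding and assisted greedy decoding, and compare the outputs. The draft checkpoint and prompt below are placeholders; any smaller model sharing the Llama 2 tokenizer should work as the assistant.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Verifier loaded in 8-bit via bitsandbytes.
verifier = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

# Draft model for assisted generation (placeholder checkpoint; it shares
# the Llama 2 tokenizer, which assisted generation requires).
draft = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    torch_dtype=torch.float16,
    device_map="auto",
)

inputs = tokenizer("Explain why the sky is blue.", return_tensors="pt").to(verifier.device)

# Vanilla greedy decoding with the verifier alone.
vanilla = verifier.generate(**inputs, do_sample=False, max_new_tokens=64)

# Assisted greedy decoding: the draft proposes tokens, the verifier checks them.
# With greedy decoding, the two outputs should match token for token.
assisted = verifier.generate(
    **inputs, do_sample=False, max_new_tokens=64, assistant_model=draft
)

print(tokenizer.decode(vanilla[0], skip_special_tokens=True))
print(tokenizer.decode(assisted[0], skip_special_tokens=True))
```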
Expected behavior
Changing token [n] should not affect the logits of tokens [:n]: under a causal attention mask, the logits at earlier positions cannot depend on later tokens.
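A minimal sketch of this invariance check (again an illustration, not the notebook's exact code): perturb only the token at position n and measure how much the logits at earlier positions move.

```python
import torch

@torch.no_grad()
def prefix_logit_drift(model, input_ids, n):
    """Max absolute change in the logits at positions [:n] after editing token n."""
    base = model(input_ids).logits[0, :n]
    edited = input_ids.clone()
    # Perturb only token n; positions before n see identical inputs either way.
    edited[0, n] = (edited[0, n] + 1) % model.config.vocab_size
    return (model(edited).logits[0, :n] - base).abs().max().item()

# Expected: 0.0 (or at most negligible numerical noise), since positions before n
# never attend to token n. The 8-bit verifier instead shows a real, nonzero drift
# on the failing sequence.
```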