During assisted greedy generation with the verifier loaded in 8-bit, I noticed that the output diverges from vanilla greedy generation. Investigating this, I found what looks like lookahead during the forward pass on a particular sequence: changing a later token changes the logits of earlier positions. Here's a snippet of the notebook with the unnecessary code stripped out. You can run it and see the problem, but you need a Hugging Face token with access to the gated Llama 2 chat model, because that's the model I was using.
Thanks for the report. I was unable to reproduce this with Llama 3.1 8B in an environment with torch 2.5.1+cu124, transformers 4.47.0, bitsandbytes 0.45.0, on an RTX 4090.
I'm downloading Llama 2 now and will give that a try.
I think this behaviour with Llama 3.1 is expected; the bug is very model- and sequence-dependent. I saw different predictions across models too. So to reproduce it, the model must be "meta-llama/Llama-2-7b-chat-hf".
System Info
transformers 4.47.0, bitsandbytes 0.45.0, torch 2.5.1+cu124, NVIDIA RTX A4000
Reproduction
https://gist.github.com/Dionysour/24b352bb685f7d4a8ffd18896455700d
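For reference, here is a condensed sketch of the comparison the notebook performs (an illustration, not the exact gist code): load the verifier in 8-bit, run vanilla greedy decoding and assisted greedy decoding, and compare the outputs. The draft checkpoint and prompt below are placeholders; any smaller model sharing the Llama 2 tokenizer should work as the assistant.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Verifier loaded in 8-bit via bitsandbytes.
verifier = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

# Draft model for assisted generation (placeholder checkpoint; it shares
# the Llama 2 tokenizer, which assisted generation requires).
draft = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    torch_dtype=torch.float16,
    device_map="auto",
)

inputs = tokenizer("Explain why the sky is blue.", return_tensors="pt").to(verifier.device)

# Vanilla greedy decoding with the verifier alone.
vanilla = verifier.generate(**inputs, do_sample=False, max_new_tokens=64)

# Assisted greedy decoding: the draft proposes tokens, the verifier checks them.
# With greedy decoding, the two outputs should match token for token.
assisted = verifier.generate(
    **inputs, do_sample=False, max_new_tokens=64, assistant_model=draft
)

print(tokenizer.decode(vanilla[0], skip_special_tokens=True))
print(tokenizer.decode(assisted[0], skip_special_tokens=True))
```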
Expected behavior
Changing token [n] should not affect the logits of tokens [:n]: under a causal attention mask, the logits at earlier positions cannot depend on later tokens.
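A minimal sketch of this invariance check (again an illustration, not the notebook's exact code): perturb only the token at position n and measure how much the logits at earlier positions move.

```python
import torch

@torch.no_grad()
def prefix_logit_drift(model, input_ids, n):
    """Max absolute change in the logits at positions [:n] after editing token n."""
    base = model(input_ids).logits[0, :n]
    edited = input_ids.clone()
    # Perturb only token n; positions before n see identical inputs either way.
    edited[0, n] = (edited[0, n] + 1) % model.config.vocab_size
    return (model(edited).logits[0, :n] - base).abs().max().item()

# Expected: 0.0 (or at most negligible numerical noise), since positions before n
# never attend to token n. The 8-bit verifier instead shows a real, nonzero drift
# on the failing sequence.
```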