[RWKV5] Add support for RWKV5 model
#29095
base: main
Conversation
I think it would be a good idea to set it to infinite, because RWKV doesn't have a sequence length limit in theory. A sketch of what that could look like is below.
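For illustration, one way to signal "no hard limit" is a sentinel value in the config, since RWKV's recurrent state does not depend on a fixed context window. This is a minimal sketch, not the PR's actual code; `Rwkv5Config` and `context_length` are assumptions here, modeled on the existing `RwkvConfig`:

```python
from transformers import PretrainedConfig

class Rwkv5Config(PretrainedConfig):  # hypothetical, for illustration only
    model_type = "rwkv5"

    def __init__(self, context_length=None, **kwargs):
        # context_length=None means "unbounded": the recurrence carries a
        # fixed-size state, so generation length is limited only by memory,
        # not by a positional-embedding table.
        self.context_length = context_length
        super().__init__(**kwargs)
```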
Okay, my slow tests are all green for the
The fast CUDA path works thanks to @kashif, but the CPU path does not work yet.
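For context, a CPU fallback for the RWKV5 recurrence can be written as a plain PyTorch loop over time steps. This is a minimal sketch of the per-head state update (state S decays by w, accumulates the outer product k_t v_t^T, and the current token gets a one-off bonus u), not the kernel from this PR; the function name and tensor layout are assumptions:

```python
import torch

def rwkv5_linear_attention_cpu(receptance, key, value, time_decay, time_first, state):
    """Naive RWKV5 recurrence, one step at a time (slow, but CUDA-free).

    Assumed shapes: receptance/key/value are (batch, heads, seq, head_dim),
    time_decay/time_first are (heads, head_dim), and state is
    (batch, heads, head_dim, head_dim).
    """
    batch, heads, seq, head_dim = key.shape
    output = torch.zeros_like(value)
    # Broadcast per-head decay/bonus over the state's value dimension.
    w = time_decay.view(1, heads, head_dim, 1)   # decay applied to the state
    u = time_first.view(1, heads, head_dim, 1)   # bonus for the current token
    for t in range(seq):
        kt = key[:, :, t, :].unsqueeze(-1)           # (B, H, D, 1)
        vt = value[:, :, t, :].unsqueeze(-2)         # (B, H, 1, D)
        rt = receptance[:, :, t, :].unsqueeze(-2)    # (B, H, 1, D)
        at = kt @ vt                                  # outer product k_t v_t^T
        # Output reads the decayed state plus the bonus-weighted current token.
        output[:, :, t, :] = (rt @ (state + u * at)).squeeze(-2)
        # Decay the state and fold in the current token.
        state = w * state + at
    return output, state
```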
One month ago, there was no problem.
Important: maybe less problematic for v5 (or maybe not!), but I found that for v6 the following line is absolutely terrible for inference accuracy:
versus the original @BBuf version (which I tweaked and adapted for v6):
The issue is that the potential down-cast to bf16 prior to the groupnorm causes really bad inference quality. If you look closely, this is written differently than in the original @BBuf version, where the down-cast occurs after the groupnorm during non-CUDA inference. Please see these lines of Bo Peng's original ChatRWKV code for reference on this groupnorm needing float32:
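For illustration, the fix amounts to keeping the pre-groupnorm activations in float32 and only down-casting afterwards. A minimal sketch, assuming a flattened hidden state of shape (batch * seq, heads * head_dim) and an nn.GroupNorm over the heads; this is not the exact PR code:

```python
import torch
import torch.nn as nn

def apply_groupnorm_fp32(hidden, group_norm: nn.GroupNorm, out_dtype=torch.bfloat16):
    # GroupNorm's mean/variance estimate is numerically fragile in bf16, so
    # compute the normalization entirely in float32 (input and affine params),
    # then cast back down only once normalization is done.
    normed = nn.functional.group_norm(
        hidden.float(),
        group_norm.num_groups,
        group_norm.weight.float(),
        group_norm.bias.float(),
        group_norm.eps,
    )
    return normed.to(out_dtype)

# The problematic pattern described above is the reverse order: casting to
# bf16 *before* the groupnorm, e.g. group_norm(hidden.to(torch.bfloat16)),
# which degrades inference quality.
```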
Will update the group norm!
Bug fixed by BBuf/RWKV-World-HF-Tokenizer@6dd44c8; it has no relation to this PR. I will update the HF repo RWKV/rwkv-6-world-1b6 later.
It has been solved in
What does this PR do?
Adds RWKV5, supersedes #26963
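Once merged, usage would presumably follow the standard transformers API. A sketch only: the checkpoint name below is hypothetical and not confirmed by this PR.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoint name; substitute a real RWKV5 world-model repo.
model_id = "RWKV/rwkv-5-world-1b5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The RWKV architecture is", return_tensors="pt")
# RWKV generates with a fixed-size recurrent state, so memory use during
# generation does not grow with context length.
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```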