RoPE Frequency Base and Frequency Scale Support #262
It's exactly the same as alpha. BTW, the "base" for CodeLlama corresponds to roughly alpha 100.
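For reference, NTK-aware alpha scaling maps onto an equivalent frequency base via the usual formula base' = base * alpha^(d / (d - 2)), where d is the head dimension. A quick sketch of that relationship (the formula is the standard NTK-aware one; `head_dim = 128` for Llama-family models is an assumption):

```python
def alpha_to_base(alpha: float, base: float = 10000.0, head_dim: int = 128) -> float:
    # NTK-aware alpha rewrites the RoPE base as base * alpha^(d / (d - 2)).
    return base * alpha ** (head_dim / (head_dim - 2))

# alpha = 100 gives ~1.076e6, close to CodeLlama's rope theta of 1e6,
# which is the "base for CodeLlama is about alpha 100" observation above.
print(alpha_to_base(100.0))
```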
As per the discussion in issue #270, this issue is being reopened. The following is a fairly informal proposal for @turboderp to review:

First, instead of replacing the current rotary embedding calculation, we keep the option to use either.

Second, let's just pull the RoPE calculations from the llama.cpp repo. This will be easier and faster, and given the nature of how rotary embeddings function, it should not be a problem. A sketch of that calculation follows this comment.

Third, while not necessary, an additional testing script for PPL (and maybe a review of sample outputs) would be nice, just to see what the optimal values are.

I'd be happy to formalize this into a spec now. In terms of implementation, I will take a deep dive in a couple of weeks, assuming no one else is working on it.
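For concreteness, here is roughly what the llama.cpp-style calculation looks like, with `freq_base` and `freq_scale` entering the angle computation. This is a Python sketch of the idea, not the actual kernel code; names and shapes are illustrative:

```python
import torch

def rope_angles(num_pos: int, head_dim: int,
                freq_base: float = 10000.0, freq_scale: float = 1.0):
    # Per-dimension inverse frequencies: freq_base^(-2i/d) for i = 0 .. d/2 - 1.
    inv_freq = freq_base ** (-torch.arange(0, head_dim, 2).float() / head_dim)
    # llama.cpp applies freq_scale to the position before multiplying:
    #   theta_i(p) = freq_scale * p * freq_base^(-2i/d)
    pos = torch.arange(num_pos).float() * freq_scale
    theta = torch.outer(pos, inv_freq)           # (num_pos, head_dim // 2)
    return torch.sin(theta), torch.cos(theta)    # tables the kernel would consume
```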
As of now, there is no way to modify RoPE Frequency Base and RoPE Frequency Scale.
We would need to edit `rope.cu` to support parameters for frequency base and scale: `exllama/exllama_ext/cuda_func/rope.cu`, lines 21 to 31 at commit `21f4a12`.
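For context, the kernel at those lines applies the standard RoPE pairwise rotation using precomputed sin/cos tables. A Python analogue of that operation (a sketch only; the actual pairing and memory layout in `rope.cu` may differ):

```python
import torch

def apply_rope(x: torch.Tensor, sin: torch.Tensor, cos: torch.Tensor) -> torch.Tensor:
    # x: (..., seq_len, head_dim); sin/cos: (seq_len, head_dim // 2)
    x1, x2 = x[..., 0::2], x[..., 1::2]   # interleaved even/odd pairs
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin  # standard 2D rotation per pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```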
We would also need to add arguments in `model_init.py` to support frequency and scale for RoPE: `exllama/model_init.py`, lines 29 to 30 at commit `21f4a12`.
Here is a proposed argument to be added to the existing `model_init.py`. Note that this is important to resolve issues like #261 and #260, where the context length during inference exceeds the model's trained length.
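A minimal sketch of what such arguments could look like, assuming `model_init.py` builds on argparse; the flag names, defaults, and help strings below are hypothetical, not taken from the repo:

```python
import argparse

parser = argparse.ArgumentParser()

# Hypothetical flags; names and defaults are illustrative only.
parser.add_argument("--rope-freq-base", type=float, default=10000.0,
                    help="RoPE frequency base (theta); e.g. CodeLlama uses 1e6")
parser.add_argument("--rope-freq-scale", type=float, default=1.0,
                    help="RoPE frequency scale; values < 1.0 stretch context linearly")
```

These values would then be threaded through to the sin/cos precomputation sketched earlier and on into the `rope.cu` kernel.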