Activation based merging - copied over from wip-zipit branch #365
Looks good to me! One suggestion for a followup, but this should be good to merge.
# average weights and save them
if merge_matrix:
    w = w + w2
A decent next step for this might be to separate this out: if the script just output two modified models, we could feed those directly into mergekit-yaml and try out merge methods other than linear without needing to bring that infrastructure into the script.
Understood. I'll be sure to add this in a follow-up PR.
What is this?
This PR introduces a way to merge two models via their activations and hidden states on a tiny sample of data.
The method uses these activations and hidden states to form correlation matrices, derives permutation and inverse permutation matrices for the weights of each model from those correlations, and then combines the permuted weights.
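As a rough illustration of the idea described above (not the code in this PR), here is a minimal pure-Python sketch on a toy two-layer MLP. It assumes a greedy one-to-one matching on Pearson correlations and a plain average as the combine step; the helper names (`match_features`, `permute_mlp`, `average`) are hypothetical and the scripts in this PR may do each step differently.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation between two equal-length activation lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def match_features(acts_a, acts_b):
    """Greedy matching: perm[i] is the feature of B assigned to feature i of A."""
    n = len(acts_a)
    pairs = sorted(
        ((pearson(acts_a[i], acts_b[j]), i, j) for i in range(n) for j in range(n)),
        reverse=True,
    )
    perm, used = [None] * n, set()
    for _, i, j in pairs:
        if perm[i] is None and j not in used:
            perm[i] = j
            used.add(j)
    return perm

def hidden(w1, x):
    """Hidden activations tanh(W1 @ x)."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in w1]

def forward(w1, w2, x):
    """Output of the toy MLP: W2 @ tanh(W1 @ x)."""
    h = hidden(w1, x)
    return [sum(w * v for w, v in zip(row, h)) for row in w2]

def permute_mlp(w1, w2, perm):
    """Reorder hidden units: rows of W1 (P @ W1) and columns of W2 (W2 @ P^T)."""
    w1_p = [w1[j] for j in perm]
    w2_p = [[row[j] for j in perm] for row in w2]
    return w1_p, w2_p

def average(a, b):
    """Element-wise mean of two weight matrices."""
    return [[(x + y) / 2 for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

rng = random.Random(0)
d_in, d_hidden, d_out = 3, 4, 2
w1_a = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_hidden)]
w2_a = [[rng.gauss(0, 1) for _ in range(d_hidden)] for _ in range(d_out)]

# Model B is model A with its hidden units shuffled (same function, different order).
shuffle = [2, 0, 3, 1]
w1_b, w2_b = permute_mlp(w1_a, w2_a, shuffle)

# Collect hidden activations on a tiny data sample, one list per hidden feature.
samples = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(32)]
acts_a = [list(col) for col in zip(*[hidden(w1_a, x) for x in samples])]
acts_b = [list(col) for col in zip(*[hidden(w1_b, x) for x in samples])]

# Align B to A, then average the aligned weights.
perm = match_features(acts_a, acts_b)
w1_b_aligned, w2_b_aligned = permute_mlp(w1_b, w2_b, perm)
w1_merged = average(w1_a, w1_b_aligned)
w2_merged = average(w2_a, w2_b_aligned)
```

Permuting the rows of the first layer and applying the inverse permutation to the columns of the second layer leaves the network's function unchanged, which is why the two models can be aligned before their weights are combined.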
This PR consists of three main scripts
Assumptions
The models to be merged are of the same architecture and equal block/layer count
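A quick pre-flight check for this assumption could look like the sketch below. It is illustrative only and not part of this PR: it compares two plain config dictionaries, and the keys it inspects (`model_type`, `num_hidden_layers`, `hidden_size`) are assumed to follow the usual Hugging Face `config.json` naming.

```python
def check_mergeable(config_a: dict, config_b: dict) -> list:
    """Return a list of mismatches; an empty list means the check passed."""
    # Assumed keys: architecture family, block/layer count, hidden width.
    keys = ("model_type", "num_hidden_layers", "hidden_size")
    problems = []
    for key in keys:
        if config_a.get(key) != config_b.get(key):
            problems.append(f"{key}: {config_a.get(key)!r} != {config_b.get(key)!r}")
    return problems

# Hypothetical configs for illustration.
cfg_a = {"model_type": "llama", "num_hidden_layers": 32, "hidden_size": 4096}
cfg_b = {"model_type": "llama", "num_hidden_layers": 40, "hidden_size": 4096}
mismatches = check_mergeable(cfg_a, cfg_b)
```

Here `mismatches` would flag the differing layer count, so the merge could be refused early instead of failing partway through.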
Testing
To test this, we need to get the mergekit/scripts/random_permuter.py script from the branch rope-alignment (see below the bash stuff for the final inference script, i.e. test_by_gen.py).
If all goes well, you should see something along the lines of the following:
Things that couldn't make it into the final PR
On-the-fly handling of models with grouped query attention. This hasn't been tested enough for this release, but will be in the near future. For now, users will have to resort to using this script first:
Note:
Because this was copied over from another branch (wip-zipit), @shamanez's contributions to the PR are missing, so this is an explicit acknowledgement that @shamanez has worked on this PR alongside the other authors.