feat: Try experimental commit making tensorizer less intrusive #3
I feel my current feature branch was proposing fairly "ambitious" changes for exllamav2, or at least ones that would ask the maintainer(s) to add fairly significant properties to their inference engine (like a `state_dict` as a public attribute for their config class).
In this commit I tried to see if I could integrate `tensorizer` in a way that minimizes direct modifications to the source code of exllamav2's model loading machinery, by applying the `tensorizer_context` decorator to the functions where `tensorizer` hooks are needed.
This is unfinished and shoddy. I honestly do not need to go this overboard to make my PR "less intrusive", but felt it might be interesting to consider. Some of the original concerns aren't necessarily addressed any better either (for instance, I still create a private `_state_dict` attribute on their config class, but only if the decorator is called).
I figured I'd get initial feedback while this change is still unfinished, before going further in this direction, and scrap it otherwise.
This change should (hopefully) make for a smaller line count in their core logic. The diff here is against my feature branch, but this is clearer when comparing against this fork's `master` branch. It also likely makes the integration less readable, as it plays with things like attributes on function objects. It also tries to modularize certain parts of exl2's loading logic into functions that hooks can be applied to, which goes against the ideal of being unintrusive.
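
For anyone skimming, here is a minimal sketch of the general idea, not the code in this commit. It assumes a decorator that keeps its hooks as attributes on the function object and creates `_state_dict` on the config only when a decorated function is called. The `Config` class, `load_tensors`, `fill_state_dict`, and the `pre_hooks` attribute are all hypothetical stand-ins, and the real `tensorizer_context` may look quite different.

```python
import functools


def tensorizer_context(func):
    """Hypothetical decorator: runs optional hooks before a loader function."""

    @functools.wraps(func)
    def wrapper(config, *args, **kwargs):
        # Create the private attribute lazily, only when a decorated
        # function is actually called.
        if not hasattr(config, "_state_dict"):
            config._state_dict = {}
        # Run whatever hooks have been registered on the function object.
        for hook in wrapper.pre_hooks:
            hook(config)
        return func(config, *args, **kwargs)

    # Hooks live as an attribute on the wrapper function itself.
    wrapper.pre_hooks = []
    return wrapper


class Config:
    """Stand-in for the engine's config class (not exllamav2's real one)."""

    def __init__(self, model_dir):
        self.model_dir = model_dir


@tensorizer_context
def load_tensors(config):
    # Placeholder for real loading logic: report which tensors are available.
    return sorted(config._state_dict)


def fill_state_dict(config):
    # Hypothetical hook: pretend a deserializer populated one tensor.
    config._state_dict["layer.0.weight"] = [0.0, 1.0]


load_tensors.pre_hooks.append(fill_state_dict)
```

The loader body stays untouched and the integration is attached from outside, which is the appeal. The cost is the one noted above: state hidden on function objects is harder to follow than an explicit argument.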