Better tied weight handling #464

Merged
merged 4 commits into main from tied-weight-handling
Nov 30, 2024
Conversation

@cg123 (Collaborator) commented on Nov 30, 2024

Handle cases where some input models have a tied tensor and some don't.

For example, there are some fine-tunes of Llama 3.2 3B floating around that are ~3.6B parameters because they have a separate LM head; with these changes, they can be merged with standard-sized models. The output model will have an LM head if any of the inputs have one. Otherwise, behavior is unchanged.
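To illustrate the idea (this is a minimal sketch, not mergekit's actual implementation): a model whose LM head is tied to its embedding can contribute the embedding weight in place of a missing `lm_head` tensor, and the merged output only carries an `lm_head` when at least one input has an untied one. The tensor names below (`lm_head.weight`, `model.embed_tokens.weight`) are the standard Llama naming and the helper functions are hypothetical.

```python
from typing import Dict, List, Optional

import torch


def get_tensor_with_tie_fallback(
    weights: Dict[str, torch.Tensor],
    name: str,
    tied_fallback: Optional[str] = None,
) -> Optional[torch.Tensor]:
    """Return a tensor by name, falling back to its tied counterpart if absent.

    Hypothetical helper: a model without a separate LM head contributes its
    tied embedding weight instead.
    """
    if name in weights:
        return weights[name]
    if tied_fallback is not None and tied_fallback in weights:
        return weights[tied_fallback]
    return None


def merge_lm_heads(models: List[Dict[str, torch.Tensor]]) -> Optional[torch.Tensor]:
    """Average the LM head across input models (simple linear merge for illustration).

    Mirrors the behavior described in the PR: the output only gets an LM head
    if at least one input actually has a separate (untied) one.
    """
    if not any("lm_head.weight" in m for m in models):
        return None  # all inputs are tied; keep the output tied as before
    tensors = [
        get_tensor_with_tie_fallback(
            m, "lm_head.weight", tied_fallback="model.embed_tokens.weight"
        )
        for m in models
    ]
    tensors = [t for t in tensors if t is not None]
    return torch.stack(tensors).mean(dim=0)
```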

cg123 merged commit 68c4b65 into main on Nov 30, 2024
6 checks passed
cg123 deleted the tied-weight-handling branch on November 30, 2024 at 21:55