Some layers are not loaded ('encoder.embed_tokens.weight' and 'shared.weight') when loading converted safetensors into transformers.MT5EncoderModel.
#520
Open
WongGawa opened this issue on Aug 23, 2024 · 1 comment
I converted pytorch_model.bin to model.safetensors locally for the mt5-xxl model.
I then compared the last_hidden_state of transformers.MT5EncoderModel when loading mt5-xxl from pytorch_model.bin versus from the converted model.safetensors.
The outputs differ because encoder.embed_tokens.weight and shared.weight are not loaded into MT5EncoderModel from the converted safetensors file, so those layers are newly initialized.
Information
The official example scripts
My own modified scripts
Reproduction
Convert pytorch_model.bin to model.safetensors with convert.py.
Load pytorch_model.bin and the converted model.safetensors into transformers.MT5EncoderModel.
Compute the difference between the last_hidden_state produced from pytorch_model.bin and from model.safetensors.
Expected behavior
The same output, or at least output within a certain margin of error.
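The expected behavior can be checked numerically. A minimal sketch using only torch; compare_hidden_states and its atol/rtol defaults are hypothetical choices, not values from this issue, and the random tensors stand in for the two last_hidden_state outputs.

```python
import torch


def compare_hidden_states(a: torch.Tensor, b: torch.Tensor, atol: float = 1e-5, rtol: float = 1e-4) -> dict:
    """Summarize how far apart two last_hidden_state tensors are.

    The atol/rtol defaults are illustrative; half-precision weights usually
    need looser tolerances.
    """
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {tuple(a.shape)} vs {tuple(b.shape)}")
    a, b = a.float(), b.float()
    diff = (a - b).abs()
    return {
        "max_abs_diff": diff.max().item(),
        "mean_abs_diff": diff.mean().item(),
        "allclose": torch.allclose(a, b, atol=atol, rtol=rtol),
    }


# Usage with random stand-ins shaped like (batch, seq_len, d_model).
torch.manual_seed(0)
h_bin = torch.randn(2, 5, 8)
same = compare_hidden_states(h_bin, h_bin.clone())
different = compare_hidden_states(h_bin, h_bin + 1e-2)
print(same["allclose"], different["allclose"])  # True False
```

Reporting the maximum and mean absolute difference alongside the boolean makes it easier to tell a small numerical drift from the large difference that randomly initialized embeddings would cause.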
There is most likely a warning printed somewhere while those scripts run.
We need to reproduce the issue to be able to fix it. That means sharing a pytorch_model.bin file (ideally a smaller one) somewhere.
The issue could also lie in your loading code. You should use save_model/load_model if you want to use safetensors.
If you are using torch.save/torch.load then safetensors will raise exceptions correctly.
If you are using save_model/torch.load then issues can arise.
(The same goes for save_pretrained/from_pretrained.)
Could you provide a reproducible example so we can help?