
problem with xlm_roberta #9

Open
puppetm4st3r opened this issue Dec 17, 2023 · 2 comments

@puppetm4st3r

Hi, I'm trying to convert this model:

from lsg_converter import LSGConverter
converter = LSGConverter(max_sequence_length=4096)
model, tokenizer = converter.convert_from_pretrained(model_name_or_path="T-Systems-onsite/cross-en-es-roberta-sentence-transformer")
print(type(model))

and it seems to convert OK:

The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
Some weights of LSGXLMRobertaModel were not initialized from the model checkpoint at T-Systems-onsite/cross-en-es-roberta-sentence-transformer and are newly initialized: ['embeddings.global_embeddings.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
<class 'lsg_converter.xlm_roberta.modeling_lsg_xlm_roberta.LSGXLMRobertaModel'>

but when I use the model with a long text (it is an embedding model), I get:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/home/dario/src/lsg_embeddings.ipynb Cell 3 line 3
     29 # Compute token embeddings
     30 with torch.no_grad():
---> 31     model_output = model(**encoded_input)
     33 # Perform pooling. In this case, max pooling.
     34 sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
   1516     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1517 else:
-> 1518     return self._call_impl(*args, **kwargs)

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
   1522 # If we don't have any hooks, we want to skip the rest of the logic in
   1523 # this function, and just call forward.
   1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1525         or _global_backward_pre_hooks or _global_backward_hooks
   1526         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527     return forward_call(*args, **kwargs)
   1529 try:
   1530     result = None

File ~/.local/lib/python3.10/site-packages/transformers/models/roberta/modeling_roberta.py:801, in RobertaModel.forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
    799 if hasattr(self.embeddings, "token_type_ids"):
    800     buffered_token_type_ids = self.embeddings.token_type_ids[:, :seq_length]
--> 801     buffered_token_type_ids_expanded = buffered_token_type_ids.expand(batch_size, seq_length)
    802     token_type_ids = buffered_token_type_ids_expanded
    803 else:

RuntimeError: The expanded size of the tensor (1193) must match the existing size (514) at non-singleton dimension 1. Target sizes: [2, 1193]. Tensor sizes: [1, 514]

With BERT models it works like a charm.
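
For context, the notebook cell in the traceback follows the usual sentence-transformers mean-pooling recipe; here is a minimal reconstruction (the mean_pooling helper, tokenizer arguments, and example texts are my assumptions, not the exact notebook code):

import torch

# Mean pooling over token embeddings, weighted by the attention mask
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # all token embeddings
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * mask, 1) / torch.clamp(mask.sum(1), min=1e-9)

sentences = ["<long text of roughly 1200 tokens>", "<second long text>"]

# Tokenize without cutting at 514 tokens, since the converted model should handle 4096
encoded_input = tokenizer(sentences, padding=True, truncation=True, max_length=4096, return_tensors="pt")

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)  # this is where the RuntimeError above is raised

# Perform pooling
sentence_embeddings = mean_pooling(model_output, encoded_input["attention_mask"])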

@ccdv-ai
Owner

ccdv-ai commented Dec 17, 2023

Hi @puppetm4st3r
This should be fixed in the latest release:
pip install lsg-converter --upgrade
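
After upgrading, a quick sanity check along these lines (a sketch on my side, reusing the model from the original report) should confirm that long inputs no longer hit the token_type_ids size mismatch:

import torch
from lsg_converter import LSGConverter

converter = LSGConverter(max_sequence_length=4096)
model, tokenizer = converter.convert_from_pretrained(
    model_name_or_path="T-Systems-onsite/cross-en-es-roberta-sentence-transformer")

# Dummy input well past the original 514-token limit
long_text = "hola mundo " * 800
encoded = tokenizer(long_text, truncation=True, max_length=4096, return_tensors="pt")

with torch.no_grad():
    output = model(**encoded)

print(output[0].shape)  # should print the full sequence length, with no expand() error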

@puppetm4st3r
Author

Thanks! I will try it!
