
Llama Model Cannot be Exported with onnxruntime #1463

Closed · xijianlou1 opened this issue Oct 17, 2023 · 2 comments
Labels: bug (Something isn't working)

@xijianlou1 commented Oct 17, 2023:
### System Info

Hi team :) I'm trying to export [TinyLlama-1.1B-intermediate-step-480k-1T](https://huggingface.co/PY007/TinyLlama-1.1B-intermediate-step-480k-1T) to ONNX (both with `optimum.onnxruntime` and `optimum-cli`), but it fails with dimension-mismatch errors, even though Llama is now supported by the ONNX export. Could you give some insight into why this model cannot be exported? Here's the script and the corresponding error:


```python
import os
from pathlib import Path
import transformers
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForCausalLM

model = ORTModelForCausalLM.from_pretrained(
    "PY007/TinyLlama-1.1B-intermediate-step-480k-1T", from_transformers=True
)
```

```
The argument `from_transformers` is deprecated, and will be removed in optimum 2.0.  Use `export` instead
Framework not specified. Using pt to export to ONNX.
Using the export variant default. Available variants are:
        - default: The default ONNX variant.
Using pad_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using framework PyTorch: 2.1.0+cu118
Overriding 1 configuration item(s)
        - use_cache -> True
C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\transformers\models\llama\modeling_llama.py:808: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if input_shape[-1] > 1:
C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\transformers\models\llama\modeling_llama.py:146: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if seq_len > self.max_seq_len_cached:
C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\transformers\models\llama\modeling_llama.py:375: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz, self.num_heads, q_len, kv_seq_len):
C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\transformers\models\llama\modeling_llama.py:382: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attention_mask.size() != (bsz, 1, q_len, kv_seq_len):
C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\transformers\models\llama\modeling_llama.py:392: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz, self.num_heads, q_len, self.head_dim):
Saving external data to one file...
Using framework PyTorch: 2.1.0+cu118
Overriding 1 configuration item(s)
        - use_cache -> True
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python311\Lib\site-packages\optimum\onnxruntime\modeling_ort.py", line 647, in from_pretrained
    return super().from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\optimum\modeling_base.py", line 372, in from_pretrained
    return from_pretrained_method(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\optimum\onnxruntime\modeling_decoder.py", line 574, in _from_transformers
    main_export(
  File "C:\Python311\Lib\site-packages\optimum\exporters\onnx\__main__.py", line 505, in main_export
    _, onnx_outputs = export_models(
                      ^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\optimum\exporters\onnx\convert.py", line 752, in export_models
    export(
  File "C:\Python311\Lib\site-packages\optimum\exporters\onnx\convert.py", line 855, in export
    export_output = export_pytorch(
                    ^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\optimum\exporters\onnx\convert.py", line 572, in export_pytorch
    onnx_export(
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\onnx\utils.py", line 516, in export
    _export(
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\onnx\utils.py", line 1596, in _export
    graph, params_dict, torch_out = _model_to_graph(
                                    ^^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\onnx\utils.py", line 1135, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\onnx\utils.py", line 1011, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\onnx\utils.py", line 915, in _trace_and_get_graph_from_model
    trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\jit\_trace.py", line 1285, in _get_trace_graph
    outs = ONNXTracedModule(
           ^^^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\jit\_trace.py", line 133, in forward
    graph, out = torch._C._create_graph_by_tracing(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\jit\_trace.py", line 124, in wrapper
    outs.append(self.inner(*trace_inputs))
                ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1508, in _slow_forward
    result = self.forward(*input, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\optimum\exporters\onnx\model_patcher.py", line 112, in patched_forward
    outputs = self.orig_forward(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\transformers\models\llama\modeling_llama.py", line 1038, in forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1508, in _slow_forward
    result = self.forward(*input, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\transformers\models\llama\modeling_llama.py", line 925, in forward
    layer_outputs = decoder_layer(
                    ^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1508, in _slow_forward
    result = self.forward(*input, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\transformers\models\llama\modeling_llama.py", line 635, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
                                                          ^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1508, in _slow_forward
    result = self.forward(*input, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\transformers\models\llama\modeling_llama.py", line 365, in forward
    key_states = torch.cat([past_key_value[0], key_states], dim=2)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 32 but got size 4 for tensor number 1 in the list.
```
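As the first warning in the log notes, `from_transformers` is deprecated in favor of `export`. A minimal sketch of the equivalent non-deprecated call (it goes through the same ONNX export path, so on the affected optimum version it should hit the same error):

```python
from optimum.onnxruntime import ORTModelForCausalLM

# Non-deprecated equivalent of the repro above; `export=True` triggers the
# same ONNX export path as the deprecated `from_transformers=True`.
model = ORTModelForCausalLM.from_pretrained(
    "PY007/TinyLlama-1.1B-intermediate-step-480k-1T", export=True
)
```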


### Who can help?

For better visibility @JingyaHuang @echarlaix 

### Information

- [X] The official example scripts
- [ ] My own modified scripts

### Tasks

- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)

### Reproduction (minimal, reproducible, runnable)

```python
import os
from pathlib import Path
import transformers
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForCausalLM

model = ORTModelForCausalLM.from_pretrained(
    "PY007/TinyLlama-1.1B-intermediate-step-480k-1T", from_transformers=True
)
```
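For reference, a sketch of the `optimum-cli` invocation mentioned above, which reproduces the same failure (the output directory name `tinyllama_onnx/` is just an example):

```
optimum-cli export onnx --model PY007/TinyLlama-1.1B-intermediate-step-480k-1T tinyllama_onnx/
```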

### Expected behavior

The model should export to ONNX successfully. Instead, the export fails with:

RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 32 but got size 4 for tensor number 1 in the list.
@claeyzre (Contributor) commented:

Duplicate of #1399
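For context, the 32-vs-4 size mismatch is consistent with this checkpoint using grouped-query attention, which the Llama export did not yet handle at the time (see #1399). A quick check, as a sketch assuming the standard Llama config fields are present on this checkpoint:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("PY007/TinyLlama-1.1B-intermediate-step-480k-1T")
# With grouped-query attention, num_key_value_heads < num_attention_heads.
# Dummy past key/values built with num_attention_heads (32) heads then fail
# to concatenate with the model's 4 key/value heads, matching the error above.
print(config.num_attention_heads, config.num_key_value_heads)
```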

@fxmarty (Contributor) commented Oct 18, 2023:

Thank you, and sorry for the inconvenience. I recommend installing from source while awaiting a release (which should probably come next week).
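One common way to install optimum from source, assuming the fix has landed on the main branch of huggingface/optimum:

```
pip install git+https://github.com/huggingface/optimum.git
```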

fxmarty closed this as completed Oct 18, 2023