
Llama Model Cannot be Exported with onnxruntime #1463

Closed · xijianlou1 opened this issue Oct 17, 2023 · 2 comments
Labels: bug (Something isn't working)

@xijianlou1 commented Oct 17, 2023:
### System Info

Hi team :) I'm trying to export [TinyLlama-1.1B-intermediate-step-480k-1T](https://huggingface.co/PY007/TinyLlama-1.1B-intermediate-step-480k-1T) to ONNX (both with `optimum.onnxruntime` and `optimum-cli`), but it fails with dimension-mismatch errors, even though Llama is now supported by the ONNX export. Could you give some insight into why this model cannot be exported? Here's the script and the corresponding error:


```python
import os
from pathlib import Path
import transformers
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForCausalLM

model = ORTModelForCausalLM.from_pretrained(
    "PY007/TinyLlama-1.1B-intermediate-step-480k-1T", from_transformers=True
)
```

```
The argument `from_transformers` is deprecated, and will be removed in optimum 2.0.  Use `export` instead
Framework not specified. Using pt to export to ONNX.
Using the export variant default. Available variants are:
        - default: The default ONNX variant.
Using pad_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using framework PyTorch: 2.1.0+cu118
Overriding 1 configuration item(s)
        - use_cache -> True
C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\transformers\models\llama\modeling_llama.py:808: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if input_shape[-1] > 1:
C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\transformers\models\llama\modeling_llama.py:146: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if seq_len > self.max_seq_len_cached:
C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\transformers\models\llama\modeling_llama.py:375: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz, self.num_heads, q_len, kv_seq_len):
C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\transformers\models\llama\modeling_llama.py:382: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attention_mask.size() != (bsz, 1, q_len, kv_seq_len):
C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\transformers\models\llama\modeling_llama.py:392: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz, self.num_heads, q_len, self.head_dim):
Saving external data to one file...
Using framework PyTorch: 2.1.0+cu118
Overriding 1 configuration item(s)
        - use_cache -> True
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python311\Lib\site-packages\optimum\onnxruntime\modeling_ort.py", line 647, in from_pretrained
    return super().from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\optimum\modeling_base.py", line 372, in from_pretrained
    return from_pretrained_method(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\optimum\onnxruntime\modeling_decoder.py", line 574, in _from_transformers
    main_export(
  File "C:\Python311\Lib\site-packages\optimum\exporters\onnx\__main__.py", line 505, in main_export
    _, onnx_outputs = export_models(
                      ^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\optimum\exporters\onnx\convert.py", line 752, in export_models
    export(
  File "C:\Python311\Lib\site-packages\optimum\exporters\onnx\convert.py", line 855, in export
    export_output = export_pytorch(
                    ^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\optimum\exporters\onnx\convert.py", line 572, in export_pytorch
    onnx_export(
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\onnx\utils.py", line 516, in export
    _export(
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\onnx\utils.py", line 1596, in _export
    graph, params_dict, torch_out = _model_to_graph(
                                    ^^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\onnx\utils.py", line 1135, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\onnx\utils.py", line 1011, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\onnx\utils.py", line 915, in _trace_and_get_graph_from_model
    trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\jit\_trace.py", line 1285, in _get_trace_graph
    outs = ONNXTracedModule(
           ^^^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\jit\_trace.py", line 133, in forward
    graph, out = torch._C._create_graph_by_tracing(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\jit\_trace.py", line 124, in wrapper
    outs.append(self.inner(*trace_inputs))
                ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1508, in _slow_forward
    result = self.forward(*input, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\optimum\exporters\onnx\model_patcher.py", line 112, in patched_forward
    outputs = self.orig_forward(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\transformers\models\llama\modeling_llama.py", line 1038, in forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1508, in _slow_forward
    result = self.forward(*input, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\transformers\models\llama\modeling_llama.py", line 925, in forward
    layer_outputs = decoder_layer(
                    ^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1508, in _slow_forward
    result = self.forward(*input, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\transformers\models\llama\modeling_llama.py", line 635, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
                                                          ^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1508, in _slow_forward
    result = self.forward(*input, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xijianlou\AppData\Roaming\Python\Python311\site-packages\transformers\models\llama\modeling_llama.py", line 365, in forward
    key_states = torch.cat([past_key_value[0], key_states], dim=2)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 32 but got size 4 for tensor number 1 in the list.
```
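As the first warning in the log notes, `from_transformers` is deprecated in favor of `export`. A minimal sketch of the equivalent non-deprecated call (it goes through the same ONNX export path, so on the affected optimum version it should hit the same error):

```python
from optimum.onnxruntime import ORTModelForCausalLM

# Non-deprecated equivalent of the repro above; `export=True` triggers the
# same ONNX export path as the deprecated `from_transformers=True`.
model = ORTModelForCausalLM.from_pretrained(
    "PY007/TinyLlama-1.1B-intermediate-step-480k-1T", export=True
)
```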


### Who can help?

For better visibility @JingyaHuang @echarlaix 

### Information

- [X] The official example scripts
- [ ] My own modified scripts

### Tasks

- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)

### Reproduction (minimal, reproducible, runnable)

```python
import os
from pathlib import Path
import transformers
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForCausalLM

model = ORTModelForCausalLM.from_pretrained(
    "PY007/TinyLlama-1.1B-intermediate-step-480k-1T", from_transformers=True
)
```
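For reference, a sketch of the `optimum-cli` invocation mentioned above, which reproduces the same failure (the output directory name `tinyllama_onnx/` is just an example):

```
optimum-cli export onnx --model PY007/TinyLlama-1.1B-intermediate-step-480k-1T tinyllama_onnx/
```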

### Expected behavior

The model should export to ONNX successfully. Instead, the export fails with:

RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 32 but got size 4 for tensor number 1 in the list.
@claeyzre (Contributor) commented:

Duplicate of #1399
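For context, the 32-vs-4 size mismatch is consistent with this checkpoint using grouped-query attention, which the Llama export did not yet handle at the time (see #1399). A quick check, as a sketch assuming the standard Llama config fields are present on this checkpoint:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("PY007/TinyLlama-1.1B-intermediate-step-480k-1T")
# With grouped-query attention, num_key_value_heads < num_attention_heads.
# Dummy past key/values built with num_attention_heads (32) heads then fail
# to concatenate with the model's 4 key/value heads, matching the error above.
print(config.num_attention_heads, config.num_key_value_heads)
```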

@fxmarty (Contributor) commented Oct 18, 2023:

Thank you, and sorry for the inconvenience. I recommend installing from source while awaiting a release (which should probably come next week).
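One common way to install optimum from source, assuming the fix has landed on the main branch of huggingface/optimum:

```
pip install git+https://github.com/huggingface/optimum.git
```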

fxmarty closed this as completed Oct 18, 2023