Hello! I am trying to use beam search while running inference on my 4-bit GPTQ-quantized Llama model, whose base model is daekeun-ml/Llama-2-ko-instruct-13B. I get an error like this:
Model loaded: ['...']
Starting server on address 0.0.0.0:8004
{'beams': 3, 'beam_length': 3, 'in_beam_search': True}
ERROR:example_flask:Exception on /infer_bench [POST]
Traceback (most recent call last):
File "/home/bibekyess/anaconda3/envs/exllama-env/lib/python3.11/site-packages/flask/app.py", line 1455, in wsgi_app
response = self.full_dispatch_request()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bibekyess/anaconda3/envs/exllama-env/lib/python3.11/site-packages/flask/app.py", line 869, in full_dispatch_request
rv = self.handle_user_exception(e)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bibekyess/anaconda3/envs/exllama-env/lib/python3.11/site-packages/flask/app.py", line 867, in full_dispatch_request
rv = self.dispatch_request()
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bibekyess/anaconda3/envs/exllama-env/lib/python3.11/site-packages/flask/app.py", line 852, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bibekyess/exllama_sandbox/exllama/example_flask.py", line 97, in inferContextB
outputs = generator.generate_simple(prompt, max_new_tokens = 400)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bibekyess/exllama_sandbox/exllama/generator.py", line 313, in generate_simple
self.end_beam_search()
File "/home/bibekyess/exllama_sandbox/exllama/generator.py", line 698, in end_beam_search
self.sequence = self.sequence_actual.clone()
^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'clone'
Has anyone faced a similar error before?
To reproduce, I think you can set beams and beam_length in the generator settings (as in the dict printed above) and then call generate_simple.
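Here is roughly how I set things up, as a minimal sketch following the loading pattern from exllama's example scripts (the model directory and prompt below are placeholders, not my actual ones):

```python
import os, glob

from model import ExLlama, ExLlamaCache, ExLlamaConfig
from tokenizer import ExLlamaTokenizer
from generator import ExLlamaGenerator

# Placeholder path -- point this at a 4-bit GPTQ Llama model directory
model_directory = "/path/to/Llama-2-ko-instruct-13B-GPTQ/"

tokenizer_path = os.path.join(model_directory, "tokenizer.model")
model_config_path = os.path.join(model_directory, "config.json")
model_path = glob.glob(os.path.join(model_directory, "*.safetensors"))[0]

config = ExLlamaConfig(model_config_path)
config.model_path = model_path

model = ExLlama(config)
tokenizer = ExLlamaTokenizer(tokenizer_path)
cache = ExLlamaCache(model)
generator = ExLlamaGenerator(model, tokenizer, cache)

# The beam search settings I changed (these are what get printed in the log above)
generator.settings.beams = 3
generator.settings.beam_length = 3

# This call raises: AttributeError: 'NoneType' object has no attribute 'clone'
prompt = "Placeholder prompt"
output = generator.generate_simple(prompt, max_new_tokens = 400)
```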
Thank you for your help! :)