
Issue with app spin-up #15

Open

AvivSham opened this issue Nov 7, 2024 · 2 comments

AvivSham commented Nov 7, 2024

Hi all,
How are you?
Thank you for your amazing work!
We followed the README instructions and managed to build both the front end and the back end. However, when we tried to spin up the backend by running docker-compose up, we encountered the following error from tabbyapi:

tabbyapi                 | Traceback (most recent call last):
tabbyapi                 |   File "/app/main.py", line 171, in <module>
tabbyapi                 |     entrypoint()
tabbyapi                 |   File "/app/main.py", line 167, in entrypoint
tabbyapi                 |     asyncio.run(entrypoint_async())
tabbyapi                 |   File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
tabbyapi                 |     return loop.run_until_complete(main)
tabbyapi                 |   File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
tabbyapi                 |     return future.result()
tabbyapi                 |   File "/app/main.py", line 76, in entrypoint_async
tabbyapi                 |     await model.load_model(model_path.resolve(), **config.model)
tabbyapi                 |   File "/app/common/model.py", line 100, in load_model
tabbyapi                 |     async for _ in load_model_gen(model_path, **kwargs):
tabbyapi                 |   File "/app/common/model.py", line 70, in load_model_gen
tabbyapi                 |     container = ExllamaV2Container(model_path.resolve(), False, **kwargs)
tabbyapi                 |   File "/app/backends/exllamav2/model.py", line 127, in __init__
tabbyapi                 |     self.config.prepare()
tabbyapi                 |   File "/usr/local/lib/python3.10/dist-packages/exllamav2/config.py", line 326, in prepare
tabbyapi                 |     f = STFile.open(st_file, fast = self.fasttensors, keymap = self.arch.keymap)
tabbyapi                 |   File "/usr/local/lib/python3.10/dist-packages/exllamav2/fasttensors.py", line 129, in open
tabbyapi                 |     return STFile(filename, fast, keymap)
tabbyapi                 |   File "/usr/local/lib/python3.10/dist-packages/exllamav2/fasttensors.py", line 70, in __init__
tabbyapi                 |     self.read_dict()
tabbyapi                 |   File "/usr/local/lib/python3.10/dist-packages/exllamav2/fasttensors.py", line 143, in read_dict
tabbyapi                 |     header_json = fp.read(header_size)
tabbyapi                 | MemoryError
tabbyapi exited with code 1

We are running with the following setup:
OS: Ubuntu
GPU: NVIDIA A10G (the GPU is recognized inside the container)

Thank you for helping!

@nguyenhoangthuan99 (Collaborator)

Hi @AvivSham, it seems like you hit an error while loading the model. Can you try disabling fasttensors by setting the field below to false, then run docker-compose up again?

https://github.com/homebrewltd/ichigo-demo/blob/f973834f372f08bc3c99a26f31bf6f7db8776480/docker/tabbyapi/config.yml#L97
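
For reference, the change would look roughly like this in docker/tabbyapi/config.yml (a minimal sketch; the surrounding keys are assumed from upstream TabbyAPI configs and may differ slightly in this repo):

```yaml
model:
  # ... other model settings ...
  fasttensors: false  # was true; disables the fast safetensors loading path
```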

@AvivSham (Author)

Hi @nguyenhoangthuan99,
We tried your suggestion and ended up with the same error (see the trace below). We also watched the GPU/CPU load while running docker-compose up, but did not spot anything unusual.

[+] Running 3/0
 ⠿ Container tabbyapi                 Created                                                                                    0.0s
 ⠿ Container docker-whisper-speech-1  Created                                                                                    0.0s
 ⠿ Container docker-fish-speech-1     Created                                                                                    0.0s
Attaching to docker-fish-speech-1, docker-whisper-speech-1, tabbyapi
docker-fish-speech-1     | INFO:     Uvicorn running on http://0.0.0.0:22311 (Press CTRL+C to quit)
docker-fish-speech-1     | INFO:     Started parent process [1]
docker-fish-speech-1     | 2024-11-11 09:49:30.241 | INFO     | api:<module>:425 - Loading Llama model...
docker-fish-speech-1     | 2024-11-11 09:49:30.282 | INFO     | api:<module>:425 - Loading Llama model...
tabbyapi                 | INFO:     ExllamaV2 version: 0.2.1
tabbyapi                 | WARNING:  Disabling authentication makes your instance vulnerable. Set the
tabbyapi                 | `disable_auth` flag to False in config.yml if you want to share this instance
tabbyapi                 | with others.
tabbyapi                 | INFO:     Generation logging is enabled for: prompts, generation params
tabbyapi                 | Traceback (most recent call last):
tabbyapi                 |   File "/app/main.py", line 171, in <module>
tabbyapi                 |     entrypoint()
tabbyapi                 |   File "/app/main.py", line 167, in entrypoint
tabbyapi                 |     asyncio.run(entrypoint_async())
tabbyapi                 |   File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
tabbyapi                 |     return loop.run_until_complete(main)
tabbyapi                 |   File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
tabbyapi                 |     return future.result()
tabbyapi                 |   File "/app/main.py", line 76, in entrypoint_async
tabbyapi                 |     await model.load_model(model_path.resolve(), **config.model)
tabbyapi                 |   File "/app/common/model.py", line 100, in load_model
tabbyapi                 |     async for _ in load_model_gen(model_path, **kwargs):
tabbyapi                 |   File "/app/common/model.py", line 70, in load_model_gen
tabbyapi                 |     container = ExllamaV2Container(model_path.resolve(), False, **kwargs)
tabbyapi                 |   File "/app/backends/exllamav2/model.py", line 127, in __init__
tabbyapi                 |     self.config.prepare()
tabbyapi                 |   File "/usr/local/lib/python3.10/dist-packages/exllamav2/config.py", line 326, in prepare
tabbyapi                 |     f = STFile.open(st_file, fast = self.fasttensors, keymap = self.arch.keymap)
tabbyapi                 |   File "/usr/local/lib/python3.10/dist-packages/exllamav2/fasttensors.py", line 129, in open
tabbyapi                 |     return STFile(filename, fast, keymap)
tabbyapi                 |   File "/usr/local/lib/python3.10/dist-packages/exllamav2/fasttensors.py", line 70, in __init__
tabbyapi                 |     self.read_dict()
tabbyapi                 |   File "/usr/local/lib/python3.10/dist-packages/exllamav2/fasttensors.py", line 143, in read_dict
tabbyapi                 |     header_json = fp.read(header_size)
tabbyapi                 | MemoryError
docker-fish-speech-1     | 2024-11-11 09:49:31.387 | INFO     | api:<module>:425 - Loading Llama model...
tabbyapi exited with code 1
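
Since the MemoryError is raised at fp.read(header_size), one more check we could run is reading the declared header size of the .safetensors file directly: in the safetensors format, the first 8 bytes are a little-endian uint64 giving the JSON header length, and an implausibly large value there usually points to a corrupted or partially downloaded file. A minimal sketch (the model path is a placeholder):

```python
import struct
import sys

# Path to one of the model's .safetensors shards (placeholder; adjust to your setup).
path = sys.argv[1]

with open(path, "rb") as fp:
    # First 8 bytes: little-endian uint64 header length, per the safetensors spec.
    (header_size,) = struct.unpack("<Q", fp.read(8))

print(f"declared header size: {header_size} bytes")
```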

WDYT?
