You are using a model of type mini_gemini_mixtral to instantiate a model of type mini_gemini. This is not supported for all configurations of models and can yield errors.
#63 · Open · lightmatmul opened this issue on Apr 16, 2024 · 1 comment
I managed to finetune the Mini-Gemini Mixtral model; however, post-finetuning I am unable to run inference with it. I tried to launch a model worker as described in the repo:

```
python -m minigemini.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40001 --worker http://localhost:40001 --model-path Mini-Gemini-mixtral/
```

Then, after a long wait, I get:
You are using a model of type mini_gemini_mixtral to instantiate a model of type mini_gemini. This is not supported for all configurations of models and can yield errors.
Loading checkpoint shards: 0%| | 0/36 [00:00<?, ?it/s]
Loading checkpoint shards: 0%| | 0/36 [00:00<?, ?it/s]
2024-04-16 23:00:00 | ERROR | stderr |
2024-04-16 23:00:00 | ERROR | stderr | Traceback (most recent call last):
2024-04-16 23:00:00 | ERROR | stderr | File "<frozen runpy>", line 198, in _run_module_as_main
2024-04-16 23:00:00 | ERROR | stderr | File "<frozen runpy>", line 88, in _run_code
2024-04-16 23:00:00 | ERROR | stderr | File "/home/paperspace/MiniGemini/minigemini/serve/model_worker.py", line 389, in <module>
2024-04-16 23:00:00 | ERROR | stderr | worker = ModelWorker(args.controller_address,
2024-04-16 23:00:00 | ERROR | stderr | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-04-16 23:00:00 | ERROR | stderr | File "/home/paperspace/MiniGemini/minigemini/serve/model_worker.py", line 76, in __init__
2024-04-16 23:00:00 | ERROR | stderr | self.tokenizer, self.model, self.image_processor, self.context_len = load_pretrained_model(
2024-04-16 23:00:00 | ERROR | stderr | ^^^^^^^^^^^^^^^^^^^^^^
2024-04-16 23:00:00 | ERROR | stderr | File "/home/paperspace/MiniGemini/minigemini/model/builder.py", line 76, in load_pretrained_model
2024-04-16 23:00:00 | ERROR | stderr | model = MiniGeminiLlamaForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True, **kwargs)
2024-04-16 23:00:00 | ERROR | stderr | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-04-16 23:00:00 | ERROR | stderr | File "/home/paperspace/MiniGemini/venv/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3706, in from_pretrained
2024-04-16 23:00:00 | ERROR | stderr | ) = cls._load_pretrained_model(
2024-04-16 23:00:00 | ERROR | stderr | ^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-04-16 23:00:00 | ERROR | stderr | File "/home/paperspace/MiniGemini/venv/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4091, in _load_pretrained_model
2024-04-16 23:00:00 | ERROR | stderr | state_dict = load_state_dict(shard_file)
2024-04-16 23:00:00 | ERROR | stderr | ^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-04-16 23:00:00 | ERROR | stderr | File "/home/paperspace/MiniGemini/venv/lib/python3.11/site-packages/transformers/modeling_utils.py", line 505, in load_state_dict
2024-04-16 23:00:00 | ERROR | stderr | if metadata.get("format") not in ["pt", "tf", "flax"]:
2024-04-16 23:00:00 | ERROR | stderr | ^^^^^^^^^^^^
2024-04-16 23:00:00 | ERROR | stderr | AttributeError: 'NoneType' object has no attribute 'get'
Could this be due to the ZeRO-to-fp32 conversion after training? I did run the zero-to-fp32 script, but saved the result as sharded safetensors instead of a pytorch_model.bin.
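For context, the traceback fails at `metadata.get("format")`, which is the check transformers runs on a safetensors file's header metadata; a shard written with `safetensors.torch.save_file` without `metadata={"format": "pt"}` has `metadata = None` and produces exactly this `AttributeError`. A minimal sketch of rewriting the shards with the expected metadata (the directory path is illustrative, and this assumes the shards are plain safetensors files saved without metadata):

```python
import glob
import os

from safetensors.torch import load_file, save_file

# Path to the finetuned checkpoint directory (adjust to your setup).
model_dir = "Mini-Gemini-mixtral"

for shard_path in sorted(glob.glob(os.path.join(model_dir, "*.safetensors"))):
    # Load the tensors from the shard, then rewrite the file with the
    # "format": "pt" header metadata that transformers' load_state_dict expects.
    tensors = load_file(shard_path)
    save_file(tensors, shard_path, metadata={"format": "pt"})
    print(f"rewrote {shard_path} with format metadata")
```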
Hi, please rename your finetuned model directory so it contains the string "8x7b", which is what L68 of model/builder.py checks for to load the Mixtral model. Alternatively, you can modify that loading rule at L68 of model/builder.py.
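The routing presumably looks something like the sketch below (the import path, helper name, and the Mixtral class name are assumptions inferred from the traceback and this comment; check model/builder.py for the actual code). The point is that a finetuned Mixtral checkpoint saved under a name without "8x7b" falls through to the Llama loader, which is also why the `mini_gemini_mixtral` / `mini_gemini` config-type warning appears:

```python
# Hypothetical excerpt mirroring the branch around L68 of minigemini/model/builder.py.
from minigemini.model import MiniGeminiLlamaForCausalLM, MiniGeminiMixtralForCausalLM


def pick_model_class(model_name: str):
    # Only model names containing "8x7b" are routed to the Mixtral variant;
    # anything else is loaded with the Llama-based class.
    if "8x7b" in model_name.lower():
        return MiniGeminiMixtralForCausalLM
    return MiniGeminiLlamaForCausalLM
```

So either rename the checkpoint directory (e.g. `Mini-Gemini-8x7b-finetuned/`) or relax that condition to match your own naming.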