"error" in training - AttributeError: 'CastOutputToFloat' object has no attribute 'weight', RuntimeError: Only Tensors of floating point and complex dtype can require gradients #29
Comments
What hardware are you running on? Any other console traceback?
I just figured out something important while I was typing this comment (I'll include what I was originally writing at the end). It appears to finetune correctly if I kill the script and run main.py again. If I start finetuning once and then abort, I get that error on every subsequent attempt. I don't know yet whether it will actually finetune successfully, since it's still running, but that seems to be the problem: it errors out on a second attempt after an abort. I tried deleting the leftover directory after the abort, in case that was getting in the way, but that didn't seem to fix it. This is much less of a problem now, since killing the script and restarting isn't a big hassle, but it's probably still not running as expected.

Anyway, the original comment: there's nothing else in the traceback; everything else in the console looks like normal output before ===================================BUG REPORT===================================
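For what it's worth, one plausible explanation for the abort-then-retry failure: if the aborted run leaves an already-prepared model in memory, a second call to `peft.prepare_model_for_int8_training` fails, because the first call has already wrapped `lm_head` in `CastOutputToFloat`, which has no `.weight` attribute. A minimal sketch of a workaround, reloading the base model fresh before every run; the model name and loading arguments here are assumptions based on typical setups of this era, not necessarily what this repo's main.py does:

```python
import torch
import transformers
import peft

def load_fresh_model(model_name="decapoda-research/llama-7b-hf"):
    # Assumed model name for illustration. Reloading from scratch means
    # prepare_model_for_int8_training never sees an lm_head that a
    # previous (aborted) run already wrapped in CastOutputToFloat.
    model = transformers.LlamaForCausalLM.from_pretrained(
        model_name,
        load_in_8bit=True,
        torch_dtype=torch.float16,
        device_map="auto",
    )
    return peft.prepare_model_for_int8_training(model)
```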
Also, a quick question that doesn't need its own issue: what's the significance of the two empty lines between each entry in the training data? I had finetuned GPT-Neo a bunch, and I'm trying to wrap my head around the differences. I haven't been able to find out whether finetuning LLaMA uses <|endoftext|> tokens, or whether there's another way to do it. Is that what the two empty lines are adding?
It helps if you want to have newlines or single empty lines within each of your samples. It's just the easiest way to delimit samples.
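As an illustration only (not the repo's actual loader), splitting a training file on two consecutive empty lines might look like this, with `train.txt` as a hypothetical input file:

```python
# Hypothetical sketch: entries separated by two empty lines
# (i.e. three consecutive newlines) are split into individual samples,
# so a single blank line can still appear inside a sample.
with open("train.txt") as f:
    raw = f.read()

samples = [s.strip() for s in raw.split("\n\n\n") if s.strip()]
print(f"Number of samples: {len(samples)}")
```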
I just rewrote the whole thing. Still seeing the issue?
WSL2 Ubuntu, fresh install. I get the following error after it downloads the weights and tries to train. Sorry I can't give more details, but I'm really not sure what's going on.
Number of samples: 534
Traceback (most recent call last):
File "/home/ckg/.local/lib/python3.10/site-packages/gradio/routes.py", line 394, in run_predict
output = await app.get_blocks().process_api(
File "/home/ckg/.local/lib/python3.10/site-packages/gradio/blocks.py", line 1075, in process_api
result = await self.call_function(
File "/home/ckg/.local/lib/python3.10/site-packages/gradio/blocks.py", line 884, in call_function
prediction = await anyio.to_thread.run_sync(
File "/home/ckg/.local/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/ckg/.local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/home/ckg/.local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "/home/ckg/.local/lib/python3.10/site-packages/gradio/helpers.py", line 587, in tracked_fn
response = fn(*args)
File "/home/ckg/github/simple-llama-finetuner/main.py", line 164, in tokenize_and_train
model = peft.prepare_model_for_int8_training(model)
File "/home/ckg/.local/lib/python3.10/site-packages/peft/utils/other.py", line 72, in prepare_model_for_int8_training
File "/home/ckg/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1614, in getattr
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'CastOutputToFloat' object has no attribute 'weight'
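For context, the AttributeError can be reproduced in isolation: peft's CastOutputToFloat is (roughly) an nn.Sequential wrapper around lm_head, and nn.Sequential has no .weight attribute, so nn.Module's __getattr__ raises. A minimal sketch, using a simplified stand-in for the peft class rather than peft's exact code:

```python
import torch
import torch.nn as nn

class CastOutputToFloat(nn.Sequential):
    # Simplified stand-in for peft's wrapper: runs the wrapped lm_head,
    # then casts the logits to float32.
    def forward(self, x):
        return super().forward(x).to(torch.float32)

lm_head = nn.Linear(8, 8)
wrapped = CastOutputToFloat(lm_head)
wrapped.weight  # AttributeError: 'CastOutputToFloat' object has no attribute 'weight'
```

Re-running prepare_model_for_int8_training on a model whose lm_head was already wrapped hits exactly this path, which fits the abort-then-retry pattern described in the comments above.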