
[Bug]: inadequate info to debug problem; need more crash output #590

Open
ppbrown opened this issue Nov 24, 2024 · 1 comment
Labels
bug Something isn't working

Comments


ppbrown commented Nov 24, 2024

What happened?

epoch:   0%|                                                            | 0/1 [2:08:03<?, ?it/s]
Traceback (most recent call last):
  File "/data/OneTrainer32/modules/ui/TrainUI.py", line 561, in __training_thread_function
    trainer.train()
  File "/data/OneTrainer32/modules/trainer/GenericTrainer.py", line 682, in train
    loss.backward()
  File "/data/OneTrainer32/venv/lib/python3.10/site-packages/torch/_tensor.py", line 581, in backward
    torch.autograd.backward(
  File "/data/OneTrainer32/venv/lib/python3.10/site-packages/torch/autograd/__init__.py", line 347, in backward
    _engine_run_backward(
  File "/data/OneTrainer32/venv/lib/python3.10/site-packages/torch/autograd/graph.py", line 825, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

What did you expect would happen?

Presumably I have a corrupt cache file somewhere, but this gives me no idea which one.

Ideally, it would just skip the bad one and keep going. Worst case, it should at least tell me which one was bad.
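
A minimal sketch (assumed names only, not OneTrainer's actual loop) of the requested "skip and report" behaviour: wrap the per-sample backward pass in a try/except, name the offending cache entry, and keep training.

```python
import torch

def compute_loss(sample):
    # Stand-in for the real per-sample loss; "bad.ckpt" simulates a corrupt
    # cache entry whose loss ends up detached from the graph (no grad_fn).
    if sample["path"] == "bad.ckpt":
        return torch.tensor(0.0)
    w = torch.ones(1, requires_grad=True)
    return (w * sample["value"]).sum()

samples = [
    {"path": "ok1.ckpt", "value": 1.0},
    {"path": "bad.ckpt", "value": 2.0},
    {"path": "ok2.ckpt", "value": 3.0},
]

for sample in samples:
    try:
        compute_loss(sample).backward()
    except RuntimeError as err:
        # Name the offending file and continue instead of aborting the epoch.
        print(f"Skipping {sample['path']}: {err}")
```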

Relevant log output

epoch:   0%|                                                            | 0/1 [2:08:03<?, ?it/s]
Traceback (most recent call last):
  File "/data/OneTrainer32/modules/ui/TrainUI.py", line 561, in __training_thread_function
    trainer.train()
  File "/data/OneTrainer32/modules/trainer/GenericTrainer.py", line 682, in train
    loss.backward()
  File "/data/OneTrainer32/venv/lib/python3.10/site-packages/torch/_tensor.py", line 581, in backward
    torch.autograd.backward(
  File "/data/OneTrainer32/venv/lib/python3.10/site-packages/torch/autograd/__init__.py", line 347, in backward
    _engine_run_backward(
  File "/data/OneTrainer32/venv/lib/python3.10/site-packages/torch/autograd/graph.py", line 825, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
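
For reference, a minimal standalone sketch (not OneTrainer code) that reproduces the same RuntimeError: calling backward() on a loss whose contributing parameters all have requires_grad disabled.

```python
import torch

model = torch.nn.Linear(4, 1)
for p in model.parameters():
    p.requires_grad_(False)  # simulates all trained modules being frozen

x = torch.randn(2, 4)
loss = model(x).mean()       # loss has no grad_fn and does not require grad

# Raises: RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
loss.backward()
```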

Output of pip freeze

No response

ppbrown added the bug label Nov 24, 2024
ppbrown commented Nov 24, 2024

Oh!

lol. Actually, I had set this thing to "stop training the unet after this number of steps".

So the actual bug is: the program should tell the user it was configured to stop now, and stop gracefully, instead of printing this buggy-looking infodump.
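
One possible shape for that graceful stop, as a hedged sketch; the helper name and surrounding structure are assumptions, not OneTrainer's actual API.

```python
import torch

def backward_or_stop(loss: torch.Tensor) -> bool:
    """Back-propagate if possible; otherwise explain why training should stop.

    Returns True if a backward pass ran, False if the loss is detached from
    every trainable parameter (the situation behind the crash in this issue).
    """
    if not loss.requires_grad:
        print("Loss has no grad_fn: all trained modules appear to be frozen "
              "(e.g. the configured unet stop step was reached). Stopping gracefully.")
        return False
    loss.backward()
    return True

# Toy demonstration with a fully frozen model
model = torch.nn.Linear(4, 1)
for p in model.parameters():
    p.requires_grad_(False)

loss = model(torch.randn(2, 4)).mean()
if not backward_or_stop(loss):
    pass  # the real trainer would break out of its epoch loop here
```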
