
[Bug]: inadequate info to debug problem; need more crash output #590

Open
ppbrown opened this issue Nov 24, 2024 · 1 comment
Labels
bug Something isn't working

Comments


ppbrown commented Nov 24, 2024

What happened?

epoch:   0%|                                                            | 0/1 [2:08:03<?, ?it/s]
Traceback (most recent call last):
  File "/data/OneTrainer32/modules/ui/TrainUI.py", line 561, in __training_thread_function
    trainer.train()
  File "/data/OneTrainer32/modules/trainer/GenericTrainer.py", line 682, in train
    loss.backward()
  File "/data/OneTrainer32/venv/lib/python3.10/site-packages/torch/_tensor.py", line 581, in backward
    torch.autograd.backward(
  File "/data/OneTrainer32/venv/lib/python3.10/site-packages/torch/autograd/__init__.py", line 347, in backward
    _engine_run_backward(
  File "/data/OneTrainer32/venv/lib/python3.10/site-packages/torch/autograd/graph.py", line 825, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

What did you expect would happen?

Presumably I have a corrupt cache file somewhere, but this gives me no idea which one.

Ideally, it would just skip the bad one and keep going. Worst case, it should at least tell me which one was bad.
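
A minimal sketch (assumed names only, not OneTrainer's actual loop) of the requested "skip and report" behaviour: wrap the per-sample backward pass in a try/except, name the offending cache entry, and keep training.

```python
import torch

def compute_loss(sample):
    # Stand-in for the real per-sample loss; "bad.ckpt" simulates a corrupt
    # cache entry whose loss ends up detached from the graph (no grad_fn).
    if sample["path"] == "bad.ckpt":
        return torch.tensor(0.0)
    w = torch.ones(1, requires_grad=True)
    return (w * sample["value"]).sum()

samples = [
    {"path": "ok1.ckpt", "value": 1.0},
    {"path": "bad.ckpt", "value": 2.0},
    {"path": "ok2.ckpt", "value": 3.0},
]

for sample in samples:
    try:
        compute_loss(sample).backward()
    except RuntimeError as err:
        # Name the offending file and continue instead of aborting the epoch.
        print(f"Skipping {sample['path']}: {err}")
```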

Relevant log output

epoch:   0%|                                                            | 0/1 [2:08:03<?, ?it/s]
Traceback (most recent call last):
  File "/data/OneTrainer32/modules/ui/TrainUI.py", line 561, in __training_thread_function
    trainer.train()
  File "/data/OneTrainer32/modules/trainer/GenericTrainer.py", line 682, in train
    loss.backward()
  File "/data/OneTrainer32/venv/lib/python3.10/site-packages/torch/_tensor.py", line 581, in backward
    torch.autograd.backward(
  File "/data/OneTrainer32/venv/lib/python3.10/site-packages/torch/autograd/__init__.py", line 347, in backward
    _engine_run_backward(
  File "/data/OneTrainer32/venv/lib/python3.10/site-packages/torch/autograd/graph.py", line 825, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
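
For reference, a minimal standalone sketch (not OneTrainer code) that reproduces the same RuntimeError: calling backward() on a loss whose contributing parameters all have requires_grad disabled.

```python
import torch

model = torch.nn.Linear(4, 1)
for p in model.parameters():
    p.requires_grad_(False)  # simulates all trained modules being frozen

x = torch.randn(2, 4)
loss = model(x).mean()       # loss has no grad_fn and does not require grad

# Raises: RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
loss.backward()
```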

Output of pip freeze

No response

ppbrown added the bug label Nov 24, 2024
ppbrown commented Nov 24, 2024

Oh!

lol. Actually, I had set this thing to "stop training the unet after this number of steps".

So the actual bug is: the program should tell the user it was configured to stop now, and stop gracefully, instead of printing this buggy-looking infodump.
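
One possible shape for that graceful stop, as a hedged sketch; the helper name and surrounding structure are assumptions, not OneTrainer's actual API.

```python
import torch

def backward_or_stop(loss: torch.Tensor) -> bool:
    """Back-propagate if possible; otherwise explain why training should stop.

    Returns True if a backward pass ran, False if the loss is detached from
    every trainable parameter (the situation behind the crash in this issue).
    """
    if not loss.requires_grad:
        print("Loss has no grad_fn: all trained modules appear to be frozen "
              "(e.g. the configured unet stop step was reached). Stopping gracefully.")
        return False
    loss.backward()
    return True

# Toy demonstration with a fully frozen model
model = torch.nn.Linear(4, 1)
for p in model.parameters():
    p.requires_grad_(False)

loss = model(torch.randn(2, 4)).mean()
if not backward_or_stop(loss):
    pass  # the real trainer would break out of its epoch loop here
```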
