Fix an incorrect assert on frame count for PT_XLA_DEBUG=1 #6466

Merged 1 commit into master on Feb 21, 2024

Conversation

JackCaoG (Collaborator) commented on Feb 5, 2024

I discovered this while doing a repro for a customer:

  File "test.py", line 43, in main
    trainer.train()
  File "/src/repo/magvit2-pytorch/magvit2_pytorch/trainer.py", line 517, in train
    self.train_step(dl_iter)
  File "/src/repo/magvit2-pytorch/magvit2_pytorch/trainer.py", line 351, in train_step
    loss, loss_breakdown = self.model(
  File "/src/pytorch/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/src/pytorch/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "<@beartype(magvit2_pytorch.magvit2_pytorch.VideoTokenizer.forward) at 0x7f962217a160>", line 55, in forward
  File "/src/repo/magvit2-pytorch/magvit2_pytorch/magvit2_pytorch.py", line 1826, in forward
    norm_grad_wrt_perceptual_loss = grad_layer_wrt_loss(perceptual_loss, last_dec_layer).norm(p = 2)
  File "/src/pytorch/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "<@beartype(magvit2_pytorch.magvit2_pytorch.grad_layer_wrt_loss) at 0x7f9622152ca0>", line 52, in grad_layer_wrt_loss
  File "/src/repo/magvit2-pytorch/magvit2_pytorch/magvit2_pytorch.py", line 127, in grad_layer_wrt_loss
    return torch_grad(
  File "/src/pytorch/torch/autograd/__init__.py", line 412, in grad
    result = _engine_run_backward(
  File "/src/pytorch/torch/autograd/graph.py", line 744, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: torch_xla/csrc/debug_util.cpp:265 : Check failed: frames.size() >= 1 (1 vs. 0)
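
For context, here is a minimal sketch (assuming torch_xla is installed and an XLA device is available) of the code path involved: with PT_XLA_DEBUG=1 set, calling torch.autograd.grad on XLA tensors, as grad_layer_wrt_loss does in the trace above, runs through the debug frame-capture logic whose check failed. This is only an illustration of the setup, not the exact customer repro, and it may not trigger the failed check on every build.

```python
# Hedged sketch: PT_XLA_DEBUG=1 plus an explicit torch.autograd.grad call,
# mirroring the grad_layer_wrt_loss pattern in the trace; not the exact repro.
import os
os.environ["PT_XLA_DEBUG"] = "1"  # must be set before torch_xla initializes

import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()

layer = torch.nn.Linear(4, 4).to(device)
x = torch.randn(4, 4, device=device, requires_grad=True)

loss = layer(x).pow(2).mean()

# Take the gradient explicitly instead of calling loss.backward(),
# as the code in the trace does.
(grad,) = torch.autograd.grad(loss, x, retain_graph=True)
print(grad.norm(p=2))
```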

JackCaoG requested a review from will-cromar on February 5, 2024 at 03:47
JackCaoG merged commit c08ae21 into master on Feb 21, 2024
18 checks passed