I forget the combos that resulted in the autograd leaf variable errors; I'll run into them again soon. The efficientnets seem to break in some other ones.

I was doing some more hacking around with the LN, looking for something that performs reasonably. The codegen on the diff-of-squared-mean norm impl below fails completely (whether scripted in isolation via the decorator like this, or as part of whole-model scripting) with an internal compiler error. This option is optimal on TPU w/ PT XLA and can actually be faster than BN (in train) when substituted into a familiar network. No PyTorch eager or torchscript / aot codegen impl of the NCHW LN can come close to that (usually between 1/3 and 1/2 the throughput and 2x the memory consumption). Again, that's in train / bwd.
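
A minimal sketch of that kind of diff-of-squared-mean NCHW LayerNorm (the function name, eps, and affine handling below are assumptions, not the exact snippet that was failing):

```python
import torch


# Sketch only: normalizes over the channel dim of an NCHW tensor, computing
# variance as E[x^2] - E[x]^2 (diff of squared means) rather than via x.var().
@torch.jit.script
def layer_norm_nchw_diff_sq_mean(
    x: torch.Tensor,
    weight: torch.Tensor,
    bias: torch.Tensor,
    eps: float = 1e-6,
) -> torch.Tensor:
    u = x.mean(dim=1, keepdim=True)
    var = (x * x).mean(dim=1, keepdim=True) - u * u
    x = (x - u) * torch.rsqrt(var + eps)
    return x * weight.view(1, -1, 1, 1) + bias.view(1, -1, 1, 1)
```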
The PyTorch 1.12 release enables NVFuser by default in TorchScript and, combined with the functorch release that accompanies 1.12, also lets users apply aot_autograd compilation to potentially achieve better performance than with TorchScript alone. However, users are running into issues when enabling these options, so this discussion is meant to collect nvfuser-related problems in one place.
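
For reference, a rough sketch of the two setups involved is below; the toy model, shapes, and use of functorch's `memory_efficient_fusion` wrapper are illustrative assumptions, not taken from any particular report.

```python
import torch
from functorch.compile import memory_efficient_fusion

# Toy model and shapes for illustration only.
model = torch.nn.Sequential(
    torch.nn.Linear(512, 512),
    torch.nn.GELU(),
    torch.nn.Linear(512, 512),
).cuda()

x = torch.randn(64, 512, device="cuda", requires_grad=True)

# Path 1: TorchScript. NVFuser is the default fuser in 1.12, so scripting the
# model is enough to route fusible subgraphs to it.
scripted = torch.jit.script(model)
scripted(x).sum().backward()

# Path 2: AOT Autograd via functorch, which traces the forward and backward
# graphs and hands them to NVFuser.
fused = memory_efficient_fusion(model)
fused(x).sum().backward()
```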
For a sample of the problems encountered, see the comments in #1340.
cc @csarofeen, @jjsjann123, @Chillee.