Release Notes – Release 1.2.0
Key Features and Enhancements
- [pyTorch] Sliding window support is added for DotProductAttention (sketch below).
- [pyTorch] Performance of DotProductAttention on Hopper GPUs is improved by using cuDNN.
- [pyTorch] Support for the Falcon architecture is added in TransformerLayer via the new option parallel_attention_mlp (sketch below).
- [pyTorch] Checkpointing logic when using fp8_model_init is improved (sketch below).
- [JAX] Support is added for controlling the SM margin of the LayerNorm and RMSNorm kernels via the environment variables NVTE_FWD_LAYERNORM_SM_MARGIN and NVTE_BWD_LAYERNORM_SM_MARGIN (sketch below).
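A minimal sketch of the sliding window feature, assuming it is exposed through a window_size argument on DotProductAttention; the argument name, placement, and the shapes used here are illustrative and may differ in your installed version:

```python
import torch
import transformer_engine.pytorch as te

# Hypothetical sketch: causal attention restricted to a window of the
# 128 preceding tokens (window_size is an assumption for illustration).
attn = te.DotProductAttention(
    num_attention_heads=16,
    kv_channels=64,
    attn_mask_type="causal",
    window_size=(128, 0),  # attend to at most 128 tokens to the left
)

# Default qkv_format is "sbhd": (seq, batch, heads, head_dim).
q = torch.randn(512, 2, 16, 64, dtype=torch.bfloat16, device="cuda")
k = torch.randn(512, 2, 16, 64, dtype=torch.bfloat16, device="cuda")
v = torch.randn(512, 2, 16, 64, dtype=torch.bfloat16, device="cuda")
out = attn(q, k, v)
```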
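A sketch of the new parallel_attention_mlp option on TransformerLayer, which enables Falcon-style parallel attention and MLP branches; the layer sizes below are illustrative only:

```python
import torch
import transformer_engine.pytorch as te

# Falcon-style layer: attention and MLP read the same normalized input
# and their outputs are summed, rather than being applied sequentially.
layer = te.TransformerLayer(
    hidden_size=4096,
    ffn_hidden_size=16384,
    num_attention_heads=32,
    parallel_attention_mlp=True,
).cuda()

x = torch.randn(128, 2, 4096, device="cuda")  # (seq, batch, hidden)
y = layer(x)
```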
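A sketch of checkpointing a module whose parameters are created directly in FP8 via fp8_model_init; the save/load round trip shown is the standard PyTorch state_dict flow and is only meant to illustrate where the improved logic applies:

```python
import torch
import transformer_engine.pytorch as te

# Create the module with FP8 weights only (no higher-precision master copy).
with te.fp8_model_init(enabled=True):
    model = te.Linear(1024, 1024)

# Standard checkpoint round trip; the improved checkpointing logic applies
# to modules built under fp8_model_init.
torch.save(model.state_dict(), "fp8_linear.pt")

with te.fp8_model_init(enabled=True):
    restored = te.Linear(1024, 1024)
restored.load_state_dict(torch.load("fp8_linear.pt"))
```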
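A sketch of setting the SM margin for the JAX LayerNorm/RMSNorm kernels; the margin values are illustrative, and setting the variables before importing Transformer Engine is an assumption made here to ensure they are picked up before any kernel launch:

```python
import os

# Reserve 4 SMs for overlapping communication during the forward and
# backward LayerNorm/RMSNorm launches (values are illustrative).
os.environ["NVTE_FWD_LAYERNORM_SM_MARGIN"] = "4"
os.environ["NVTE_BWD_LAYERNORM_SM_MARGIN"] = "4"

import transformer_engine.jax  # noqa: E402  (imported after setting the margins)
```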
Fixed Issues
- Weight gradient could be computed incorrectly in some cases when FP8 execution and sequence parallelism were used together.
- Statistics were computed incorrectly during FP8 calibration.
- Using torch.compile on the DotProductAttention module caused a crash.
- Rotary embeddings during pipeline-parallel inference did not operate correctly.
- The decoder in encoder-decoder architectures used an incorrect mask type.
- Exporting Transformer Engine modules to ONNX in recent versions of pyTorch did not work correctly.
Known Issues in This Release
- FlashAttention v2, which is a dependency of this release of Transformer Engine, has a known issue with excessive memory usage during installation (Dao-AILab/flash-attention#358). You can work around this issue either by setting the environment variable MAX_JOBS=1 during Transformer Engine installation, or by installing FlashAttention v1 (e.g., by running pip install flash-attn==1.0.9) before attempting to install Transformer Engine.
- [pyTorch] FlashAttention v2.1 changed the behavior of the causal mask when performing cross-attention (see https://github.com/Dao-AILab/flash-attention#21-change-behavior-of-causal-flag for reference). To keep Transformer Engine behavior consistent between versions and backends, FlashAttention is disabled for this use case (cross-attention with causal masking) when FlashAttention 2.1 or later is installed.
Breaking Changes in This Release
There are no breaking changes in this release.
Deprecated Features
There are no deprecated features in this release.