The CPU backend is very limited: it only implements BMM-style NA, not fused NA. It consists of very simple C++ implementations of the BMM-style ops, which enable inference on non-CUDA devices and also serve as a reference for the CUDA kernels in unit tests (read more). These implementations are NOT performance-optimized.
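To make the distinction concrete, below is a minimal, self-contained C++ sketch of what a BMM-style (non-fused) neighborhood attention forward pass looks like in the 1-D, single-head case: the attention weights for each query's neighborhood are materialized, softmaxed, and then applied to the neighboring values as separate steps. The function name, memory layout, and the clamped-window boundary handling are illustrative assumptions, not NATTEN's actual code.

```cpp
// Illustrative sketch only: single-head, 1-D, BMM-style neighborhood attention.
// Assumes row-major [L x D] buffers and L >= K. Not NATTEN's real kernels.
#include <algorithm>
#include <cmath>
#include <vector>

void neighborhood_attention_1d(const std::vector<float>& q,  // [L x D]
                               const std::vector<float>& k,  // [L x D]
                               const std::vector<float>& v,  // [L x D]
                               std::vector<float>& out,      // [L x D]
                               int L, int D, int K) {
  const float scale = 1.0f / std::sqrt(static_cast<float>(D));
  std::vector<float> attn(K);
  for (int i = 0; i < L; ++i) {
    // Every query attends to exactly K keys; the window is shifted
    // (not zero-padded) near the sequence edges.
    const int start = std::clamp(i - K / 2, 0, L - K);
    // Step 1 (PN-style op): Q_i dot K_j for each neighbor j.
    for (int j = 0; j < K; ++j) {
      float dot = 0.0f;
      for (int d = 0; d < D; ++d)
        dot += q[i * D + d] * k[(start + j) * D + d];
      attn[j] = dot * scale;
    }
    // Step 2: softmax over the K neighborhood weights.
    const float mx = *std::max_element(attn.begin(), attn.end());
    float denom = 0.0f;
    for (int j = 0; j < K; ++j) {
      attn[j] = std::exp(attn[j] - mx);
      denom += attn[j];
    }
    // Step 3: weighted sum of the neighboring values.
    for (int d = 0; d < D; ++d) {
      float acc = 0.0f;
      for (int j = 0; j < K; ++j)
        acc += (attn[j] / denom) * v[(start + j) * D + d];
      out[i * D + d] = acc;
    }
  }
}
```

A fused implementation would perform all three steps per query without writing the K attention weights out to memory; that is exactly what this backend does not provide.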
- libtorch: Torch API used for AVX.
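As a rough illustration of what using the Torch API for AVX can look like, the sketch below uses ATen's `at::vec::Vectorized` wrapper, which maps to AVX-width lanes when available. The include path and exact usage are assumptions (they also vary across PyTorch versions) and are not taken from NATTEN's sources.

```cpp
// Hedged example: elementwise out = a * b + c using ATen's CPU vectorization
// wrapper (at::vec::Vectorized). Not taken from NATTEN's kernels.
#include <ATen/cpu/vec/vec.h>
#include <cstdint>

void mul_add(const float* a, const float* b, const float* c,
             float* out, int64_t n) {
  using Vec = at::vec::Vectorized<float>;
  int64_t i = 0;
  // Vectorized main loop: Vec::size() floats per iteration (8 with AVX2).
  for (; i + Vec::size() <= n; i += Vec::size()) {
    const Vec va = Vec::loadu(a + i);
    const Vec vb = Vec::loadu(b + i);
    const Vec vc = Vec::loadu(c + i);
    (va * vb + vc).store(out + i);
  }
  // Scalar tail for the remaining elements.
  for (; i < n; ++i)
    out[i] = a[i] * b[i] + c[i];
}
```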
Originally developed back in 2022, these kernels have since been slightly tuned (launch parameters, among other factors), but they remain very naive implementations.
Tiled variants: also developed back in 2022, these implement only the PN-2D operation, and only when the number of dimensions per attention head is 32.
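In other words, the tiled path only applies under a narrow condition, roughly like the hypothetical check sketched below (the function name and structure are illustrative, not the actual dispatcher):

```cpp
// Hypothetical sketch of the constraint above (not NATTEN's actual
// dispatcher): the tiled path is only an option for the 2-D PN op when the
// per-head dim is exactly 32; other ops, ranks, and head dims fall back to
// the plain naive kernels.
bool can_use_tiled_pn(bool is_pn_op, int spatial_rank, int dim_per_head) {
  return is_pn_op && spatial_rank == 2 && dim_per_head == 32;
}
```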
- libtorch: half atomic add in the RPB backward kernel.
TBD.
- CUTLASS
TBD.
- CUTLASS