Prototype XNNPack gemm compiler. #7569

copybara-service · 2024-12-06T13:36:00Z

Prototype XNNPack gemm compiler.

Our existing GEMM templates are becoming unmaintainable and are preventing us from quickly adding support for new types and quantization schemes. They are also too restrictive in the shapes of the generated GEMMs. This new system generates assembly, and the shape is limited only by the number of SIMD registers.
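As a rough illustration of what "limited only by the number of SIMD registers" means, the sketch below counts the vector registers an MR x NR accumulator tile would occupy. The function names and the `scratch_registers` budget (registers reserved for loading A and B) are assumptions made for this example, not values taken from the generator.

```python
def accumulator_registers(mr: int, nr: int, lanes: int) -> int:
    """Vector registers needed to hold an mr x nr accumulator tile."""
    # Each of the mr rows needs ceil(nr / lanes) vector registers.
    return mr * -(-nr // lanes)


def tile_fits(mr: int, nr: int, lanes: int,
              total_registers: int, scratch_registers: int) -> bool:
    """True if the accumulators plus an assumed scratch budget fit."""
    needed = accumulator_registers(mr, nr, lanes) + scratch_registers
    return needed <= total_registers


# avx512f in 64-bit mode: 32 zmm registers, 16 f32 lanes each.
# The scratch budget of 5 is a hypothetical figure.
print(tile_fits(5, 64, 16, total_registers=32, scratch_registers=5))
print(tile_fits(8, 64, 16, total_registers=32, scratch_registers=5))
```

Under these assumptions a 5x64 F32 tile fits in the avx512f register file, while an 8x64 tile does not.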

Arch: for example, x64 and aarch64
Isa: for example, neondot and avx512f

Each microkernel has an arch and an isa associated with it. All shared scalar code belongs to the arch, and isa-specific SIMD code belongs to the isa. Isas can inherit from each other. For example, stores are common between avx512f and avx512vnni, and between neonfma and neondot. This eliminates lots of code duplication.

Only the inner loops (and sometimes the outer loops) vary between GEMM microkernels on the same architecture. Most of the rest of the code is identical. Therefore, this system is modular, with each ISA inheriting from the preceding one, so only small snippets of assembly are required to add a new ISA.
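The arch/isa split could be sketched roughly as follows. The class and method names (`X64`, `Avx512F`, `Avx512Vnni`, `inner_loop_body`, `store`, `generate`) and the register choices are hypothetical, picked for this example only; they are not the names used in this change, and the emitted assembly is a placeholder rather than a working microkernel.

```python
class X64:
    """Arch layer (hypothetical): scalar code shared by all x64 kernels."""

    def inner_loop_body(self) -> list[str]:
        raise NotImplementedError

    def store(self) -> list[str]:
        raise NotImplementedError

    def generate(self, name: str) -> str:
        lines = [f"{name}:", "inner_loop:"]
        lines += self.inner_loop_body()  # isa-specific SIMD code
        # Shared scalar loop control lives in the arch layer.
        lines += ["add r11, 4", "cmp rdx, r11", "jne inner_loop"]
        lines += self.store()            # isa-specific, often inherited
        lines.append("ret")
        return "\n".join(lines)


class Avx512F(X64):
    """Isa layer: F32 multiply-accumulate and stores."""

    def inner_loop_body(self) -> list[str]:
        return ["vfmadd231ps zmm5, zmm2, zmm7"]

    def store(self) -> list[str]:
        return ["vmovups [r10], zmm5"]


class Avx512Vnni(Avx512F):
    """Isa layer: overrides only the inner loop; store() is inherited."""

    def inner_loop_body(self) -> list[str]:
        return ["vpdpbusd zmm5, zmm2, zmm7"]


print(Avx512Vnni().generate("example_kernel"))
```

In this sketch, adding the avx512vnni variant takes a single overridden method, which is the kind of saving the inheritance scheme is meant to provide.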

Architectures supported in initial prototype:
F32: neonfma and avx512f
QD8-F32-QC8W: neondot & avx512vnni

Support for aarch32 will be added in a future change. I do not plan on supporting 32-bit x86, since it is irrelevant as an architecture and has only 8 general-purpose registers and 8 SIMD registers. The lack of registers means that data would have to be repeatedly pushed to and popped from the stack, adding lots of complexity to the templates for little gain.

The generated assembly only compiles on Linux. However, only the function headers, footers and calling conventions differ between Windows and Linux. The actual assembly is identical. I manually modified the generated assembly and tested it with MSVC for both aarch64 and x64. Support for Windows will be added in a future version.
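To make the calling-convention difference concrete for x64 only, the sketch below maps an integer or pointer argument index to where each ABI passes it. The register orders are those of the System V AMD64 and Microsoft x64 calling conventions; the table and the `arg_location` helper are illustrative names and are not part of the generator.

```python
# Integer/pointer argument registers, in order, for each x64 ABI.
INT_ARG_REGISTERS = {
    "sysv": ["rdi", "rsi", "rdx", "rcx", "r8", "r9"],  # Linux
    "win64": ["rcx", "rdx", "r8", "r9"],               # Windows
}


def arg_location(abi: str, index: int) -> str:
    """Return the register holding argument `index`, or "stack"."""
    registers = INT_ARG_REGISTERS[abi]
    return registers[index] if index < len(registers) else "stack"


for abi in ("sysv", "win64"):
    print(abi, [arg_location(abi, i) for i in range(6)])
```

A prologue that moves incoming arguments into a fixed set of registers would let the kernel body stay the same on both platforms, which matches the observation that only the headers and footers need to change.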

Intel syntax is used since it is portable between Linux and Windows and it is less crazy than AT&T.

PiperOrigin-RevId: 702691549

copybara-service bot force-pushed the test_702691549 branch 2 times, most recently from 57eacac to c96c59f (December 6, 2024 16:12)

copybara-service bot force-pushed the test_702691549 branch from c96c59f to 2ed701d (December 11, 2024 06:11)
