[Triton] Triton-generated kernel cannot be loaded correctly through the L0 API. #659

chengjunlu · 2023-07-19T03:24:29Z

One very large Triton kernel cannot be loaded correctly through the L0 API.
The L0 API call zeKernelCreate returns error code 0x78000011:

ZE_RESULT_ERROR_INVALID_KERNEL_NAME = 0x78000011,   ///< [Validation] kernel name is not found in the module

We double-checked that the kernel name we pass is the same as the one in the SPIR-V IR.
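For reading the test output, here is a minimal sketch of how the logged code can be matched against the enum value quoted above. The log prints the code as bare hex digits (78000011), so it has to be parsed as base 16. The helper names parse_logged_code and describe_code are hypothetical and not part of the reproducer or of Level Zero; only the two values shown are covered, and anything else should be looked up in ze_api.h.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Value quoted in this issue from the Level Zero header.
constexpr std::uint32_t kInvalidKernelName = 0x78000011u;

// Hypothetical helper: the test log prints "L0 API error code:78000011",
// i.e. hex digits without a 0x prefix, so parse them as base 16.
inline std::uint32_t parse_logged_code(const std::string& hex_digits) {
    assert(!hex_digits.empty() && "expected hex digits from the log line");
    return static_cast<std::uint32_t>(std::stoul(hex_digits, nullptr, 16));
}

// Hypothetical helper: names only the codes that matter for this issue.
inline std::string describe_code(std::uint32_t code) {
    if (code == 0u) return "ZE_RESULT_SUCCESS";
    if (code == kInvalidKernelName) return "ZE_RESULT_ERROR_INVALID_KERNEL_NAME";
    return "unrecognized (look it up in ze_api.h)";
}
```

Passing the digits from the log line, describe_code(parse_logged_code("78000011")) yields the ZE_RESULT_ERROR_INVALID_KERNEL_NAME string.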

A simple C++ unit test that reproduces the issue:
https://github.com/intel-innersource/frameworks.ai.pytorch.ipex-gpu/tree/chengjun/test_dpcpp
Build and run the test from the root directory of the code with the following commands:

mkdir build
cd ./build/
cmake ../ -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=dpcpp
make all
./test_void_kernel/triton_void_kernel

Result on the ATSM platform:

root device count: 2
compile kernel on device: Intel(R) Arc(TM) A770 Graphics
create kernel:triton__0d1d2d3d4d5d6d7d8d9d10d11d12d13d14d15d16d17d18d19d20d21d22d23d24d25d26d27d28d29d30d31d32d33d34d35d36d37d38d39d40d41d42d43d44d45d46d47d48d49d50d51d52d53d54d55d56d57d58d59d60d61d62d63d64d65d66d67d68d69d70d71d72d73d74d75d76d77d78d79d80d81d82d83d84d85d86d
L0 API error code:78000011

silee2 · 2023-07-19T06:28:46Z

The kernel loaded without error on integrated graphics:

root device count: 1
compile kernel on device: Intel(R) Iris(R) Xe Graphics
triton__0d1d2d3d4d5d6d7d8d9d10d11d12d13d14d15d16d17d18d19d20d21d22d23d24d25d26d27d28d29d30d31d32d33d34d35d36d37d38d39d40d41d42d43d44d45d46d47d48d49d50d51d52d53d54d55d56d57d58d59d60d61d62d63d64d65d66d67d68d69d70d71d72d73d74d75d76d77d78d79d80d81d82d83d84d85d86d
create kernel:triton__0d1d2d3d4d5d6d7d8d9d10d11d12d13d14d15d16d17d18d19d20d21d22d23d24d25d26d27d28d29d30d31d32d33d34d35d36d37d38d39d40d41d42d43d44d45d46d47d48d49d50d51d52d53d54d55d56d57d58d59d60d61d62d63d64d65d66d67d68d69d70d71d72d73d74d75d76d77d78d79d80d81d82d83d84d85d86d
compiled kernel ptr: 0x4dc1cd0
total kernels:1
  kernel:triton__0d1d2d3d4d5d6d7d8d9d10d11d12d13d14d15d16d17d18d19d20d21d22d23d24d25d26d27d28d29d30d31d32d33d34d35d36d37d38d39d40d41d42d43d44d45d46d47d48d49d50d51d52d53d54d55d56d57d58d59d60d61d62d63d64d65d66d67d68d69d70d71d72d73d74d75d76d77d78d79d80d81d82d83d84d85d86d @0x4dc1cd0

My configuration

silee2@silee2-mobl:~/Projects/frameworks.ai.pytorch.ipex-gpu/build [chengjun/test_dpcpp|⚑ 3]$ apt list level-zero
Listing... Done
level-zero/now 1.11.0 amd64 [installed,local]
silee2@silee2-mobl:~/Projects/frameworks.ai.pytorch.ipex-gpu/build [chengjun/test_dpcpp|⚑ 3]$ dpcpp --version
icpx: warning: use of 'dpcpp' is deprecated and will be removed in a future release. Use 'icpx -fsycl' [-Wdeprecated]
Intel(R) oneAPI DPC++/C++ Compiler 2023.1.0 (2023.1.0.20230320)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/silee2/intel/oneapi/compiler/2023.1.0/linux/bin-llvm
Configuration file: /home/silee2/intel/oneapi/compiler/2023.1.0/linux/bin-llvm/../bin/icpx.cfg
silee2@silee2-mobl:~/Projects/frameworks.ai.pytorch.ipex-gpu/build [chengjun/test_dpcpp|⚑ 3]$ apt list intel-igc*
Listing... Done
intel-igc-core/now 1.0.14062.11 amd64 [installed,local]
intel-igc-opencl/now 1.0.14062.11 amd64 [installed,local]

The iGPU is the one in the i5-11300H: https://www.intel.com/content/www/us/en/products/sku/196656/intel-core-i511300h-processor-8m-cache-up-to-4-40-ghz-with-ipu/specifications.html

chengjunlu · 2023-07-19T06:38:13Z

The test case fails for me on both ATSM and the iGPU on Alder Lake.

root device count: 2
compile kernel on device: Intel(R) UHD Graphics 770
create kernel:triton__0d1d2d3d4d5d6d7d8d9d10d11d12d13d14d15d16d17d18d19d20d21d22d23d24d25d26d27d28d29d30d31d32d33d34d35d36d37d38d39d40d41d42d43d44d45d46d47d48d49d50d51d52d53d54d55d56d57d58d59d60d61d62d63d64d65d66d67d68d69d70d71d72d73d74d75d76d77d78d79d80d81d82d83d84d85d86d
L0 API error code:78000011

Here is my configuration:

ii  intel-fw-gpu                               2023.12.2+207                           all          Firmware package for Intel integrated and discrete GPUs
ii  intel-gpu-tools                            1.26-2                                  amd64        tools for debugging the Intel graphics driver
ii  intel-i915-dkms                            1.23.4.15.230307.15.5.17.0.1030+i28-1   all          Out of tree i915 driver for Ubuntu oem kernel version 5.17.
ii  intel-igc-cm                               1.0.176+i600~22.04                      amd64        Intel(R) C for Metal Compiler -- CM Frontend lib
ii  intel-level-zero-gpu                       1.3.26032.26-627~22.04                  amd64        Intel(R) Graphics Compute Runtime for oneAPI Level Zero.
ii  intel-media-va-driver-non-free:amd64       23.1.6-622~22.04                        amd64        VAAPI driver for the Intel GEN8+ Graphics family
ii  intel-microcode                            3.20230214.0ubuntu0.22.04.1             amd64        Processor microcode firmware for Intel CPUs
ii  intel-opencl-icd                           23.13.26032.26-627~22.04                amd64        Intel graphics compute runtime for OpenCL
ii  intel-platform-cse-dkms                    2023.11.1-36                            amd64        CSE driver
ii  intel-platform-vsec-dkms                   2023.20.0-3                             amd64        Intel Extended Capabilities auxiliary bus driver
ii  libdrm-intel1:amd64                        2.4.113-2~ubuntu0.22.04.1               amd64        Userspace interface to intel-specific kernel DRM services -- runtime
ii  xserver-xorg-video-intel                   2:2.99.917+git20210115-1                amd64        X.Org X server -- Intel i8xx, i9xx display driver

chengjunlu · 2023-07-19T06:41:07Z

@silee2,
I see that your log contains the line triton__0d1d2d3d4d5d6d7d8d9d10d11d12d13d14d15d16d17d18d19d20d21d22d23d24d25d26d27d28d29d30d31d32d33d34d35d36d37d38d39d40d41d42d43d44d45d46d47d48d49d50d51d52d53d54d55d56d57d58d59d60d61d62d63d64d65d66d67d68d69d70d71d72d73d74d75d76d77d78d79d80d81d82d83d84d85d86d

That means the L0 module was loaded correctly and the kernels in the module can be iterated.

On my platform, however, the L0 module is created without the kernel.
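The distinction drawn above (a module that lists the kernel versus a module created with no kernels at all) can be sketched as a small check. In Level Zero the list of names would come from zeModuleGetKernelNames, called once to get the count and again to fill the array. Here the list is passed in as a plain vector so the sketch runs without a GPU. The enum and the function diagnose_lookup are hypothetical names for illustration, not code from the reproducer.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical classification of what a failed kernel lookup could mean.
enum class LookupDiagnosis { Found, ModuleHasNoKernels, NameNotInModule };

// module_kernel_names: names reported by the module (assumed to have been
// collected beforehand, e.g. via zeModuleGetKernelNames).
// requested: the name that would be passed to zeKernelCreate.
inline LookupDiagnosis diagnose_lookup(
        const std::vector<std::string>& module_kernel_names,
        const std::string& requested) {
    assert(!requested.empty() && "kernel name must not be empty");
    // An empty list matches the failing platforms: the module was
    // created, but it contains no kernel to look up.
    if (module_kernel_names.empty()) {
        return LookupDiagnosis::ModuleHasNoKernels;
    }
    const bool found = std::find(module_kernel_names.begin(),
                                 module_kernel_names.end(),
                                 requested) != module_kernel_names.end();
    return found ? LookupDiagnosis::Found : LookupDiagnosis::NameNotInModule;
}
```

If the result is ModuleHasNoKernels, the name passed to zeKernelCreate is probably not the problem, and the module build step is the more likely place to investigate.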
