[Triton] Triton-generated kernel cannot be loaded correctly through the L0 API. #659

Open
chengjunlu opened this issue Jul 19, 2023 · 3 comments

@chengjunlu

One very large Triton kernel cannot be loaded correctly through the L0 API.
The L0 API call zeKernelCreate returns error code 0x78000011:

ZE_RESULT_ERROR_INVALID_KERNEL_NAME = 0x78000011,   ///< [Validation] kernel name is not found in the module

We double-checked that the kernel name we use matches the one in the SPIR-V IR exactly.
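
One way to make that name check reproducible is to read the entry-point names straight out of the SPIR-V binary. The sketch below is not part of the linked unit test. The helper names `listEntryPoints` and `decodeLiteralString` are hypothetical, and it assumes the module is already in memory as 32-bit words in host byte order. It relies only on the SPIR-V layout: a five-word header, then instructions whose first word packs the word count (high 16 bits) and the opcode (low 16 bits), with `OpEntryPoint` being opcode 15.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Constants from the SPIR-V specification.
constexpr std::uint32_t kSpirvMagic = 0x07230203u;
constexpr std::uint32_t kOpEntryPoint = 15u;
constexpr std::size_t kHeaderWords = 5;

// Decode a SPIR-V literal string: bytes are packed four per word,
// lowest-order byte first, and the string ends at the first NUL byte.
std::string decodeLiteralString(const std::uint32_t* words, std::size_t count) {
    assert(words != nullptr || count == 0);
    std::string out;
    for (std::size_t i = 0; i < count; ++i) {
        for (int shift = 0; shift < 32; shift += 8) {
            const char c = static_cast<char>((words[i] >> shift) & 0xFFu);
            if (c == '\0') return out;
            out.push_back(c);
        }
    }
    return out;  // no terminator found: return what was read
}

// Collect the name of every OpEntryPoint instruction in the module.
// Returns an empty list if the magic number does not match
// (for example, a module stored in the opposite byte order).
std::vector<std::string> listEntryPoints(const std::vector<std::uint32_t>& spirv) {
    std::vector<std::string> names;
    if (spirv.size() < kHeaderWords || spirv[0] != kSpirvMagic) return names;

    std::size_t pos = kHeaderWords;
    while (pos < spirv.size()) {
        const std::uint32_t wordCount = spirv[pos] >> 16;
        const std::uint32_t opcode = spirv[pos] & 0xFFFFu;
        if (wordCount == 0 || pos + wordCount > spirv.size()) break;  // malformed stream

        // Operands: execution model, entry-point id, name, interface ids.
        if (opcode == kOpEntryPoint && wordCount > 3) {
            names.push_back(decodeLiteralString(&spirv[pos + 3], wordCount - 3));
        }
        pos += wordCount;
    }
    return names;
}
```

Comparing each returned string byte for byte with the name passed to zeKernelCreate rules out a truncated or mistyped name before looking at the driver.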

Here is a simple C++ unit test that reproduces the issue:
https://github.com/intel-innersource/frameworks.ai.pytorch.ipex-gpu/tree/chengjun/test_dpcpp
To build and run the test, use the following commands from the root directory of the code:

mkdir build
cd ./build/
cmake ../ -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=dpcpp
make all
./test_void_kernel/triton_void_kernel

Result on the ATSM platform:

root device count: 2
compile kernel on device: Intel(R) Arc(TM) A770 Graphics
create kernel:triton__0d1d2d3d4d5d6d7d8d9d10d11d12d13d14d15d16d17d18d19d20d21d22d23d24d25d26d27d28d29d30d31d32d33d34d35d36d37d38d39d40d41d42d43d44d45d46d47d48d49d50d51d52d53d54d55d56d57d58d59d60d61d62d63d64d65d66d67d68d69d70d71d72d73d74d75d76d77d78d79d80d81d82d83d84d85d86d
L0 API error code:78000011
@silee2
Contributor

silee2 commented Jul 19, 2023

The kernel loaded without error on integrated graphics:

root device count: 1
compile kernel on device: Intel(R) Iris(R) Xe Graphics
triton__0d1d2d3d4d5d6d7d8d9d10d11d12d13d14d15d16d17d18d19d20d21d22d23d24d25d26d27d28d29d30d31d32d33d34d35d36d37d38d39d40d41d42d43d44d45d46d47d48d49d50d51d52d53d54d55d56d57d58d59d60d61d62d63d64d65d66d67d68d69d70d71d72d73d74d75d76d77d78d79d80d81d82d83d84d85d86d
create kernel:triton__0d1d2d3d4d5d6d7d8d9d10d11d12d13d14d15d16d17d18d19d20d21d22d23d24d25d26d27d28d29d30d31d32d33d34d35d36d37d38d39d40d41d42d43d44d45d46d47d48d49d50d51d52d53d54d55d56d57d58d59d60d61d62d63d64d65d66d67d68d69d70d71d72d73d74d75d76d77d78d79d80d81d82d83d84d85d86d
compiled kernel ptr: 0x4dc1cd0
total kernels:1
  kernel:triton__0d1d2d3d4d5d6d7d8d9d10d11d12d13d14d15d16d17d18d19d20d21d22d23d24d25d26d27d28d29d30d31d32d33d34d35d36d37d38d39d40d41d42d43d44d45d46d47d48d49d50d51d52d53d54d55d56d57d58d59d60d61d62d63d64d65d66d67d68d69d70d71d72d73d74d75d76d77d78d79d80d81d82d83d84d85d86d @0x4dc1cd0

My configuration

silee2@silee2-mobl:~/Projects/frameworks.ai.pytorch.ipex-gpu/build [chengjun/test_dpcpp|⚑ 3]$ apt list level-zero
Listing... Done
level-zero/now 1.11.0 amd64 [installed,local]
silee2@silee2-mobl:~/Projects/frameworks.ai.pytorch.ipex-gpu/build [chengjun/test_dpcpp|⚑ 3]$ dpcpp --version
icpx: warning: use of 'dpcpp' is deprecated and will be removed in a future release. Use 'icpx -fsycl' [-Wdeprecated]
Intel(R) oneAPI DPC++/C++ Compiler 2023.1.0 (2023.1.0.20230320)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/silee2/intel/oneapi/compiler/2023.1.0/linux/bin-llvm
Configuration file: /home/silee2/intel/oneapi/compiler/2023.1.0/linux/bin-llvm/../bin/icpx.cfg
silee2@silee2-mobl:~/Projects/frameworks.ai.pytorch.ipex-gpu/build [chengjun/test_dpcpp|⚑ 3]$ apt list intel-igc*
Listing... Done
intel-igc-core/now 1.0.14062.11 amd64 [installed,local]
intel-igc-opencl/now 1.0.14062.11 amd64 [installed,local]

iGPU is from i5 11300H (https://www.intel.com/content/www/us/en/products/sku/196656/intel-core-i511300h-processor-8m-cache-up-to-4-40-ghz-with-ipu/specifications.html)

@chengjunlu
Author

chengjunlu commented Jul 19, 2023

The test case fails on both ATSM and the iGPU on Alder Lake.

root device count: 2
compile kernel on device: Intel(R) UHD Graphics 770
create kernel:triton__0d1d2d3d4d5d6d7d8d9d10d11d12d13d14d15d16d17d18d19d20d21d22d23d24d25d26d27d28d29d30d31d32d33d34d35d36d37d38d39d40d41d42d43d44d45d46d47d48d49d50d51d52d53d54d55d56d57d58d59d60d61d62d63d64d65d66d67d68d69d70d71d72d73d74d75d76d77d78d79d80d81d82d83d84d85d86d
L0 API error code:78000011

Here is my configuration:

ii  intel-fw-gpu                               2023.12.2+207                           all          Firmware package for Intel integrated and discrete GPUs
ii  intel-gpu-tools                            1.26-2                                  amd64        tools for debugging the Intel graphics driver
ii  intel-i915-dkms                            1.23.4.15.230307.15.5.17.0.1030+i28-1   all          Out of tree i915 driver for Ubuntu oem kernel version 5.17.
ii  intel-igc-cm                               1.0.176+i600~22.04                      amd64        Intel(R) C for Metal Compiler -- CM Frontend lib
ii  intel-level-zero-gpu                       1.3.26032.26-627~22.04                  amd64        Intel(R) Graphics Compute Runtime for oneAPI Level Zero.
ii  intel-media-va-driver-non-free:amd64       23.1.6-622~22.04                        amd64        VAAPI driver for the Intel GEN8+ Graphics family
ii  intel-microcode                            3.20230214.0ubuntu0.22.04.1             amd64        Processor microcode firmware for Intel CPUs
ii  intel-opencl-icd                           23.13.26032.26-627~22.04                amd64        Intel graphics compute runtime for OpenCL
ii  intel-platform-cse-dkms                    2023.11.1-36                            amd64        CSE driver
ii  intel-platform-vsec-dkms                   2023.20.0-3                             amd64        Intel Extended Capabilities auxiliary bus driver
ii  libdrm-intel1:amd64                        2.4.113-2~ubuntu0.22.04.1               amd64        Userspace interface to intel-specific kernel DRM services -- runtime
ii  xserver-xorg-video-intel                   2:2.99.917+git20210115-1                amd64        X.Org X server -- Intel i8xx, i9xx display driver

@chengjunlu
Author

@silee2,
I see that your log contains the line triton__0d1d2d3d4d5d6d7d8d9d10d11d12d13d14d15d16d17d18d19d20d21d22d23d24d25d26d27d28d29d30d31d32d33d34d35d36d37d38d39d40d41d42d43d44d45d46d47d48d49d50d51d52d53d54d55d56d57d58d59d60d61d62d63d64d65d66d67d68d69d70d71d72d73d74d75d76d77d78d79d80d81d82d83d84d85d86d

This means the L0 module was loaded correctly and we can iterate over the kernels in the module.

But on my platform, the L0 module is created without the kernel.
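
The two outcomes in this thread can be told apart mechanically once the module's kernel names are listed. The sketch below is only an illustration under stated assumptions. `KernelLookup` and `classifyLookup` are hypothetical names, and the list of names is assumed to come from an enumeration call such as Level Zero's `zeModuleGetKernelNames`. That call is not made here, so the snippet stays self-contained.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Possible outcomes when looking up a kernel name in a module.
enum class KernelLookup {
    Found,         // the requested name is present in the module
    ModuleEmpty,   // the module reports no kernels at all
    NameMismatch   // the module has kernels, but none with this name
};

// Classify a lookup given the names the module reports.
// moduleKernels: names obtained by enumerating the module (assumed input).
// requested:     the name that would be passed to zeKernelCreate.
KernelLookup classifyLookup(const std::vector<std::string>& moduleKernels,
                            const std::string& requested) {
    assert(!requested.empty());
    if (moduleKernels.empty()) return KernelLookup::ModuleEmpty;

    const bool found = std::find(moduleKernels.begin(), moduleKernels.end(),
                                 requested) != moduleKernels.end();
    return found ? KernelLookup::Found : KernelLookup::NameMismatch;
}
```

A `ModuleEmpty` result would match the failing platforms described above, where the module is created without the kernel. A `NameMismatch` would instead point back at the name passed to zeKernelCreate.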
