Verification results differ across vendors' GPUs #16636

Open
jinz2014 opened this issue Jan 14, 2025 · 2 comments
Labels
bug (Something isn't working), confirmed, cuda (CUDA back-end), hip (Issues related to execution on HIP backend)

Comments

@jinz2014
Contributor

Compiler: icpx 2025.0 with the NVIDIA/AMD plugins.

The SYCL program in https://github.com/zjin-lcf/HeCBench/tree/master/src/quantVLLM-sycl fails its verification step on some GPUs.

Intel Max GPU:

./main 4096 5137 1000
Input type is FP16
PASS
Input type is BF16
FAIL
Input type is FP32
PASS

NVIDIA/AMD GPU:
FAIL for all three data types (FP16, BF16, FP32)

The CUDA and HIP programs run successfully on the NVIDIA and AMD GPUs.
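
One thing that may be worth checking when an int8 quantization test passes on one device and fails on another is whether the comparison is bit-exact. The following is a minimal sketch in plain C++, not code from HeCBench and not a diagnosis of this issue. The helpers `quantize_div`, `quantize_mul`, and `count_mismatches` are hypothetical names introduced only for illustration. It assumes the kernels round to nearest and saturate to [-128, 127], and it shows two mathematically equivalent scalings that can round differently in floating point.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical helper (not from HeCBench): quantize by dividing by the scale.
inline std::int8_t quantize_div(float x, float scale) {
  float r = std::nearbyint(x / scale);           // rounds per the current FP rounding mode
  r = std::fmin(127.0f, std::fmax(-128.0f, r));  // saturate to the int8 range
  return static_cast<std::int8_t>(r);
}

// Hypothetical variant: multiply by a precomputed reciprocal of the scale.
// x * (1 / scale) can differ from x / scale in the last bit, which may move
// a value across a rounding boundary and change the int8 result by one.
inline std::int8_t quantize_mul(float x, float inv_scale) {
  float r = std::nearbyint(x * inv_scale);
  r = std::fmin(127.0f, std::fmax(-128.0f, r));
  return static_cast<std::int8_t>(r);
}

// Count elements whose absolute difference exceeds tol (tol = 0 is bit-exact).
inline std::size_t count_mismatches(const std::vector<std::int8_t>& a,
                                    const std::vector<std::int8_t>& b,
                                    int tol) {
  assert(a.size() == b.size());
  std::size_t bad = 0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    int diff = static_cast<int>(a[i]) - static_cast<int>(b[i]);
    if (std::abs(diff) > tol) ++bad;
  }
  return bad;
}
```

Running a check like `count_mismatches` once with `tol = 0` and once with `tol = 1` on the device output and the host reference would indicate whether a FAIL comes from off-by-one rounding differences or from larger errors.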

@dm-vodopyanov
Contributor

dm-vodopyanov commented Jan 15, 2025

Hi @jinz2014, thanks for the report. Could you please also attach sycl-ls --verbose output?

Reproduced, but on another Intel GPU I got different results:

./a.out 4096 5137 1000
Input type is FP16
Average execution time of static_scaled_int8_quant kernel: 1882.331787 (us)
Average execution time of static_scaled_int8_quant_azp kernel: 463.766968 (us)
Average execution time of dynamic_scaled_int8_quant kernel: 137.035599 (us)
Average execution time of dynamic_scaled_int8_quant_azp kernel: 756.085266 (us)
FAIL
Input type is BF16
Average execution time of static_scaled_int8_quant kernel: 629.392334 (us)
Average execution time of static_scaled_int8_quant_azp kernel: 423.225311 (us)
Average execution time of dynamic_scaled_int8_quant kernel: 131.694229 (us)
Average execution time of dynamic_scaled_int8_quant_azp kernel: 708.030212 (us)
FAIL
Input type is FP32
Average execution time of static_scaled_int8_quant kernel: 258.933594 (us)
Average execution time of static_scaled_int8_quant_azp kernel: 263.415619 (us)
Average execution time of dynamic_scaled_int8_quant kernel: 142.391510 (us)
Average execution time of dynamic_scaled_int8_quant_azp kernel: 343.215698 (us)
PASS

@dm-vodopyanov added the cuda, hip, confirmed, bug, and Need info labels Jan 15, 2025
@jinz2014
Contributor Author

I assume that the SYCL port matches the CUDA/HIP programs. I tried migrating with SYCLomatic (Intel(R) DPC++ Compatibility Tool version 2025.0.0), but the migrated files failed to build.

[opencl:cpu][opencl:0] Intel(R) OpenCL, Intel(R) Xeon(R) Silver 4410T OpenCL 3.0 (Build 0) [2024.18.12.0.05_160000]
[opencl:gpu][opencl:1] Intel(R) OpenCL Graphics, Intel(R) Data Center GPU Max 1100 OpenCL 3.0 NEO [23.17.26241.33]

Platforms: 2
Platform [#1]:
Version : OpenCL 3.0 LINUX
Name : Intel(R) OpenCL
Vendor : Intel(R) Corporation
Devices : 1
Device [#0]:
Type : cpu
Version : OpenCL 3.0 (Build 0)
Name : Intel(R) Xeon(R) Silver 4410T
Vendor : Intel(R) Corporation
Driver : 2024.18.12.0.05_160000
Num SubDevices : 2
Num SubSubDevices : 0
Aspects : cpu fp16 fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations usm_system_allocations usm_atomic_host_allocations usm_atomic_shared_allocations atomic64 ext_oneapi_srgb ext_oneapi_native_assert ext_intel_legacy_image ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_tangle_group ext_intel_matrix ext_oneapi_private_alloca
info::device::sub_group_sizes: 4 8 16 32 64
Architecture: intel_cpu_spr
Platform [#2]:
Version : OpenCL 3.0
Name : Intel(R) OpenCL Graphics
Vendor : Intel(R) Corporation
Devices : 1
Device [#1]:
Type : gpu
Version : OpenCL 3.0 NEO
Name : Intel(R) Data Center GPU Max 1100
Vendor : Intel(R) Corporation
Driver : 23.17.26241.33
UUID : 1341282181147000440000000
Num SubDevices : 0
Num SubSubDevices : 0
Aspects : gpu fp16 fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations atomic64 ext_intel_device_info_uuid ext_oneapi_srgb ext_intel_device_id ext_intel_esimd ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_tangle_group ext_intel_matrix ext_oneapi_private_alloca
info::device::sub_group_sizes: 16 32
Architecture: intel_gpu_pvc
default_selector() : gpu, Intel(R) OpenCL Graphics, Intel(R) Data Center GPU Max 1100 OpenCL 3.0 NEO [23.17.26241.33]
accelerator_selector() : No device of requested type available. Please chec...
cpu_selector() : cpu, Intel(R) OpenCL, Intel(R) Xeon(R) Silver 4410T OpenCL 3.0 (Build 0) [2024.18.12.0.05_160000]
gpu_selector() : gpu, Intel(R) OpenCL Graphics, Intel(R) Data Center GPU Max 1100 OpenCL 3.0 NEO [23.17.26241.33]
custom_selector(gpu) : gpu, Intel(R) OpenCL Graphics, Intel(R) Data Center GPU Max 1100 OpenCL 3.0 NEO [23.17.26241.33]
custom_selector(cpu) : cpu, Intel(R) OpenCL, Intel(R) Xeon(R) Silver 4410T OpenCL 3.0 (Build 0) [2024.18.12.0.05_160000]

@dm-vodopyanov removed the Need info label Jan 15, 2025