[GPU] Improve reference kernel coverage. #2366

dyoussif · 2025-01-09T23:36:08Z

Addresses https://jira.devtools.intel.com/browse/MFDNN-12444.
Broaden ocl:ref coverage and add some smoke tests for key compute bound primitives.

dyoussif · 2025-01-09T23:42:20Z

make test
disable device_cpu
disable run_scans
disable benchdnn_all
enable benchdnn_nightly
enable benchdnn_matmul
enable benchdnn_conv
enable benchdnn_ip
enable benchdnn_deconv
enable arch_xe-lp
enable arch_xe2-lpg
enable arch_xe2-hpg-bmg

mgouicem

My 2 cents on targeting specific implementations in our ci coverage: it would increase ci testing scope with no clear value since ref impls should be tested by HW configuration that do not dispatch optimized ones.

For that reason, I would keep --global-impl flag out of the input files, and have it used for development purpose only.

mgouicem · 2025-01-10T10:18:05Z

tests/benchdnn/inputs/matmul/test_matmul_smoke

@@ -33,3 +33,19 @@
 --attr-post-ops=,linear:2:1
 --runtime_dims_masks=1:0,3:3
 --batch=shapes_2d_ci
+
+# Decompression


these do not seem to cover decompression as fp-mathmode is not set here. Am I missing something?

thanks for pointing this out. Added.

kealan-barbieri · 2025-01-10T17:27:40Z

src/gpu/intel/primitive_conf.hpp

@@ -689,7 +689,7 @@ void def_eltwise_alg_kinds(compute::kernel_ctx_t &kernel_ctx);

 bool post_ops_with_binary_ok(const primitive_attr_t *attr,
        const data_type_t dst_dt, const int max_ndims_supported = 2,
-        const int prelu_mask_supported = 3);
+        const int prelu_mask_supported = 0xFFFF);


should the default value just be dropped if it doesnt restrict any value?

dyoussif · 2025-01-14T01:18:28Z

My 2 cents on targeting specific implementations in our ci coverage: it would increase ci testing scope with no clear value since ref impls should be tested by HW configuration that do not dispatch optimized ones.

For that reason, I would keep --global-impl flag out of the input files, and have it used for development purpose only.

This PR was opened in response to an issue where oneDNN was run on HW that was not supported yet and the reference implementation returned unimplemented, ultimately causing a primitive creation failure. The idea here was to introduce some limited testing to have better coverage of ref kernels.

dyoussif · 2025-01-14T16:31:57Z

make test
disable device_cpu
disable run_scans
disable benchdnn_all
enable benchdnn_nightly
enable benchdnn_matmul
enable benchdnn_conv
enable benchdnn_ip
enable benchdnn_deconv
enable arch_xe-lp
enable arch_xe2-lpg
enable arch_xe2-hpg-bmg

echeresh · 2025-01-14T18:51:23Z

tests/benchdnn/inputs/conv/test_conv_gpu

@@ -188,6 +188,11 @@ mb128ic3ih230oc64oh112kh7sh2ph0
 --dir=BWD_WB
 --batch=shapes_mem_strided

+# ref smoke test
+--global-impl=ref


@dzarukin Please take a look at this usage. Is it okay to have --global-impl in batch files? If not, please advise on how to handle this better.

I would suggest to copy-exact the content of the smoke file with a proper comment (if that's the target coverage set) and use --impl in the batch file instead.

I created separate files for ref smoke tests.

dyoussif · 2025-01-14T22:56:58Z

make test
disable device_cpu
disable run_scans
disable benchdnn_all
enable benchdnn_nightly
enable benchdnn_matmul
enable benchdnn_conv
enable benchdnn_ip
enable benchdnn_deconv
enable arch_xe-lp
enable arch_xe2-lpg
enable arch_xe2-hpg-bmg

dyoussif · 2025-01-15T18:49:37Z

make test
disable device_cpu
disable run_scans
disable benchdnn_all
enable benchdnn_nightly
enable arch_xe-lp
enable arch_xe2-lpg
enable arch_xe2-hpg-bmg

mgouicem · 2025-01-15T19:34:42Z

This PR was opened in response to an issue where oneDNN was run on HW that was not supported yet and the reference implementation returned unimplemented, ultimately causing a primitive creation failure. The idea here was to introduce some limited testing to have better coverage of ref kernels.

Thanks for the clarification. I wonder if it might be less ressource consuming to add a configuration that would fallback to ref impl in our test matrix vs running reference impl test on all configurations.
Do you know if we can set some environment Level-Zero / NEO environment variable to force our impl to fallback to reference?

dyoussif · 2025-01-22T19:28:55Z

This PR was opened in response to an issue where oneDNN was run on HW that was not supported yet and the reference implementation returned unimplemented, ultimately causing a primitive creation failure. The idea here was to introduce some limited testing to have better coverage of ref kernels.

Thanks for the clarification. I wonder if it might be less ressource consuming to add a configuration that would fallback to ref impl in our test matrix vs running reference impl test on all configurations. Do you know if we can set some environment Level-Zero / NEO environment variable to force our impl to fallback to reference?

It doesn't look like there is some environment variable to do this on the driver side. The smoke tests here are quite small; based off my testing should be about 5-10 mins per primitive to run.

echeresh · 2025-01-22T22:28:36Z

This PR was opened in response to an issue where oneDNN was run on HW that was not supported yet and the reference implementation returned unimplemented, ultimately causing a primitive creation failure. The idea here was to introduce some limited testing to have better coverage of ref kernels.

Thanks for the clarification. I wonder if it might be less ressource consuming to add a configuration that would fallback to ref impl in our test matrix vs running reference impl test on all configurations. Do you know if we can set some environment Level-Zero / NEO environment variable to force our impl to fallback to reference?

It doesn't look like there is some environment variable to do this on the driver side. The smoke tests here are quite small; based off my testing should be about 5-10 mins per primitive to run.

"smoke tests" sounds like something that could be covered with randomized test cases as well: #2434 + @rjoursler

We may have 100-200 small/medium test cases (that we can call "smoke" or "random") per primitive which can be used to test reference implementations. Once the test generator uniformly supports various features we should have pretty good coverage with very limited amount of testing.

dyoussif requested review from a team as code owners January 9, 2025 23:36

github-actions bot added platform:gpu-intel Codeowner: @oneapi-src/onednn-gpu-intel component:tests Codeowner: @oneapi-src/onednn-arch labels Jan 9, 2025

mgouicem reviewed Jan 10, 2025

View reviewed changes

kealan-barbieri reviewed Jan 10, 2025

View reviewed changes

kealan-barbieri approved these changes Jan 10, 2025

View reviewed changes

dyoussif force-pushed the dyoussif/fixup_ref_impl branch from 5d9fc27 to afe7feb Compare January 14, 2025 01:18

echeresh reviewed Jan 14, 2025

View reviewed changes

dyoussif force-pushed the dyoussif/fixup_ref_impl branch from afe7feb to 2735485 Compare January 14, 2025 22:53

dyoussif force-pushed the dyoussif/fixup_ref_impl branch from 2735485 to b1d384c Compare January 15, 2025 17:02

dyoussif force-pushed the dyoussif/fixup_ref_impl branch from 921efc6 to fd4feb6 Compare January 17, 2025 21:13

echeresh approved these changes Jan 22, 2025

View reviewed changes

dyoussif added 3 commits January 23, 2025 13:25

xe: ocl: fixup ref impl coverage

37327e6

tests: benchdnn: gpu: add ref impl smoke tests

42d094d

xe: ocl: add support for zp in sum post op

c5fedda

dyoussif force-pushed the dyoussif/fixup_ref_impl branch from fd4feb6 to c5fedda Compare January 23, 2025 21:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[GPU] Improve reference kernel coverage. #2366

[GPU] Improve reference kernel coverage. #2366

dyoussif commented Jan 9, 2025

dyoussif commented Jan 9, 2025

mgouicem left a comment

mgouicem Jan 10, 2025

dyoussif Jan 14, 2025

kealan-barbieri Jan 10, 2025

dyoussif Jan 14, 2025

dyoussif commented Jan 14, 2025

dyoussif commented Jan 14, 2025

echeresh Jan 14, 2025

dzarukin Jan 14, 2025

dyoussif Jan 14, 2025

dyoussif commented Jan 14, 2025

dyoussif commented Jan 15, 2025

mgouicem commented Jan 15, 2025 •

edited

Loading

dyoussif commented Jan 22, 2025 •

edited

Loading

echeresh commented Jan 22, 2025

[GPU] Improve reference kernel coverage. #2366

Are you sure you want to change the base?

[GPU] Improve reference kernel coverage. #2366

Conversation

dyoussif commented Jan 9, 2025

dyoussif commented Jan 9, 2025

mgouicem left a comment

Choose a reason for hiding this comment

mgouicem Jan 10, 2025

Choose a reason for hiding this comment

dyoussif Jan 14, 2025

Choose a reason for hiding this comment

kealan-barbieri Jan 10, 2025

Choose a reason for hiding this comment

dyoussif Jan 14, 2025

Choose a reason for hiding this comment

dyoussif commented Jan 14, 2025

dyoussif commented Jan 14, 2025

echeresh Jan 14, 2025

Choose a reason for hiding this comment

dzarukin Jan 14, 2025

Choose a reason for hiding this comment

dyoussif Jan 14, 2025

Choose a reason for hiding this comment

dyoussif commented Jan 14, 2025

dyoussif commented Jan 15, 2025

mgouicem commented Jan 15, 2025 • edited Loading

dyoussif commented Jan 22, 2025 • edited Loading

echeresh commented Jan 22, 2025

mgouicem commented Jan 15, 2025 •

edited

Loading

dyoussif commented Jan 22, 2025 •

edited

Loading