Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GPU] Improve reference kernel coverage. #2366

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

dyoussif
Copy link
Contributor

@dyoussif dyoussif commented Jan 9, 2025

Addresses https://jira.devtools.intel.com/browse/MFDNN-12444.
Broaden ocl:ref coverage and add some smoke tests for key compute bound primitives.

@dyoussif dyoussif requested review from a team as code owners January 9, 2025 23:36
@github-actions github-actions bot added platform:gpu-intel Codeowner: @oneapi-src/onednn-gpu-intel component:tests Codeowner: @oneapi-src/onednn-arch labels Jan 9, 2025
@dyoussif
Copy link
Contributor Author

dyoussif commented Jan 9, 2025

make test
disable device_cpu
disable run_scans
disable benchdnn_all
enable benchdnn_nightly
enable benchdnn_matmul
enable benchdnn_conv
enable benchdnn_ip
enable benchdnn_deconv
enable arch_xe-lp
enable arch_xe2-lpg
enable arch_xe2-hpg-bmg

Copy link
Contributor

@mgouicem mgouicem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My 2 cents on targeting specific implementations in our ci coverage: it would increase ci testing scope with no clear value since ref impls should be tested by HW configuration that do not dispatch optimized ones.

For that reason, I would keep --global-impl flag out of the input files, and have it used for development purpose only.

@@ -33,3 +33,19 @@
--attr-post-ops=,linear:2:1
--runtime_dims_masks=1:0,3:3
--batch=shapes_2d_ci

# Decompression
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these do not seem to cover decompression as fp-mathmode is not set here. Am I missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for pointing this out. Added.

@@ -689,7 +689,7 @@ void def_eltwise_alg_kinds(compute::kernel_ctx_t &kernel_ctx);

bool post_ops_with_binary_ok(const primitive_attr_t *attr,
const data_type_t dst_dt, const int max_ndims_supported = 2,
const int prelu_mask_supported = 3);
const int prelu_mask_supported = 0xFFFF);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should the default value just be dropped if it doesnt restrict any value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@dyoussif dyoussif force-pushed the dyoussif/fixup_ref_impl branch from 5d9fc27 to afe7feb Compare January 14, 2025 01:18
@dyoussif
Copy link
Contributor Author

My 2 cents on targeting specific implementations in our ci coverage: it would increase ci testing scope with no clear value since ref impls should be tested by HW configuration that do not dispatch optimized ones.

For that reason, I would keep --global-impl flag out of the input files, and have it used for development purpose only.

This PR was opened in response to an issue where oneDNN was run on HW that was not supported yet and the reference implementation returned unimplemented, ultimately causing a primitive creation failure. The idea here was to introduce some limited testing to have better coverage of ref kernels.

@dyoussif
Copy link
Contributor Author

make test
disable device_cpu
disable run_scans
disable benchdnn_all
enable benchdnn_nightly
enable benchdnn_matmul
enable benchdnn_conv
enable benchdnn_ip
enable benchdnn_deconv
enable arch_xe-lp
enable arch_xe2-lpg
enable arch_xe2-hpg-bmg

@@ -188,6 +188,11 @@ mb128ic3ih230oc64oh112kh7sh2ph0
--dir=BWD_WB
--batch=shapes_mem_strided

# ref smoke test
--global-impl=ref
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dzarukin Please take a look at this usage. Is it okay to have --global-impl in batch files? If not, please advise on how to handle this better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to copy-exact the content of the smoke file with a proper comment (if that's the target coverage set) and use --impl in the batch file instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created separate files for ref smoke tests.

@dyoussif dyoussif force-pushed the dyoussif/fixup_ref_impl branch from afe7feb to 2735485 Compare January 14, 2025 22:53
@dyoussif
Copy link
Contributor Author

make test
disable device_cpu
disable run_scans
disable benchdnn_all
enable benchdnn_nightly
enable benchdnn_matmul
enable benchdnn_conv
enable benchdnn_ip
enable benchdnn_deconv
enable arch_xe-lp
enable arch_xe2-lpg
enable arch_xe2-hpg-bmg

@dyoussif dyoussif force-pushed the dyoussif/fixup_ref_impl branch from 2735485 to b1d384c Compare January 15, 2025 17:02
@dyoussif
Copy link
Contributor Author

make test
disable device_cpu
disable run_scans
disable benchdnn_all
enable benchdnn_nightly
enable arch_xe-lp
enable arch_xe2-lpg
enable arch_xe2-hpg-bmg

@mgouicem
Copy link
Contributor

mgouicem commented Jan 15, 2025

This PR was opened in response to an issue where oneDNN was run on HW that was not supported yet and the reference implementation returned unimplemented, ultimately causing a primitive creation failure. The idea here was to introduce some limited testing to have better coverage of ref kernels.

Thanks for the clarification. I wonder if it might be less ressource consuming to add a configuration that would fallback to ref impl in our test matrix vs running reference impl test on all configurations.
Do you know if we can set some environment Level-Zero / NEO environment variable to force our impl to fallback to reference?

@dyoussif dyoussif force-pushed the dyoussif/fixup_ref_impl branch from 921efc6 to fd4feb6 Compare January 17, 2025 21:13
@dyoussif
Copy link
Contributor Author

dyoussif commented Jan 22, 2025

This PR was opened in response to an issue where oneDNN was run on HW that was not supported yet and the reference implementation returned unimplemented, ultimately causing a primitive creation failure. The idea here was to introduce some limited testing to have better coverage of ref kernels.

Thanks for the clarification. I wonder if it might be less ressource consuming to add a configuration that would fallback to ref impl in our test matrix vs running reference impl test on all configurations. Do you know if we can set some environment Level-Zero / NEO environment variable to force our impl to fallback to reference?

It doesn't look like there is some environment variable to do this on the driver side. The smoke tests here are quite small; based off my testing should be about 5-10 mins per primitive to run.

@echeresh
Copy link
Contributor

This PR was opened in response to an issue where oneDNN was run on HW that was not supported yet and the reference implementation returned unimplemented, ultimately causing a primitive creation failure. The idea here was to introduce some limited testing to have better coverage of ref kernels.

Thanks for the clarification. I wonder if it might be less ressource consuming to add a configuration that would fallback to ref impl in our test matrix vs running reference impl test on all configurations. Do you know if we can set some environment Level-Zero / NEO environment variable to force our impl to fallback to reference?

It doesn't look like there is some environment variable to do this on the driver side. The smoke tests here are quite small; based off my testing should be about 5-10 mins per primitive to run.

"smoke tests" sounds like something that could be covered with randomized test cases as well: #2434 + @rjoursler

We may have 100-200 small/medium test cases (that we can call "smoke" or "random") per primitive which can be used to test reference implementations. Once the test generator uniformly supports various features we should have pretty good coverage with very limited amount of testing.

@dyoussif dyoussif force-pushed the dyoussif/fixup_ref_impl branch from fd4feb6 to c5fedda Compare January 23, 2025 21:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component:tests Codeowner: @oneapi-src/onednn-arch platform:gpu-intel Codeowner: @oneapi-src/onednn-gpu-intel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants