-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Propagate kernel name via KernelSettings #278
Propagate kernel name via KernelSettings #278
Conversation
…code kernel naming Signed-off-by: Torstein Grindvik <[email protected]>
That's awesome! Had the same problem, but in wgpu. Would it be possible to use the names as a label for the shader on wgpu too? |
Yes, from what I've done in wgpu I know that the entry point can be named by users somewhere in the pipeline setup. Do you have a debugger for wgpu that shows just "main" "main" "main" etc.? Asking because I don't know what the best way of seeing compute shader calls from wpgu is. @ maintainers: If you see this PR and you're happy please go ahead and merge. I'll then do the wgpu in a follow-up as I'm out of time right now. Else I'll continue here next time I'm coding |
pub name: Option<&'static str>, | ||
pub kernel_name: String, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the difference between name
and kernel_name
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should aim to have only one name
field for simplicity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, I didn't do it in the first pass since I wasn't sure where it was used.
I removed the previous name
. That was a full core::any::type_name
name which can't be used as (CUDA, wgpu, etc.) names since it has fully qualified paths and generics etc.
Luckily it was only used for debugging: #38
Your PR there shows the full name:
name: gelu::gelu_array::GeluArray<
cubecl_core::frontend::element::float::F32,
cubecl_cuda::runtime::CudaRuntime,
>
cube_dim: (4, 1, 1)
shared_memory: 0 bytes
source:
extern "C" __global__ void kernel(float input_0[], float output_0[],
uint info[]) {
With this PR that changes to:
name: gelu_array
cube_dim: (1, 1, 1)
shared_memory: 0 bytes
source:
extern "C" __global__ void gelu_array(float_4 input_0[], float_4 output_0[],
uint info[]) {
(I edited the snippets to show the relevant comparison).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it is useful to have the fully qualified name in the debug info. In this case we could maybe keep both but name them more explicitly name
and fqn_name
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Signed-off-by: Torstein Grindvik <[email protected]>
Signed-off-by: Torstein Grindvik <[email protected]>
Added a commit to make wgpu + wgsl/spirv also use the kernel name as their entrypoint names. |
eedd617
to
2e9b711
Compare
Signed-off-by: Torstein Grindvik <[email protected]>
2e9b711
to
c655a71
Compare
Signed-off-by: Torstein Grindvik <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM Thanks 🙏
Fixes #234
Names now show up!
Propagating this down from the proc macro until both
crates/cubecl-cpp/src/shared/kernel.rs
as well ascrates/cubecl-cuda/src/compute/server.rs
could get a hold of the name required a bit ofkernel_name
field duplication in the relevant data structures.If there's a more straight forward way to propagate it I'm happy to do that, but I couldn't see any obvious way.
Also notice from the image that the same four names show up twice, but with different performance characteristics.
This is because the example I ran first goes through f32 then f16.
I hope to support automatically adding a suffix in a follow-up if it's not too hairy.