Looks like subgroups are going to land imminently on `wgpu`, as per gfx-rs/wgpu#5301.

Integrating these should be fun. A few thoughts on what should be done here:

- `native` vs `web` feature flag, since subgroups haven't shipped on web yet.

References:
- https://fleetwood.dev/posts/layernorm-as-fast-as-possible
- https://github.com/FL33TW00D/wgpu-bench/blob/master/kernels/layernorm/welford_vec4.wgsl#L47
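
To make the `native` vs `web` gating a bit more concrete, here is a minimal sketch of how a kernel generator might choose between a subgroup reduction and a shared-memory fallback. Everything named here (`DeviceCaps`, `ReduceStrategy`, `caps_for_build`, `pick_reduce_strategy`) is hypothetical and not part of wgpu or this project. The `has_subgroups` value is assumed to come from whatever the adapter reports at runtime.

```rust
/// Which reduction strategy a kernel should be generated with.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ReduceStrategy {
    /// Use subgroup built-ins for the reduction.
    Subgroup,
    /// Fall back to a shared-memory tree reduction within the workgroup.
    WorkgroupShared,
}

/// Hypothetical capability summary (an assumption for this sketch).
#[derive(Debug, Clone, Copy)]
struct DeviceCaps {
    /// True when the build targets the browser.
    is_web: bool,
    /// True when the device reports subgroup support.
    has_subgroups: bool,
}

/// Builds the capability summary for the current compilation target.
/// `has_subgroups` is assumed to be queried from the adapter by the caller.
fn caps_for_build(has_subgroups: bool) -> DeviceCaps {
    DeviceCaps {
        is_web: cfg!(target_arch = "wasm32"),
        has_subgroups,
    }
}

/// Only use subgroups on native targets that report support for them;
/// everything else gets the portable shared-memory path.
fn pick_reduce_strategy(caps: DeviceCaps) -> ReduceStrategy {
    if !caps.is_web && caps.has_subgroups {
        ReduceStrategy::Subgroup
    } else {
        ReduceStrategy::WorkgroupShared
    }
}
```

Calling `pick_reduce_strategy(caps_for_build(true))` on a native target selects the subgroup path. A `wasm32` build always gets the fallback, so the web path keeps working until subgroups ship there.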
In addition to reduce ops, subgroups can also be useful for warp-aware compute like warptiling.
Yeah, I think the reduction would do nicely in the new GEMV kernels 🔥
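
As a rough illustration of the reduction being discussed, here is a CPU-side sketch in Rust that simulates a subgroup-wide sum and uses it for the per-row dot product in a GEMV. It is not the actual kernel: `subgroup_add_sim` and `gemv_sim` are hypothetical names, the shuffle-down tree only stands in for what a subgroup add built-in would do on the GPU, and the subgroup size is assumed to be a power of two.

```rust
/// Simulates a subgroup-wide sum with the shuffle-down pattern:
/// at each step, lane `i` adds the value held by lane `i + offset`,
/// and the offset halves until lane 0 holds the total.
/// `lanes.len()` plays the role of the subgroup size (power of two).
fn subgroup_add_sim(lanes: &[f32]) -> f32 {
    assert!(lanes.len().is_power_of_two());
    let mut vals = lanes.to_vec();
    let mut offset = vals.len() / 2;
    while offset > 0 {
        for i in 0..offset {
            vals[i] += vals[i + offset];
        }
        offset /= 2;
    }
    vals[0]
}

/// Computes y = A * x for a row-major `rows x cols` matrix.
/// Each row is handled by one simulated subgroup: every lane accumulates
/// a strided partial dot product, then the partials are reduced.
fn gemv_sim(a: &[f32], x: &[f32], rows: usize, cols: usize, subgroup_size: usize) -> Vec<f32> {
    assert_eq!(a.len(), rows * cols);
    assert_eq!(x.len(), cols);
    (0..rows)
        .map(|r| {
            // Lane `lane` covers columns lane, lane + S, lane + 2S, ...
            let partials: Vec<f32> = (0..subgroup_size)
                .map(|lane| {
                    (lane..cols)
                        .step_by(subgroup_size)
                        .map(|c| a[r * cols + c] * x[c])
                        .sum::<f32>()
                })
                .collect();
            subgroup_add_sim(&partials)
        })
        .collect()
}
```

The appeal on the GPU is that the final step needs no shared memory or workgroup barrier, since the lanes of a subgroup exchange values directly.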
Completed in #220