1 parent 5be2f68 commit 50295f2
Showing 5 changed files with 76 additions and 9 deletions.
@@ -4,7 +4,7 @@ authors = [
     "louisfd <[email protected]>",
 ]
 categories = ["science", "mathematics", "algorithms"]
-description = "CubeCL Linear Algebra Components"
+description = "CubeCL Linear Algebra Library."
 edition.workspace = true
 keywords = []
 license.workspace = true
@@ -0,0 +1,60 @@
# CubeCL Linear Algebra Library.

The crate contains common linear algebra algorithms.

## Algorithms

- [X] Tiling 2D Matrix Multiplication.

  The kernel is very flexible and can be used on pretty much any hardware.
- [X] Cooperative Matrix Multiplication.

  The kernel uses Automatic Mixed Precision (AMP) to leverage cooperative matrix-multiply and accumulate instructions.
  For `f32` tensors, the inputs are cast to `f16`, but the accumulation is still performed in `f32`.
  This may cause a small loss of precision, but execution is much faster.
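
The precision trade-off described for the cooperative kernel can be illustrated with a minimal CPU sketch. This is not the crate's kernel: `round_to_f16_mantissa`, `matmul`, and `max_relative_error` are hypothetical names introduced here, and `f16` is only approximated by rounding the `f32` mantissa to 10 bits so the example needs nothing beyond the standard library.

```rust
/// Round an `f32` to the nearest value with a 10-bit mantissa (the mantissa
/// width of IEEE 754 half precision), using round-to-nearest-even.
/// Assumption: the input is finite and inside the normal `f16` range;
/// exponent clamping, subnormals, and NaN are deliberately ignored.
fn round_to_f16_mantissa(x: f32) -> f32 {
    let bits = x.to_bits();
    // f32 stores 23 mantissa bits and f16 stores 10, so 13 low bits are dropped.
    let dropped = 13;
    let lsb = (bits >> dropped) & 1;
    let rounding_bias = (1u32 << (dropped - 1)) - 1 + lsb;
    let rounded = bits.wrapping_add(rounding_bias) & !((1u32 << dropped) - 1);
    f32::from_bits(rounded)
}

/// Row-major product of an `m x k` matrix `a` and a `k x n` matrix `b`.
/// When `mixed` is true, every input is rounded to f16 precision before the
/// multiply; the accumulator stays in f32 in both modes.
fn matmul(a: &[f32], b: &[f32], m: usize, k: usize, n: usize, mixed: bool) -> Vec<f32> {
    let prep = |x: f32| if mixed { round_to_f16_mantissa(x) } else { x };
    let mut c = vec![0.0f32; m * n];
    for i in 0..m {
        for j in 0..n {
            let mut acc = 0.0f32; // f32 accumulation
            for p in 0..k {
                acc += prep(a[i * k + p]) * prep(b[p * n + j]);
            }
            c[i * n + j] = acc;
        }
    }
    c
}

/// Largest element-wise relative difference between two result matrices.
/// Assumes every entry of `reference` is non-zero.
fn max_relative_error(reference: &[f32], other: &[f32]) -> f32 {
    reference
        .iter()
        .zip(other)
        .map(|(x, y)| ((x - y) / x).abs())
        .fold(0.0f32, f32::max)
}

fn main() {
    let (m, k, n) = (4, 64, 4);
    // Deterministic, strictly non-negative inputs in [0, 1).
    let a: Vec<f32> = (0..m * k).map(|i| ((i * 37 % 101) as f32) / 101.0).collect();
    let b: Vec<f32> = (0..k * n).map(|i| ((i * 53 % 97) as f32) / 97.0).collect();

    let full = matmul(&a, &b, m, k, n, false);
    let mixed_out = matmul(&a, &b, m, k, n, true);
    println!("max relative error: {:e}", max_relative_error(&full, &mixed_out));
}
```

Each rounded input carries a relative error of at most 2^-11, so with non-negative inputs the result stays within roughly 0.1% of the full `f32` product. Real inputs with mixed signs can lose more through cancellation.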

## Benchmarks

You can run the benchmarks from the workspace with the following:

```bash
cargo bench --bench matmul --features wgpu # for wgpu
cargo bench --bench matmul --features cuda # for cuda
```

On an RTX 3070 we get the following results:

```
matmul-wgpu-f32-tiling2d
―――――――― Result ―――――――――
  Samples     100
  Mean        13.289ms
  Variance    28.000ns
  Median      13.271ms
  Min         12.582ms
  Max         13.768ms
―――――――――――――――――――――――――
matmul-cuda-f32-tiling2d
―――――――― Result ―――――――――
  Samples     100
  Mean        12.754ms
  Variance    93.000ns
  Median      12.647ms
  Min         12.393ms
  Max         14.501ms
―――――――――――――――――――――――――
matmul-cuda-f32-cmma
―――――――― Result ―――――――――
  Samples     100
  Mean        4.996ms
  Variance    35.000ns
  Median      5.084ms
  Min         4.304ms
  Max         5.155ms
―――――――――――――――――――――――――
```
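
To put the reported means side by side, the speedups can be computed directly. The snippet below is only a worked calculation over the numbers printed above. `speedup` is a hypothetical helper and is not part of the crate or its benchmark harness.

```rust
/// Ratio of a baseline mean time to a candidate mean time (same units).
/// A value above 1.0 means the candidate is faster.
fn speedup(baseline_ms: f64, candidate_ms: f64) -> f64 {
    baseline_ms / candidate_ms
}

fn main() {
    // Mean times in milliseconds, copied from the RTX 3070 results above.
    let wgpu_tiling2d = 13.289;
    let cuda_tiling2d = 12.754;
    let cuda_cmma = 4.996;

    println!("cmma vs cuda tiling2d: {:.2}x", speedup(cuda_tiling2d, cuda_cmma));
    println!("cmma vs wgpu tiling2d: {:.2}x", speedup(wgpu_tiling2d, cuda_cmma));
    println!("cuda vs wgpu tiling2d: {:.2}x", speedup(wgpu_tiling2d, cuda_tiling2d));
}
```

On these means the cooperative kernel is roughly 2.5x faster than either tiling 2D run, while the two tiling 2D backends are within about 4% of each other.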