Commit

Add some docs
nathanielsimard committed Jul 19, 2024
1 parent 5be2f68 commit 50295f2
Showing 5 changed files with 76 additions and 9 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -216,6 +216,7 @@ You can even ship the autotune cache with your program, reducing cold start time

## Resource

Learning resources are limited for now, but the [linear algebra library](/crates/cubecl-linalg/README.md) shows how CubeCL can be used.
If you have any questions or want to contribute, don't hesitate to join the [Discord](https://discord.gg/uPEBbYYDB6).

## Disclaimer & History
13 changes: 10 additions & 3 deletions crates/cubecl-core/src/ir/kernel.rs
@@ -140,9 +140,16 @@ impl From<Elem> for Item {
impl Display for Elem {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
// NOTE: we'll eventually want to differentiate between int/float types
Self::Float(_) => f.write_str("float"),
Self::Int(_) => f.write_str("int"),
Self::Float(kind) => match kind {
FloatKind::F16 => f.write_str("f16"),
FloatKind::BF16 => f.write_str("bf16"),
FloatKind::F32 => f.write_str("f32"),
FloatKind::F64 => f.write_str("f64"),
},
Self::Int(kind) => match kind {
IntKind::I32 => f.write_str("i32"),
IntKind::I64 => f.write_str("i64"),
},
Self::UInt => f.write_str("uint"),
Self::Bool => f.write_str("bool"),
}
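
As a self-contained illustration of the `Display` change in this hunk, the sketch below redefines small stand-in enums rather than using the real `cubecl-core` types. The enum definitions and derives are assumptions made for the example; only the variant names and output strings are taken from the diff.

```rust
use std::fmt::{self, Display, Formatter};

// Hypothetical stand-ins for the IR types, limited to the variants visible in the diff.
#[derive(Debug, Clone, Copy, PartialEq)]
enum FloatKind {
    F16,
    BF16,
    F32,
    F64,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum IntKind {
    I32,
    I64,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Elem {
    Float(FloatKind),
    Int(IntKind),
    UInt,
    Bool,
}

impl Display for Elem {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        // Each kind maps to its own name instead of a generic "float" or "int".
        let name = match self {
            Elem::Float(FloatKind::F16) => "f16",
            Elem::Float(FloatKind::BF16) => "bf16",
            Elem::Float(FloatKind::F32) => "f32",
            Elem::Float(FloatKind::F64) => "f64",
            Elem::Int(IntKind::I32) => "i32",
            Elem::Int(IntKind::I64) => "i64",
            Elem::UInt => "uint",
            Elem::Bool => "bool",
        };
        f.write_str(name)
    }
}
```

With this mapping, `Elem::Float(FloatKind::F32).to_string()` yields `"f32"`, which is the form the benchmark names in this commit rely on.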
2 changes: 1 addition & 1 deletion crates/cubecl-linalg/Cargo.toml
@@ -4,7 +4,7 @@ authors = [
"louisfd <[email protected]>",
]
categories = ["science", "mathematics", "algorithms"]
description = "CubeCL Linear Algebra Components"
description = "CubeCL Linear Algebra Library."
edition.workspace = true
keywords = []
license.workspace = true
60 changes: 60 additions & 0 deletions crates/cubecl-linalg/README.md
@@ -0,0 +1,60 @@
# CubeCL Linear Algebra Library.


The crate contains common linear algebra algorithms.

## Algorithms

- [X] Tiling 2D Matrix Multiplication.

The kernel is very flexible and can be used on pretty much any hardware.
- [X] Cooperative Matrix Multiplication.

The kernel uses Automatic Mixed Precision (AMP) to leverage cooperative matrix multiply-and-accumulate instructions.
For `f32` tensors, the inputs are cast to `f16`, but the accumulation is still performed in `f32`.
This may cause a small loss of precision, in exchange for much faster execution.
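
To make the precision trade-off concrete, here is a minimal CPU-side sketch of the idea, not the CubeCL kernel itself. It assumes that rounding each input to a 10-bit mantissa is an acceptable stand-in for an `f16` cast; the narrower `f16` exponent range is deliberately not modeled. The function names are hypothetical.

```rust
/// Rounds an `f32` to the nearest value with a 10-bit mantissa (the mantissa
/// width of `f16`), using round-to-nearest-even.
/// Simplification: `f16` overflow, subnormals, and NaN are not modeled.
fn round_to_f16_mantissa(x: f32) -> f32 {
    let bits = x.to_bits();
    // `f32` has 23 mantissa bits, so the lowest 13 bits are dropped.
    let lsb = (bits >> 13) & 1;
    let rounded = bits.wrapping_add(0x0FFF + lsb) & !0x1FFF;
    f32::from_bits(rounded)
}

/// Dot product with reduced-precision inputs and `f32` accumulation.
fn dot_mixed(a: &[f32], b: &[f32]) -> f32 {
    a.iter()
        .zip(b)
        .map(|(x, y)| round_to_f16_mantissa(*x) * round_to_f16_mantissa(*y))
        .sum()
}

/// Reference dot product computed entirely in `f32`.
fn dot_f32(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```

Comparing `dot_mixed` with `dot_f32` on the same inputs shows the kind of error to expect: each input is off by a relative error of at most about 2^-11, while the running sum keeps full `f32` precision.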

## Benchmarks

You can run the benchmarks from the workspace with the following:

```bash
cargo bench --bench matmul --features wgpu # for wgpu
cargo bench --bench matmul --features cuda # for cuda
```

On an RTX 3070 we get the following results:

```
matmul-wgpu-f32-tiling2d
―――――――― Result ―――――――――
Samples 100
Mean 13.289ms
Variance 28.000ns
Median 13.271ms
Min 12.582ms
Max 13.768ms
―――――――――――――――――――――――――
matmul-cuda-f32-tiling2d
―――――――― Result ―――――――――
Samples 100
Mean 12.754ms
Variance 93.000ns
Median 12.647ms
Min 12.393ms
Max 14.501ms
―――――――――――――――――――――――――
matmul-cuda-f32-cmma
―――――――― Result ―――――――――
Samples 100
Mean 4.996ms
Variance 35.000ns
Median 5.084ms
Min 4.304ms
Max 5.155ms
―――――――――――――――――――――――――
```

9 changes: 4 additions & 5 deletions crates/cubecl/benches/matmul.rs
@@ -7,7 +7,7 @@ use cubecl::frontend::Float;
use cubecl_linalg::matmul;
use cubecl_linalg::tensor::TensorHandle;

impl<R: Runtime, E: Float> Benchmark for Tiling2dBench<R, E> {
impl<R: Runtime, E: Float> Benchmark for MatmulBench<R, E> {
type Args = (TensorHandle<R, E>, TensorHandle<R, E>, TensorHandle<R, E>);

fn prepare(&self) -> Self::Args {
@@ -36,8 +36,7 @@ impl<R: Runtime, E: Float> Benchmark for Tiling2dBench<R, E> {
}

fn name(&self) -> String {
let elem = E::as_elem();
format!("tiling2d-{}-{:?}-{:?}", R::name(), elem, self.kind)
format!("matmul-{}-{}-{:?}", R::name(), E::as_elem(), self.kind).to_lowercase()
}

fn sync(&self) {
@@ -46,7 +45,7 @@ impl<R: Runtime, E: Float> Benchmark for Tiling2dBench<R, E> {
}

#[allow(dead_code)]
struct Tiling2dBench<R: Runtime, E> {
struct MatmulBench<R: Runtime, E> {
b: usize,
m: usize,
k: usize,
Expand All @@ -66,7 +65,7 @@ enum MatmulKind {

#[allow(dead_code)]
fn run<R: Runtime, E: Float>(device: R::Device, kind: MatmulKind) {
let bench = Tiling2dBench::<R, E> {
let bench = MatmulBench::<R, E> {
b: 32,
m: 1024,
k: 1024,
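
The new `name` implementation combines a `Display` element name, a `Debug` kind, and `to_lowercase()`. The sketch below reproduces that composition in isolation. The `bench_name` helper and the `MatmulKind` variants are hypothetical, since the enum body is not visible in this diff; the variant names are inferred from the benchmark output in the README.

```rust
use std::fmt::{Debug, Display};

// Hypothetical variants, inferred from the names in the benchmark results.
#[derive(Debug)]
enum MatmulKind {
    Tiling2d,
    Cmma,
}

/// Builds a benchmark name the same way as the `format!` call in the diff:
/// the runtime name, the element via `Display`, the kind via `Debug`, all lowercased.
fn bench_name(runtime: &str, elem: impl Display, kind: impl Debug) -> String {
    format!("matmul-{}-{}-{:?}", runtime, elem, kind).to_lowercase()
}
```

For example, `bench_name("cuda", "f32", MatmulKind::Cmma)` produces `matmul-cuda-f32-cmma`, matching the labels shown in the README results.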
