Refactor/cmma generalize #94

louisfd · 2024-09-04T19:25:33Z

Generalized the cmma algorithm to allow:

- Any vectorization factor (Fix Matmul CMMA: support other vectorizations (or none) #12)
- Varying block sizes

I reduced all the different parameters (block sizes, cube dims and other comptime info) to only two degrees of freedom: one must choose B_MN, the block size for both m and n, and B_K, the block size for k. Both parameters must be divisible by the tile size, and B_MN must be divisible by B_K. This leaves only a few valid (B_MN, B_K) combinations.
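To make the constraints above concrete, here is a minimal sketch that checks and enumerates valid (B_MN, B_K) pairs. The function names, the candidate sizes, and the tile size of 16 are assumptions for illustration only; they are not taken from the PR's code.

```rust
/// Hypothetical helper (not from the PR): returns true when a (B_MN, B_K)
/// pair satisfies the constraints described above.
/// `tile_size` is an assumption; 16 is used in the usage note below.
fn is_valid_block_config(b_mn: u32, b_k: u32, tile_size: u32) -> bool {
    // Guard against zero first so the modulo operations below cannot panic.
    tile_size != 0
        && b_mn != 0
        && b_k != 0
        // Both block sizes must be divisible by the tile size.
        && b_mn % tile_size == 0
        && b_k % tile_size == 0
        // B_MN must be divisible by B_K.
        && b_mn % b_k == 0
}

/// Hypothetical helper: enumerates every valid (B_MN, B_K) pair that can be
/// formed from a list of candidate block sizes.
fn valid_block_configs(candidates: &[u32], tile_size: u32) -> Vec<(u32, u32)> {
    let mut configs = Vec::new();
    for &b_mn in candidates {
        for &b_k in candidates {
            if is_valid_block_config(b_mn, b_k, tile_size) {
                configs.push((b_mn, b_k));
            }
        }
    }
    configs
}
```

For example, with an assumed tile size of 16, `is_valid_block_config(128, 16, 16)` and `is_valid_block_config(32, 32, 16)` both hold, while `is_valid_block_config(16, 32, 16)` does not because B_MN is not divisible by B_K.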

The combination (32, 32) weirdly adds 0s if tensors are not small, so I ignored it. I don't intend to fix it in the short term because it is unlikely to be an important combination.

The combination (128, 16) creates a 128x128 shared memory at the write output stage. Together with the other shared memories, this will exceed the shared memory (SMEM) capacity of most devices. I will work on removing the need for a final shared memory, as mentioned in #15.
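A rough back-of-the-envelope sketch of why (128, 16) is a problem. Everything here is an assumption for illustration: the element sizes (2-byte inputs, 4-byte output), the shape of the input shared memories (one B_MN x B_K tile for lhs and one B_K x B_MN tile for rhs), and the 48 KiB budget. None of these values come from the PR itself.

```rust
/// Bytes for the output staging shared memory (B_MN x B_MN elements).
/// `elem_bytes` is an assumed element size.
fn output_smem_bytes(b_mn: usize, elem_bytes: usize) -> usize {
    b_mn * b_mn * elem_bytes
}

/// Bytes for the two input shared memories, assuming one B_MN x B_K tile
/// for lhs and one B_K x B_MN tile for rhs (an assumed layout).
fn input_smem_bytes(b_mn: usize, b_k: usize, elem_bytes: usize) -> usize {
    2 * b_mn * b_k * elem_bytes
}

/// Total shared memory under the assumptions above.
fn total_smem_bytes(b_mn: usize, b_k: usize, in_bytes: usize, out_bytes: usize) -> usize {
    input_smem_bytes(b_mn, b_k, in_bytes) + output_smem_bytes(b_mn, out_bytes)
}

/// Whether the total fits in a given per-block budget (in bytes).
fn fits_in_budget(b_mn: usize, b_k: usize, in_bytes: usize, out_bytes: usize, budget: usize) -> bool {
    total_smem_bytes(b_mn, b_k, in_bytes, out_bytes) <= budget
}
```

Under these assumptions, `output_smem_bytes(128, 4)` alone is 65536 bytes (64 KiB), so `fits_in_budget(128, 16, 2, 4, 48 * 1024)` is false, whereas a config such as (64, 16) stays well under the same assumed budget.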

In the short term I will also work on #13.

louisfd · 2024-09-06T20:25:43Z

See #101 instead

louisfd and others added 30 commits August 20, 2024 15:03

minor refactor

b7064e1

change accumulators for sequence

9527d3f

add failing test

eb22cda

Merge branch 'main' into refactor/cmma_generalize

e9d473d

wip

72b3f89

:wq Merge branch 'main' of github.com:tracel-ai/cubecl

700c0cf

Merge branch 'main' into refactor/cmma_generalize

55b5fd3

wip

ec83d3c

wip

7a4f3e4

wip

5ed5caa

wip

f8aa418

wip

335e4c2

wip

6d20a18

wip

da02986

coop and lane independant from unit pos

9e917d5

custom block size

561f71c

num accumulators

9a6fc84

fix k loop test

6dbf866

allowing any config wip

3aacdf6

merge

c55dd64

generalize fragment to sm

b6d778d

Merge branch 'main' of github.com:tracel-ai/cubecl into refactor/cmma_generalize

e37d9cd

Merge branch 'refactor/cmma_generalize' of github.com:tracel-ai/cubecl into refactor/cmma_generalize

c7abc89

sm max in bytes

5831bd1

wip

730e190

Merge branch 'refactor/cmma_generalize' of github.com:tracel-ai/cubecl into refactor/cmma_generalize

644b4ea

add index of error

0349242

refactor load and write tests

def320f

refactor compute loop test

99dc7dc

Merge branch 'refactor/cmma_generalize' of github.com:tracel-ai/cubecl into refactor/cmma_generalize

f71a959

louisfd and others added 19 commits August 30, 2024 12:03

Merge branch 'main' of github.com:tracel-ai/cubecl

bfca4fa

Merge branch 'main' into refactor/cmma_generalize

daa39f5

add vec1

829d50f

vec tests

0f6b146

unhardcode

9772a3a

wip refactor only two degrees of liberty

a73158d

block config

e9cbeca

add tests

3a531d8

testing alternate block sizes

e1ea5e0

fix write

348953f

played with tests

1f2b62f

ignore failing test

668ba03

Merge branch 'main' of github.com:tracel-ai/cubecl

ef0e746

Merge branch 'main' into refactor/cmma_generalize

c343f66

fmt

49683cb

fix

acb9285

back to using unit pos directly

18a115a

refactor vec

a0db0e6

fix equation

2f64b5d

louisfd closed this Sep 6, 2024
