CB: support different number of K and V heads per layer (#1610) #20
Job | Run time |
---|---|
0s | |
0s | |
0s | |
18m 35s | |
0s | |
19m 59s | |
35m 12s | |
0s | |
0s | |
0s | |
13m 38s | |
20m 39s | |
0s | |
26m 53s | |
0s | |
0s | |
0s | |
0s | |
0s | |
0s | |
0s | |
2h 14m 56s |