Faster CUDA prompt speeds #925

Merged: EricLBuehler merged 3 commits into master from faster_cuda_attnmask on Nov 21, 2024

EricLBuehler (Owner) commented Nov 21, 2024

I measure a +4% prompt processing (PP) speedup for Llama 3.2 3B without quantization: 807 T/s -> 840 T/s on a 42-token prompt.
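The reported gain follows directly from the two throughput figures. The sketch below recomputes it and converts each figure into an approximate prompt processing time. It assumes that "T/s" means prompt tokens divided by total prompt processing time, with no separate fixed overhead; the helper names are illustrative only and are not part of the project.

```python
# Throughput figures from the comment above (tokens per second),
# measured on a 42-token prompt with Llama 3.2 3B, unquantized.
BEFORE_TPS = 807.0
AFTER_TPS = 840.0
PROMPT_TOKENS = 42


def relative_speedup(before: float, after: float) -> float:
    """Fractional throughput gain: (after - before) / before."""
    return (after - before) / before


def prompt_latency_ms(tokens: int, tps: float) -> float:
    """Approximate time to process `tokens` prompt tokens at `tps` tokens/s, in ms.

    Assumption: throughput = tokens / processing time, with no fixed overhead.
    """
    return tokens / tps * 1000.0


gain = relative_speedup(BEFORE_TPS, AFTER_TPS)
print(f"speedup: {gain:.1%}")
print(
    f"latency: {prompt_latency_ms(PROMPT_TOKENS, BEFORE_TPS):.1f} ms -> "
    f"{prompt_latency_ms(PROMPT_TOKENS, AFTER_TPS):.1f} ms"
)
```

Running it prints `speedup: 4.1%` and `latency: 52.0 ms -> 50.0 ms`. Under that assumption, a 4% throughput gain on a short prompt saves about 2 ms per request.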


Code Metrics Report
  ===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 C Header                2           35           28            0            7
 Dockerfile              1           41           22           10            9
 Happy                   1          442          369            0           73
 JSON                   12          105          104            0            1
 Python                 53         2274         1949           63          262
 Shell                   1           57           22           18           17
 TOML                   18          583          520            2           61
 YAML                    2           21           19            2            0
-------------------------------------------------------------------------------
 Jupyter Notebooks       4            0            0            0            0
 |- Markdown             2           77           32           31           14
 |- Python               2          196          169            1           26
 (Total)                            273          201           32           40
-------------------------------------------------------------------------------
 Markdown               40         3009            0         2286          723
 |- BASH                 6          101           98            0            3
 |- JSON                 1           12           12            0            0
 |- Python               6          114          102            0           12
 |- Rust                10          361          306            0           55
 |- TOML                 2           75           63            0           12
 (Total)                           3672          581         2286          805
-------------------------------------------------------------------------------
 Rust                  280        84887        76163         1764         6960
 |- Markdown           136         1435           25         1306          104
 (Total)                          86322        76188         3070         7064
===============================================================================
 Total                 415        91454        79196         4145         8113
===============================================================================
  

EricLBuehler merged commit 2b20951 into master on Nov 21, 2024
12 checks passed
EricLBuehler deleted the faster_cuda_attnmask branch on November 21, 2024 at 23:21