Experiment Results

Pixart - Batch Size: 1

dtype: fp16, qtype: fp8, qte: 1, fuse: 0
dtype: fp16, qtype: none, qte: 0, fuse: 0
dtype: fp16, qtype: none, qte: 0, fuse: 1
dtype: fp16, qtype: fp8, qte: 1, fuse: 1
dtype: bf16, qtype: int4, qte: 0, fuse: 1
dtype: bf16, qtype: none, qte: 0, fuse: 0
dtype: bf16, qtype: none, qte: 0, fuse: 1
dtype: bf16, qtype: int4, qte: 0, fuse: 0
dtype: fp16, qtype: int8, qte: 1, fuse: 0
dtype: fp16, qtype: int8, qte: 1, fuse: 1
dtype: bf16, qtype: fp8, qte: 1, fuse: 0
dtype: bf16, qtype: fp8, qte: 1, fuse: 1
dtype: bf16, qtype: int8, qte: 1, fuse: 0
dtype: bf16, qtype: int8, qte: 1, fuse: 1
dtype: bf16, qtype: int4, qte: 1, fuse: 1
dtype: bf16, qtype: none, qte: 1, fuse: 0
dtype: bf16, qtype: none, qte: 1, fuse: 1
dtype: bf16, qtype: int4, qte: 1, fuse: 0
dtype: fp16, qtype: none, qte: 1, fuse: 0
dtype: fp16, qtype: fp8, qte: 0, fuse: 0
dtype: fp16, qtype: fp8, qte: 0, fuse: 1
dtype: fp16, qtype: none, qte: 1, fuse: 1
dtype: bf16, qtype: fp8, qte: 0, fuse: 0
dtype: bf16, qtype: fp8, qte: 0, fuse: 1
dtype: bf16, qtype: int8, qte: 0, fuse: 0
dtype: bf16, qtype: int8, qte: 0, fuse: 1
dtype: fp16, qtype: int8, qte: 0, fuse: 0
dtype: fp16, qtype: int8, qte: 0, fuse: 1
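
Each label above identifies one run. The following is a minimal sketch of turning such a label into a structured record. The `RunConfig` class and `parse_label` helper are hypothetical names introduced for illustration, and reading `qte` as "Quantize TE" and `fuse` as "Fuse QKV" is an assumption inferred from the table column headers in this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    dtype: str   # "fp16" or "bf16"
    qtype: str   # "none", "int4", "int8" or "fp8"
    qte: bool    # assumed meaning: quantize the text encoder
    fuse: bool   # assumed meaning: fuse the QKV projections


def parse_label(label: str) -> RunConfig:
    """Parse a label such as 'dtype: fp16, qtype: fp8, qte: 1, fuse: 0'."""
    fields = {}
    for part in label.split(","):
        key, _, value = part.partition(":")
        fields[key.strip()] = value.strip()
    return RunConfig(
        dtype=fields["dtype"],
        qtype=fields["qtype"],
        qte=fields["qte"] == "1",
        fuse=fields["fuse"] == "1",
    )


cfg = parse_label("dtype: fp16, qtype: fp8, qte: 1, fuse: 0")
print(cfg)
```

A frozen dataclass is hashable, so parsed configurations can serve as dictionary keys when joining labels to table rows.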

Sorted by Memory Usage (Ascending)

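| Data Type | Quantization | Quantize TE | Fuse QKV | Memory (GB) | Latency (s) |
|-----------|--------------|-------------|----------|-------------|-------------|
| bf16 | INT4 | Yes | No | 3.066 | 7.493 |
| bf16 | INT4 | Yes | Yes | 3.176 | 7.512 |
| fp16 | INT8 | Yes | No | 5.363 | 1.526 |
| bf16 | INT8 | Yes | No | 5.364 | 1.429 |
| fp16 | FP8 | Yes | No | 5.364 | 1.587 |
| bf16 | FP8 | Yes | No | 5.365 | 1.475 |
| bf16 | FP8 | Yes | Yes | 5.537 | 1.450 |
| bf16 | INT8 | Yes | Yes | 5.537 | 1.403 |
| fp16 | INT8 | Yes | Yes | 5.537 | 1.471 |
| fp16 | FP8 | Yes | Yes | 5.537 | 1.518 |
| bf16 | INT4 | No | No | 9.380 | 7.320 |
| bf16 | INT4 | No | Yes | 9.491 | 7.340 |
| bf16 | FP8 | No | No | 9.672 | 1.434 |
| bf16 | INT8 | No | No | 9.672 | 1.400 |
| bf16 | INT8 | No | Yes | 9.847 | 1.375 |
| bf16 | FP8 | No | Yes | 9.847 | 1.397 |
| bf16 | NONE | Yes | No | 10.214 | 1.151 |
| bf16 | NONE | No | No | 10.214 | 1.152 |
| bf16 | NONE | No | Yes | 10.560 | 1.142 |
| bf16 | NONE | Yes | Yes | 10.560 | 1.142 |
| fp16 | INT8 | No | No | 11.547 | 1.494 |
| fp16 | FP8 | No | No | 11.547 | 1.520 |
| fp16 | INT8 | No | Yes | 11.722 | 1.439 |
| fp16 | FP8 | No | Yes | 11.722 | 1.472 |
| fp16 | NONE | Yes | No | 12.086 | 1.181 |
| fp16 | NONE | No | No | 12.086 | 1.182 |
| fp16 | NONE | No | Yes | 12.433 | 1.172 |
| fp16 | NONE | Yes | Yes | 12.435 | 1.170 |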

Sorted by Latency (Ascending)

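| Data Type | Quantization | Quantize TE | Fuse QKV | Memory (GB) | Latency (s) |
|-----------|--------------|-------------|----------|-------------|-------------|
| bf16 | NONE | No | Yes | 10.560 | 1.142 |
| bf16 | NONE | Yes | Yes | 10.560 | 1.142 |
| bf16 | NONE | Yes | No | 10.214 | 1.151 |
| bf16 | NONE | No | No | 10.214 | 1.152 |
| fp16 | NONE | Yes | Yes | 12.435 | 1.170 |
| fp16 | NONE | No | Yes | 12.433 | 1.172 |
| fp16 | NONE | Yes | No | 12.086 | 1.181 |
| fp16 | NONE | No | No | 12.086 | 1.182 |
| bf16 | INT8 | No | Yes | 9.847 | 1.375 |
| bf16 | FP8 | No | Yes | 9.847 | 1.397 |
| bf16 | INT8 | No | No | 9.672 | 1.400 |
| bf16 | INT8 | Yes | Yes | 5.537 | 1.403 |
| bf16 | INT8 | Yes | No | 5.364 | 1.429 |
| bf16 | FP8 | No | No | 9.672 | 1.434 |
| fp16 | INT8 | No | Yes | 11.722 | 1.439 |
| bf16 | FP8 | Yes | Yes | 5.537 | 1.450 |
| fp16 | INT8 | Yes | Yes | 5.537 | 1.471 |
| fp16 | FP8 | No | Yes | 11.722 | 1.472 |
| bf16 | FP8 | Yes | No | 5.365 | 1.475 |
| fp16 | INT8 | No | No | 11.547 | 1.494 |
| fp16 | FP8 | Yes | Yes | 5.537 | 1.518 |
| fp16 | FP8 | No | No | 11.547 | 1.520 |
| fp16 | INT8 | Yes | No | 5.363 | 1.526 |
| fp16 | FP8 | Yes | No | 5.364 | 1.587 |
| bf16 | INT4 | No | No | 9.380 | 7.320 |
| bf16 | INT4 | No | Yes | 9.491 | 7.340 |
| bf16 | INT4 | Yes | No | 3.066 | 7.493 |
| bf16 | INT4 | Yes | Yes | 3.176 | 7.512 |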
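
The trade-off in the Pixart results can be made explicit by expressing each run relative to an unquantized baseline. The sketch below copies four bf16 rows from the tables above. The `compare_to_baseline` helper is hypothetical, and using the bf16, unquantized, unfused run as the baseline is an assumption made for illustration.

```python
# (dtype, quantization, quantize_te, fuse_qkv, memory_gb, latency_s)
# Rows copied from the Pixart tables above.
ROWS = [
    ("bf16", "NONE", False, False, 10.214, 1.152),  # assumed baseline
    ("bf16", "INT8", True, False, 5.364, 1.429),
    ("bf16", "FP8", True, False, 5.365, 1.475),
    ("bf16", "INT4", True, False, 3.066, 7.493),
]


def compare_to_baseline(rows, baseline):
    """Return memory saved (%) and latency ratio for each row vs. the baseline."""
    base_mem, base_lat = baseline[4], baseline[5]
    report = []
    for dtype, qtype, qte, fuse, mem, lat in rows:
        saved_pct = 100.0 * (base_mem - mem) / base_mem
        slowdown = lat / base_lat
        report.append((dtype, qtype, qte, fuse, saved_pct, slowdown))
    return report


report = compare_to_baseline(ROWS[1:], ROWS[0])
for dtype, qtype, qte, fuse, saved_pct, slowdown in report:
    print(f"{dtype} {qtype:<4} qte={qte} fuse={fuse}: "
          f"{saved_pct:.1f}% less memory, {slowdown:.2f}x latency")
```

Reporting memory as a percentage saved and latency as a ratio keeps the two axes comparable across models of different sizes.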

Flux-dev - Batch Size: 1

dtype: bf16, qtype: int4, qte: 0, fuse: 0
dtype: bf16, qtype: none, qte: 0, fuse: 0
dtype: bf16, qtype: fp8, qte: 0, fuse: 0
dtype: fp16, qtype: none, qte: 0, fuse: 0
dtype: fp16, qtype: fp8, qte: 0, fuse: 0
dtype: bf16, qtype: int8, qte: 1, fuse: 0
dtype: fp16, qtype: int8, qte: 1, fuse: 0
dtype: bf16, qtype: fp8, qte: 1, fuse: 0
dtype: fp16, qtype: none, qte: 1, fuse: 0
dtype: bf16, qtype: int4, qte: 1, fuse: 0
dtype: bf16, qtype: none, qte: 1, fuse: 0
dtype: fp16, qtype: int8, qte: 0, fuse: 0
dtype: fp16, qtype: fp8, qte: 1, fuse: 0
dtype: bf16, qtype: int8, qte: 0, fuse: 0

Sorted by Memory Usage (Ascending)

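| Data Type | Quantization | Quantize TE | Fuse QKV | Memory (GB) | Latency (s) |
|-----------|--------------|-------------|----------|-------------|-------------|
| bf16 | INT4 | Yes | No | 8.795 | 62.213 |
| bf16 | INT4 | No | No | 15.234 | 62.075 |
| bf16 | FP8 | Yes | No | 15.997 | 8.994 |
| fp16 | FP8 | Yes | No | 15.998 | 9.067 |
| fp16 | INT8 | Yes | No | 15.999 | 8.435 |
| bf16 | INT8 | Yes | No | 15.999 | 8.420 |
| bf16 | FP8 | No | No | 20.393 | 8.963 |
| bf16 | INT8 | No | No | 20.395 | 8.372 |
| fp16 | INT8 | No | No | 22.270 | 8.400 |
| fp16 | FP8 | No | No | 22.271 | 9.013 |
| bf16 | NONE | Yes | No | 31.470 | 6.560 |
| bf16 | NONE | No | No | 31.470 | 6.569 |
| fp16 | NONE | Yes | No | 33.345 | 6.672 |
| fp16 | NONE | No | No | 33.345 | 6.657 |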

Sorted by Latency (Ascending)

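| Data Type | Quantization | Quantize TE | Fuse QKV | Memory (GB) | Latency (s) |
|-----------|--------------|-------------|----------|-------------|-------------|
| bf16 | NONE | Yes | No | 31.470 | 6.560 |
| bf16 | NONE | No | No | 31.470 | 6.569 |
| fp16 | NONE | No | No | 33.345 | 6.657 |
| fp16 | NONE | Yes | No | 33.345 | 6.672 |
| bf16 | INT8 | No | No | 20.395 | 8.372 |
| fp16 | INT8 | No | No | 22.270 | 8.400 |
| bf16 | INT8 | Yes | No | 15.999 | 8.420 |
| fp16 | INT8 | Yes | No | 15.999 | 8.435 |
| bf16 | FP8 | No | No | 20.393 | 8.963 |
| bf16 | FP8 | Yes | No | 15.997 | 8.994 |
| fp16 | FP8 | No | No | 22.271 | 9.013 |
| fp16 | FP8 | Yes | No | 15.998 | 9.067 |
| bf16 | INT4 | No | No | 15.234 | 62.075 |
| bf16 | INT4 | Yes | No | 8.795 | 62.213 |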
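
One way to read the Flux-dev results is to fix a memory budget and pick the fastest run that fits. The sketch below uses a subset of the bf16 rows above. The `fastest_within` helper and the budget values are hypothetical, and it assumes the reported memory figure is the number that has to fit on the device.

```python
# (dtype, quantization, quantize_te, memory_gb, latency_s)
# Subset of the Flux-dev rows above; Fuse QKV is "No" for all of them.
FLUX_ROWS = [
    ("bf16", "INT4", True, 8.795, 62.213),
    ("bf16", "INT4", False, 15.234, 62.075),
    ("bf16", "FP8", True, 15.997, 8.994),
    ("bf16", "INT8", True, 15.999, 8.420),
    ("bf16", "INT8", False, 20.395, 8.372),
    ("bf16", "NONE", True, 31.470, 6.560),
]


def fastest_within(rows, budget_gb):
    """Return the lowest-latency row whose memory fits the budget, or None."""
    fitting = [row for row in rows if row[3] <= budget_gb]
    return min(fitting, key=lambda row: row[4]) if fitting else None


# Hypothetical budgets in GB, chosen only to exercise the helper.
for budget in (12.0, 16.0, 24.0, 40.0):
    print(budget, fastest_within(FLUX_ROWS, budget))
```

Returning `None` when nothing fits lets the caller distinguish "no viable configuration" from a slow one.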

Sd3 - Batch Size: 1

dtype: fp16, qtype: none, qte: 1, fuse: 0
dtype: fp16, qtype: none, qte: 1, fuse: 1
dtype: bf16, qtype: fp8, qte: 0, fuse: 0
dtype: bf16, qtype: fp8, qte: 0, fuse: 1
dtype: bf16, qtype: none, qte: 1, fuse: 0
dtype: bf16, qtype: none, qte: 1, fuse: 1
dtype: fp16, qtype: int8, qte: 0, fuse: 0
dtype: fp16, qtype: fp8, qte: 0, fuse: 0
dtype: fp16, qtype: fp8, qte: 0, fuse: 1
dtype: fp16, qtype: int8, qte: 0, fuse: 1
dtype: bf16, qtype: int8, qte: 0, fuse: 0
dtype: bf16, qtype: int8, qte: 0, fuse: 1
dtype: bf16, qtype: fp8, qte: 1, fuse: 0
dtype: bf16, qtype: fp8, qte: 1, fuse: 1
dtype: bf16, qtype: none, qte: 0, fuse: 0
dtype: bf16, qtype: none, qte: 0, fuse: 1
dtype: fp16, qtype: none, qte: 0, fuse: 0
dtype: fp16, qtype: none, qte: 0, fuse: 1
dtype: bf16, qtype: int8, qte: 1, fuse: 0
dtype: bf16, qtype: int8, qte: 1, fuse: 1
dtype: fp16, qtype: int8, qte: 1, fuse: 0
dtype: fp16, qtype: fp8, qte: 1, fuse: 0
dtype: fp16, qtype: fp8, qte: 1, fuse: 1
dtype: fp16, qtype: int8, qte: 1, fuse: 1

Sorted by Memory Usage (Ascending)

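| Data Type | Quantization | Quantize TE | Fuse QKV | Memory (GB) | Latency (s) |
|-----------|--------------|-------------|----------|-------------|-------------|
| fp16 | INT8 | Yes | No | 7.625 | 2.553 |
| bf16 | FP8 | Yes | No | 7.628 | 2.659 |
| bf16 | INT8 | Yes | No | 7.628 | 2.531 |
| fp16 | FP8 | Yes | No | 7.630 | 2.682 |
| bf16 | INT8 | Yes | Yes | 7.963 | 2.527 |
| fp16 | FP8 | Yes | Yes | 7.963 | 2.628 |
| fp16 | INT8 | Yes | Yes | 7.963 | 2.517 |
| bf16 | FP8 | Yes | Yes | 7.968 | 2.637 |
| bf16 | FP8 | No | No | 12.598 | 2.598 |
| bf16 | INT8 | No | No | 12.598 | 2.475 |
| bf16 | FP8 | No | Yes | 12.926 | 2.574 |
| bf16 | INT8 | No | Yes | 12.929 | 2.458 |
| fp16 | FP8 | No | No | 14.469 | 2.630 |
| fp16 | INT8 | No | No | 14.473 | 2.499 |
| bf16 | NONE | No | No | 14.526 | 1.998 |
| bf16 | NONE | Yes | No | 14.526 | 2.000 |
| fp16 | INT8 | No | Yes | 14.804 | 2.464 |
| fp16 | FP8 | No | Yes | 14.838 | 2.571 |
| bf16 | NONE | Yes | Yes | 15.183 | 2.003 |
| bf16 | NONE | No | Yes | 15.183 | 2.000 |
| fp16 | NONE | No | No | 16.397 | 2.046 |
| fp16 | NONE | Yes | No | 16.403 | 2.053 |
| fp16 | NONE | Yes | Yes | 17.058 | 2.054 |
| fp16 | NONE | No | Yes | 17.058 | 2.051 |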

Sorted by Latency (Ascending)

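| Data Type | Quantization | Quantize TE | Fuse QKV | Memory (GB) | Latency (s) |
|-----------|--------------|-------------|----------|-------------|-------------|
| bf16 | NONE | No | No | 14.526 | 1.998 |
| bf16 | NONE | No | Yes | 15.183 | 2.000 |
| bf16 | NONE | Yes | No | 14.526 | 2.000 |
| bf16 | NONE | Yes | Yes | 15.183 | 2.003 |
| fp16 | NONE | No | No | 16.397 | 2.046 |
| fp16 | NONE | No | Yes | 17.058 | 2.051 |
| fp16 | NONE | Yes | No | 16.403 | 2.053 |
| fp16 | NONE | Yes | Yes | 17.058 | 2.054 |
| bf16 | INT8 | No | Yes | 12.929 | 2.458 |
| fp16 | INT8 | No | Yes | 14.804 | 2.464 |
| bf16 | INT8 | No | No | 12.598 | 2.475 |
| fp16 | INT8 | No | No | 14.473 | 2.499 |
| fp16 | INT8 | Yes | Yes | 7.963 | 2.517 |
| bf16 | INT8 | Yes | Yes | 7.963 | 2.527 |
| bf16 | INT8 | Yes | No | 7.628 | 2.531 |
| fp16 | INT8 | Yes | No | 7.625 | 2.553 |
| fp16 | FP8 | No | Yes | 14.838 | 2.571 |
| bf16 | FP8 | No | Yes | 12.926 | 2.574 |
| bf16 | FP8 | No | No | 12.598 | 2.598 |
| fp16 | FP8 | Yes | Yes | 7.963 | 2.628 |
| fp16 | FP8 | No | No | 14.469 | 2.630 |
| bf16 | FP8 | Yes | Yes | 7.968 | 2.637 |
| bf16 | FP8 | Yes | No | 7.628 | 2.659 |
| fp16 | FP8 | Yes | No | 7.630 | 2.682 |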
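
Because the memory-sorted and latency-sorted views rank the Sd3 runs differently, it can help to keep only the runs that no other run beats on both axes at once. The sketch below computes such a Pareto front over a subset of the rows above. The `pareto_front` helper is hypothetical, and treating memory and latency as the only two objectives is an assumption, since output quality is not reported here.

```python
# (dtype, quantization, quantize_te, fuse_qkv, memory_gb, latency_s)
# Subset of the Sd3 rows above.
SD3_ROWS = [
    ("fp16", "INT8", True, False, 7.625, 2.553),
    ("bf16", "FP8", True, False, 7.628, 2.659),
    ("fp16", "INT8", True, True, 7.963, 2.517),
    ("bf16", "INT8", False, False, 12.598, 2.475),
    ("bf16", "INT8", False, True, 12.929, 2.458),
    ("bf16", "NONE", False, False, 14.526, 1.998),
    ("fp16", "NONE", False, False, 16.397, 2.046),
]


def pareto_front(rows):
    """Keep rows that no other row matches or beats on both memory and latency."""
    front = []
    for row in rows:
        mem, lat = row[-2], row[-1]
        dominated = any(
            other[-2] <= mem and other[-1] <= lat
            and (other[-2] < mem or other[-1] < lat)
            for other in rows
        )
        if not dominated:
            front.append(row)
    # Sort by memory so the front reads from smallest to fastest.
    return sorted(front, key=lambda r: r[-2])


front = pareto_front(SD3_ROWS)
for row in front:
    print(row)
```

The quadratic scan is adequate for a few dozen rows; a sort-based sweep would be the usual choice for larger result sets.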