Releases · furiosa-ai/inference-compression
MLPerf4.1-v3.12.1
What's Changed
- Fix pg dataloader by @sunghyuckhong in #74
- Generalize ci by @sunghyuckhong in #67
- gptj qparam migration code by @jeongin-yun in #78
- use mcp generator by @jeongin-yun in #82
- Save qlv4 by @Mincho0102 in #80
Full Changelog: MLPerf4.1-v3.12...MLPerf4.1-v3.12.1
MLPerf4.1-llama-v3.12.1
What's Changed
- split evaluation by @BeomGeunCho in #72
- Qlv4 save by @Mincho0102 in #77
- add args and statedict by @Mincho0102 in #81
- Port mcp generator & generalize ci for llama by @sunghyuckhong in #83
Full Changelog: MLPerf4.1-llama-v3.11...MLPerf4.1-llama-v3.12.1
MLPerf4.1-v3.12
What's Changed
- Handle bert generator updates by @jh619lee in #70
- int8xint8 lm_head dtype for gptj by @BeomGeunCho in #75
Full Changelog: MLPerf4.1-v3.11...MLPerf4.1-v3.12
MLPerf4.1-llama-v3.11
MLPerf4.1-v3.11
What's Changed
- add new models for gptj by @sunghyuckhong in #46
- Trace BERT with the furiosa-llm-models helper and change model_source (huggingface_rngd_gelu, mlperf_submission) by @BeomGeunCho in #47
- Custom dataset for paged attention by @sunghyuckhong in #48
- Add bert ci test by @jh619lee in #49
- GPT-J CI by @sunghyuckhong in #52
- pad with pad token by @jeongin-yun in #55
- Move bert generator init by @jh619lee in #56
- add compact causal mask model. by @BeomGeunCho in #58
- Fix gptj ci by @sunghyuckhong in #57
- Add ci test for causal compact mask bert by @jh619lee in #60
- apply changed get_quant_model to main.py by @BeomGeunCho in #63
- merge split accuracy log files by @BeomGeunCho in #64
- Add model scripts by @BeomGeunCho in #65
New Contributors
- @jh619lee made their first contribution in #49
- @jeongin-yun made their first contribution in #55
Full Changelog: MLPerf4.1-v3.8...MLPerf4.1-v3.11
MLPerf4.1-v3.8
What's Changed
- remove model name for gptj in inference by @sunghyuckhong in #42
- remove calib argument by @sunghyuckhong in #43
- Calibration with padded inputs by @BeomGeunCho in #44
- add erf gelu models by @BeomGeunCho in #45
Full Changelog: MLPerf4.1-v3.5...MLPerf4.1-v3.8
MLPerf4.1-v3.5
What's Changed
- Apply MCP changes: pass quantized_prefill_graph as an input when running create_quant_sim for the decode_graph
MLPerf4.1-v3.4
- Ported QuantPagedAttentionGenerator for paged_attention_rope
- Generation with paged_attention_rope.GPTJForCausalLM is now possible by setting the 'model_source' argument to 'paged_attention_rope' in language/gpt-j/main.py (see the sketch below)
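A minimal sketch of how the new switch can be exercised, assuming main.py exposes model_source as an ordinary argparse flag; the exact flag spelling and wiring in language/gpt-j/main.py are assumptions, not verified code:

```python
# Hypothetical sketch: a 'model_source' argparse flag selecting the
# paged_attention_rope backend. Flag spelling and choices are assumptions.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--model_source",
    default="huggingface",
    choices=["huggingface", "paged_attention_rope"],
    help="'paged_attention_rope' selects paged_attention_rope.GPTJForCausalLM",
)

# e.g. the usage from the note: main.py --model_source paged_attention_rope
args = parser.parse_args(["--model_source", "paged_attention_rope"])
print(args.model_source)  # -> paged_attention_rope
```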
MLPerf4.1-v3.1
What's Changed
- Generation with pagedattention_concat_rope by @sunghyuckhong in #33
- Generate with preallocated_rope by @sunghyuckhong in #34
- GPT-J preallocated (preallocated_concat_rope.py) added, and paged_attention_concat_rope updates applied
  - paged_attention_concat_rope.py
    - Verified Qlevel-4 fx graph extraction and generation (greedy_search)
    - Defined QuantPagedAttentionGenerator in furiosa-llm-repo
      - An implementation for running furiosa-llm.PagedAttentionGenerator as an fx graph rather than as a torch model (see the sketch after this list)
  - preallocated_concat_rope.py
    - Qlevel-4 fx graph conversion complete
    - QuantPreallocatedGenerator implementation complete
      - Plays the same role as QuantPagedAttentionGenerator
    - Follow-up work: generation with the modified RoPE
  - paged_attention_concat_rope.py
    - Model-compressor-private: fixed QLV4EmbeddingMOD, which caused a graph break under torch.dynamo.export (issue confirmed and applied by Seonghwan)
    - Furiosa-llm-models: fixed a case where, with the dtype cast removed, the f32-to-i32 conversion arrived as Cast rather than FxToFxp (issue raised on Slack)
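To make the fx-graph generation idea above concrete, here is a generic, self-contained sketch in which TinyLM and greedy_decode are illustrative stand-ins (not furiosa-llm, MCP, or QuantPagedAttentionGenerator code), showing that a traced torch.fx GraphModule can be driven by a generation loop exactly like the original torch model:

```python
import torch
import torch.fx as fx

class TinyLM(torch.nn.Module):
    """Stand-in language model: token ids -> next-token logits."""
    def __init__(self, vocab: int = 100, dim: int = 32):
        super().__init__()
        self.emb = torch.nn.Embedding(vocab, dim)
        self.head = torch.nn.Linear(dim, vocab)

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        return self.head(self.emb(ids))  # [batch, seq, vocab]

# Trace the torch model into an fx graph; the resulting GraphModule is
# callable with the same signature as the original module.
graph_module: fx.GraphModule = fx.symbolic_trace(TinyLM())

@torch.no_grad()
def greedy_decode(step_fn, ids: torch.Tensor, steps: int = 5) -> torch.Tensor:
    # step_fn may be the nn.Module or the traced GraphModule.
    for _ in range(steps):
        next_id = step_fn(ids)[:, -1].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=-1)
    return ids

print(greedy_decode(graph_module, torch.tensor([[1, 2, 3]])))
```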
Full Changelog: MLPerf4.1-v2.1...MLPerf4.1-v3.1
MLPerf4.1-v2.1
- Performance
  - BERT [W8A8KV8, calibrated and evaluated on A100]:
    {"exact_match": 83.9546 (100.32%), "f1": 91.05177 (100.19%)}
  - GPT-J original [W8A8KV8 + SmoothQuant, calibrated and evaluated on H100]:
    rouge1: 43.0717 (100.20%), rouge2: 20.1398 (100.08%), rougeL: 30.0108 (100.08%), gen_len: 3984079 (99.18%)
- Updates
- Changed zero-point shapes from per-head to per-tensor, affecting the zero-point-equalizing term of the matmul operation (see the first sketch after this list)
- Added emulation-in and emulation-out to FP32 for the bf16 x bf16 dot product (see the second sketch after this list)
- Ported GPTJ-paged_attention_concat_rope model
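For the zero-point change above, a minimal sketch of the difference between per-head and per-tensor asymmetric quantization parameters; the shapes, the uint8 range, and the qparams helper are illustrative assumptions, not the MCP implementation:

```python
# Illustrative only: per-head vs per-tensor zero-points for an activation
# shaped [heads, seq, head_dim]; asymmetric uint8 quantization is assumed.
import torch

x = torch.randn(16, 128, 64)  # [heads, seq, head_dim]

def qparams(t: torch.Tensor, dims):
    lo = t.amin(dim=dims, keepdim=True)
    hi = t.amax(dim=dims, keepdim=True)
    scale = (hi - lo).clamp(min=1e-8) / 255.0
    zero_point = (-lo / scale).round().clamp(0, 255)
    return scale, zero_point

# Per-head: one (scale, zero_point) pair per attention head -> shape [16, 1, 1].
s_h, zp_h = qparams(x, dims=(1, 2))
# Per-tensor: a single pair for the whole tensor -> shape [1, 1, 1].
s_t, zp_t = qparams(x, dims=(0, 1, 2))

print(zp_h.shape, zp_t.shape)  # torch.Size([16, 1, 1]) torch.Size([1, 1, 1])
```

With a per-tensor zero-point, the correction that must be equalized out of the quantized matmul collapses to a single scalar instead of one value per head, which is presumably the matmul effect the note refers to.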
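For the dot-product change, a minimal sketch of one plausible reading of "emulation in, emulation out to FP32": both bf16 operands are cast up to float32, so the dot product (and its accumulation) runs at FP32 precision:

```python
# Illustrative only: emulate a bf16 x bf16 dot product through FP32.
import torch

def bf16_dot_fp32(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    assert a.dtype == torch.bfloat16 and b.dtype == torch.bfloat16
    # emulation in: upcast the bf16 operands to FP32
    # emulation out: the FP32 product is returned as-is
    return a.to(torch.float32) @ b.to(torch.float32)

a = torch.randn(8, 16, dtype=torch.bfloat16)
b = torch.randn(16, 4, dtype=torch.bfloat16)
print(bf16_dot_fp32(a, b).dtype)  # torch.float32
```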