Commit
* feat: support phi3.5 moe model loading
* fix: prefer llama base model and improve rotary logic
* feat: return reasonable generation and add integration test
* fix: run lint and update docs
* fix: rerun lint for openapi docs
* fix: prefer do_sample false unless temp is set by user, and update chat tests
* fix: small typo adjustments
* fix: consolidate long rope paths
* fix: revert greedy by default and test changes
* Vendor configuration so that we don't have to `trust_remote_code`
* Use SparseMoELayer
* Add support for dense MoE
* Some type annotations
* Add the usual model tests
* Ruff.

---------

Co-authored-by: Daniël de Kok <[email protected]>
Co-authored-by: Nicolas Patry <[email protected]>
1 parent 90a1d04, commit 93a7042
Showing 11 changed files with 1,164 additions and 17 deletions.
109 changes: 109 additions & 0 deletions
integration-tests/models/__snapshots__/test_flash_phi35_moe/test_flash_phi35_moe.json
@@ -0,0 +1,109 @@
{
  "details": {
    "best_of_sequences": null,
    "finish_reason": "length",
    "generated_tokens": 10,
    "prefill": [
      {
        "id": 1724,
        "logprob": null,
        "text": "What"
      },
      {
        "id": 338,
        "logprob": -0.7133789,
        "text": "is"
      },
      {
        "id": 16030,
        "logprob": -13.9296875,
        "text": "gradient"
      },
      {
        "id": 26815,
        "logprob": -0.048919678,
        "text": "descent"
      },
      {
        "id": 29973,
        "logprob": -3.0078125,
        "text": "?"
      },
      {
        "id": 13,
        "logprob": -2.8105469,
        "text": "\n"
      },
      {
        "id": 13,
        "logprob": -0.84521484,
        "text": "\n"
      }
    ],
    "seed": null,
    "tokens": [
      {
        "id": 25584,
        "logprob": -0.017028809,
        "special": false,
        "text": "Grad"
      },
      {
        "id": 993,
        "logprob": -0.0027313232,
        "special": false,
        "text": "ient"
      },
      {
        "id": 26815,
        "logprob": -0.023254395,
        "special": false,
        "text": " descent"
      },
      {
        "id": 338,
        "logprob": -2.0623207e-05,
        "special": false,
        "text": " is"
      },
      {
        "id": 263,
        "logprob": -0.5361328,
        "special": false,
        "text": " a"
      },
      {
        "id": 937,
        "logprob": -0.17578125,
        "special": false,
        "text": " first"
      },
      {
        "id": 29899,
        "logprob": 0.0,
        "special": false,
        "text": "-"
      },
      {
        "id": 2098,
        "logprob": -0.00011539459,
        "special": false,
        "text": "order"
      },
      {
        "id": 13883,
        "logprob": -0.47436523,
        "special": false,
        "text": " optimization"
      },
      {
        "id": 5687,
        "logprob": -0.00027680397,
        "special": false,
        "text": " algorithm"
      }
    ],
    "top_tokens": null
  },
  "generated_text": "Gradient descent is a first-order optimization algorithm"
}
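As a reading aid for the snapshot above, here is a minimal Python sketch that checks two properties visible in the data: concatenating the `text` of each entry in `tokens` reproduces `generated_text`, and summing the per-token `logprob` values gives the log-probability of the whole completion. The helper names `join_tokens` and `sequence_logprob` are hypothetical and are not part of the commit; the token values are copied from the snapshot, with `id` and `special` omitted for brevity.

```python
import math

# (text, logprob) pairs copied from the "tokens" array of the snapshot.
tokens = [
    ("Grad", -0.017028809),
    ("ient", -0.0027313232),
    (" descent", -0.023254395),
    (" is", -2.0623207e-05),
    (" a", -0.5361328),
    (" first", -0.17578125),
    ("-", 0.0),
    ("order", -0.00011539459),
    (" optimization", -0.47436523),
    (" algorithm", -0.00027680397),
]
generated_text = "Gradient descent is a first-order optimization algorithm"


def join_tokens(entries):
    """Concatenate token texts in order (hypothetical helper)."""
    return "".join(text for text, _ in entries)


def sequence_logprob(entries):
    """Sum per-token log-probabilities (hypothetical helper)."""
    return sum(logprob for _, logprob in entries)


reconstructed = join_tokens(tokens)
total_logprob = sequence_logprob(tokens)
# exp() of the summed log-probabilities is the probability of the full completion.
sequence_prob = math.exp(total_logprob)

print(reconstructed == generated_text)
print(total_logprob, sequence_prob)
```

Note that the leading spaces stored in each token's `text` are what make a plain concatenation work; the `prefill` entries in this snapshot carry no such spaces, so the same join would not reproduce the prompt.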
99 changes: 99 additions & 0 deletions
...tion-tests/models/__snapshots__/test_flash_phi35_moe/test_flash_phi35_moe_all_params.json
@@ -0,0 +1,99 @@
{
  "details": {
    "best_of_sequences": null,
    "finish_reason": "length",
    "generated_tokens": 10,
    "prefill": [
      {
        "id": 16030,
        "logprob": null,
        "text": "gradient"
      },
      {
        "id": 26815,
        "logprob": -6.4960938,
        "text": "descent"
      },
      {
        "id": 29973,
        "logprob": -5.1484375,
        "text": "?"
      },
      {
        "id": 13,
        "logprob": -4.0351562,
        "text": "\n"
      },
      {
        "id": 13,
        "logprob": -5.2265625,
        "text": "\n"
      }
    ],
    "seed": 0,
    "tokens": [
      {
        "id": 10994,
        "logprob": -1.1542969,
        "special": false,
        "text": "Hello"
      },
      {
        "id": 29991,
        "logprob": 0.0,
        "special": false,
        "text": "!"
      },
      {
        "id": 739,
        "logprob": 0.0,
        "special": false,
        "text": " It"
      },
      {
        "id": 2444,
        "logprob": -0.42260742,
        "special": false,
        "text": " seems"
      },
      {
        "id": 366,
        "logprob": 0.0,
        "special": false,
        "text": " you"
      },
      {
        "id": 29915,
        "logprob": 0.0,
        "special": false,
        "text": "'"
      },
      {
        "id": 276,
        "logprob": -0.9838867,
        "special": false,
        "text": "re"
      },
      {
        "id": 3211,
        "logprob": 0.0,
        "special": false,
        "text": " address"
      },
      {
        "id": 292,
        "logprob": 0.0,
        "special": false,
        "text": "ing"
      },
      {
        "id": 263,
        "logprob": -0.15124512,
        "special": false,
        "text": " a"
      }
    ],
    "top_tokens": null
  },
  "generated_text": "What is gradient descent?\n\nHello! It seems you're addressing a"
}
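In this second snapshot, `generated_text` is longer than the ten generated tokens. The sketch below, in Python, separates the completion from whatever precedes it. The diff does not show the request parameters, so reading the leading text as the echoed prompt is an assumption drawn only from the snapshot contents; the variable names are illustrative.

```python
# Token texts copied from the "tokens" array of the all_params snapshot.
token_texts = ["Hello", "!", " It", " seems", " you", "'", "re", " address", "ing", " a"]
generated_text = "What is gradient descent?\n\nHello! It seems you're addressing a"

# The generated tokens, joined in order, form the tail of generated_text.
completion = "".join(token_texts)
ends_with_completion = generated_text.endswith(completion)

# Whatever precedes the completion; assumed (not confirmed by the diff)
# to be the prompt echoed back in the response.
leading_text = generated_text[: len(generated_text) - len(completion)]

print(repr(completion))
print(repr(leading_text))
```

The `prefill` array in the same snapshot starts at `gradient` rather than `What`, so the leading text recovered here is longer than what the prefill entries cover.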