v6.7.0 - LONG CONTEXT no see!
General Updates
- CITATIONS! Responses from a Vector DB search now include hyperlinked citations to the source documents.
- Display of a chat model's max context and how many tokens you've used.
2X Speed Increase
Choose "half" in the database creation settings and the program will automatically select bfloat16 or float16 based on your GPU. This results in a 2x speed increase with extremely low loss in quality.
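The dtype selection described above can be sketched as follows. This is a minimal illustration, not the program's actual code: the function name and the compute-capability cutoff are assumptions (real implementations typically ask the GPU directly, e.g. via `torch.cuda.is_bf16_supported()`).

```python
def pick_half_dtype(compute_capability):
    """Pick a half-precision dtype based on the GPU's compute capability.

    bfloat16 keeps the same exponent range as float32, so it rarely
    overflows; NVIDIA GPUs support it natively from Ampere
    (compute capability 8.0) onward. Older GPUs fall back to float16.
    """
    major, _minor = compute_capability
    return "bfloat16" if major >= 8 else "float16"

# An RTX 3090 (8.6) gets bfloat16; a GTX 1080 (6.1) gets float16.
print(pick_half_dtype((8, 6)))
print(pick_half_dtype((6, 1)))
```

Either dtype halves the memory traffic of float32, which is where the speedup comes from; the quality difference between the two is mainly numerical stability, not accuracy.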
Chat Models
- Removed Internlm2_5 - 1.8b and Qwen 1.5 - 1.6b as underperforming.
- Removed Dolphin-Llama 3 - 8b and Internlm2 - 20b as superseded.
- Added Danube 3 - 4b with 8k context.
- Added Phi 3.5 Mini - 4b with 8k context.
- Added Hermes-3-Llama-3.1 - 8b with 8k context.
- Added Internlm2_5 - 20b with 8k context.
The following models now have 8192 context:
| Model Name | Parameters (billion) | Context Length |
|---|---|---|
| Danube 3 - 4b | 4 | 8192 |
| Dolphin-Qwen 2 - 1.5b | 1.5 | 8192 |
| Phi 3.5 Mini - 4b | 4 | 8192 |
| Internlm2_5 - 7b | 7 | 8192 |
| Dolphin-Llama 3.1 - 8b | 8 | 8192 |
| Hermes-3-Llama-3.1 - 8b | 8 | 8192 |
| Dolphin-Qwen 2 - 7b | 7 | 8192 |
| Dolphin-Mistral-Nemo - 12b | 12 | 8192 |
| Internlm2_5 - 20b | 20 | 8192 |
Text to Speech Models
- Excited to add additional models to choose from when using whisperspeech as the text to speech backend - see the chart below for the various s2a and t2s model combinations, their relative compute times, and real VRAM usage stats.