v6.7.0 - LONG CONTEXT no see!

@BBC-Esq BBC-Esq released this 22 Aug 22:30
· 253 commits to main since this release
38c5baf

General Updates

  • Citations! Responses now include hyperlinked citations when searching the vector DB.


  • Display of a chat model's maximum context length and how many tokens you've used.


2X Speed Increase

Choose "half" in the database creation settings. It will automatically choose bfloat16 or float16 based on your GPU.

This results in roughly a 2x speed increase with minimal loss in quality.
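The "half" option can be sketched as follows. This is an illustrative example, not the app's actual code: `pick_half_precision` is a hypothetical helper, and the rule of thumb is that NVIDIA GPUs with compute capability 8.0 (Ampere) or newer support bfloat16, while older GPUs fall back to float16 (with PyTorch, this check is roughly `torch.cuda.is_bf16_supported()`).

```python
def pick_half_precision(compute_capability: tuple[int, int]) -> str:
    """Pick a half-precision dtype from a GPU's CUDA compute capability.

    Ampere (8.0) and newer support bfloat16, which keeps float32's
    exponent range and is more numerically robust; older GPUs fall
    back to float16.
    """
    major, _minor = compute_capability
    return "bfloat16" if major >= 8 else "float16"

# RTX 3090 is compute capability 8.6 -> bfloat16
print(pick_half_precision((8, 6)))  # bfloat16
# GTX 1080 Ti is compute capability 6.1 -> float16
print(pick_half_precision((6, 1)))  # float16
```

Either way, embeddings are computed in 16-bit precision instead of 32-bit, which is where the speedup comes from.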

Chat Models

Removed Internlm2_5 - 1.8b and Qwen 1.5 - 1.6b as underperforming.
Removed Dolphin-Llama 3 - 8b and Internlm2 - 20b as superseded.
Added Danube 3 - 4b with 8k context.
Added Phi 3.5 Mini - 4b with 8k context.
Added Hermes-3-Llama-3.1 - 8b with 8k context.
Added Internlm2_5 - 20b with 8k context.

The following models now have an 8192-token context:

| Model Name | Parameters (billion) | Context Length |
| --- | --- | --- |
| Danube 3 - 4b | 4 | 8192 |
| Dolphin-Qwen 2 - 1.5b | 1.5 | 8192 |
| Phi 3.5 Mini - 4b | 4 | 8192 |
| Internlm2_5 - 7b | 7 | 8192 |
| Dolphin-Llama 3.1 - 8b | 8 | 8192 |
| Hermes-3-Llama-3.1 - 8b | 8 | 8192 |
| Dolphin-Qwen 2 - 7b | 7 | 8192 |
| Dolphin-Mistral-Nemo - 12b | 12 | 8192 |
| Internlm2_5 - 20b | 20 | 8192 |

Text to Speech Models

  • Excited to add additional models to choose from when using WhisperSpeech as the text-to-speech backend. See the chart below for the various s2a and t2s model combinations, their relative compute times, and real VRAM usage stats.

[chart_tts]

Current Chat and Vision Models

[chart_chat] [chart_vision]