v6.9.2 - Welcome Kobold!
Patch 6.9.2 Notes
Between major updates I'll simply paste the major update notes below for convenience, then include specific notes for each minor update.
- Added `MiniCPM3 - 4b` chat model. Very good at single-factoid retrieval, even from many contexts, but DO NOT use it when asking to retrieve multiple factoids from the contexts because it will ramble.
- Robust validation of the settings entered.
- Use `QThread` with the metrics bar for smoother GUI operation (a hedged sketch of this pattern follows this list).
- Add page numbers when contexts that originate from a `.pdf` are returned.
- Return relevance scores for all citations, which helps users determine which `similarity` setting to use.
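For readers unfamiliar with the pattern, here is a minimal sketch of moving metric polling onto a `QThread`, assuming a PySide6 GUI; the app's actual widgets, signals, and metric sources will differ.

```python
# Minimal sketch of off-loading metric polling to a QThread (assumes PySide6;
# the real app's classes, signals, and metric sources will differ).
import sys
import time

from PySide6.QtCore import QObject, QThread, Signal
from PySide6.QtWidgets import QApplication, QLabel


class MetricsWorker(QObject):
    """Polls metrics in a background thread and emits them to the GUI."""
    metrics_ready = Signal(str)

    def run(self):
        while True:
            value = 42.0  # placeholder: replace with a real GPU/CPU metrics query
            self.metrics_ready.emit(f"GPU: {value:.1f}%")
            time.sleep(1)


if __name__ == "__main__":
    app = QApplication(sys.argv)
    label = QLabel("waiting for metrics...")
    label.show()

    thread = QThread()
    worker = MetricsWorker()
    worker.moveToThread(thread)
    thread.started.connect(worker.run)
    # Cross-thread signal delivery is queued, so the label updates on the GUI thread.
    worker.metrics_ready.connect(label.setText)
    thread.start()

    sys.exit(app.exec())
```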
Patch 6.9.1 Notes
- Added `Qwen 2.5 - 32b` chat model.
- Add sparkgraphs for metrics and the ability to right-click on the metrics bar and select a different visualization.
Welcome Kobold edition v6.9.0
Ask Jeeves!
- Exciting new "Ask Jeeves" helper who answers questions about how to use the program. Simply click "Jeeves" in the upper left.
- "Jeeves" gets his knowledge from a vector database that comes shipped with this release! NO MORE USER GUIDE TAB - just ASK JEEVES!
- IMPORTANT: After running `setup_windows.py`, you must go into the `Assets` folder, right-click on `koboldcpp_nocuda.exe`, and check the "Unblock" checkbox first! If the checkbox isn't there, try starting Jeeves and see if it works. Ask Jeeves is a new feature, so create a GitHub Issue if it doesn't work.
- IMPORTANT: You may also need to disable any firewall you have or add an exception for this program. Submit a GitHub Issue if you encounter any problems.
Scrape Python Library Documentation
- In the Tools Tab, simply select a Python library, click `Scrape`, and all of the `.html` files will be downloaded to the `Scraped_Documentation` folder.
- Create a vector database out of all of the `.html` files for a given library, then use one of the coding-specific models to answer questions! (A conceptual sketch of this step follows this list.)
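Conceptually, building a vector database from the scraped `.html` files looks roughly like the sketch below. The app does this for you through its GUI; the parsing library, embedding model, and folder handling shown here are placeholder assumptions, not the app's actual pipeline.

```python
# Rough, hypothetical sketch: embed scraped .html docs so they can be searched.
# BeautifulSoup and the embedding model below are assumptions for illustration only.
from pathlib import Path

from bs4 import BeautifulSoup
from sentence_transformers import SentenceTransformer

docs_dir = Path("Scraped_Documentation")  # folder named in the notes above
texts, sources = [], []

for html_file in docs_dir.glob("*.html"):
    soup = BeautifulSoup(
        html_file.read_text(encoding="utf-8", errors="ignore"), "html.parser"
    )
    text = soup.get_text(separator=" ", strip=True)
    if text:
        texts.append(text)
        sources.append(html_file.name)

# Placeholder embedding model; the program lets you pick from its own vector models.
model = SentenceTransformer("BAAI/bge-small-en-v1.5")
embeddings = model.encode(texts, show_progress_bar=True)

print(f"Embedded {len(texts)} pages into vectors of size {embeddings.shape[1]}")
```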
Huggingface Access Token
- You can now enter an "access token" to access models that are "gated" on Hugging Face. Currently, `llama 3.2 - 3b` and `mistral-small - 22b` are the only gated models. (A hedged example of what the token is used for follows this list.)
- Ask Jeeves how to get a Hugging Face access token.
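For context, this is roughly what the token enables under the hood, assuming the download goes through `huggingface_hub`; the repo id is illustrative, and the app applies the token you enter in its GUI automatically.

```python
# Hedged sketch: authenticating so a gated model can be downloaded.
# The repo_id is illustrative; the app stores and applies your token for you.
from huggingface_hub import login, snapshot_download

login(token="hf_xxx")  # personal token from https://huggingface.co/settings/tokens
path = snapshot_download(repo_id="meta-llama/Llama-3.2-3B-Instruct")  # gated repo
print(path)
```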
Other Improvements
- The vector models are now downloaded using the `snapshot_download` function from `huggingface_hub`, which can exclude unnecessary files such as `onnx`, `.bin` (when an equivalent `.safetensors` version is available), and others. This significantly reduces the amount of data that this program downloads, which increases speed and usability. (A hedged sketch follows this list.)
- This speedup should apply to vector, chat, and whisper models; implementing `snapshot_download` for TTS models is planned.
- New `Compare GPUs` button in the Tools Tab, which displays metrics for various GPUs so you can better determine your settings. Charts and graphs for chat/vision models will be added in the near future.
- New metrics bar with speedometer-style widgets.
- Removed the User Guide Tab altogether to free up space. You can now simply Ask Jeeves instead.
- Lots and lots of refactoring to improve various things...
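As a rough illustration of the `snapshot_download` change described above, the call below skips files the program doesn't need; the repo id and exact ignore patterns are examples, not the program's actual filter list.

```python
# Hedged sketch: download only the needed files for a vector model.
# The repo_id and ignore patterns are illustrative, not the app's exact configuration.
from huggingface_hub import snapshot_download

model_dir = snapshot_download(
    repo_id="BAAI/bge-small-en-v1.5",  # example embedding model
    ignore_patterns=["*.onnx", "onnx/*", "*.bin", "*.h5", "*.msgpack"],
)
print(model_dir)
```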
Added/Removed Chat Models
- Added `Qwen 2.5 - 1.5b`, `Llama 3.2 - 3b`, `Internlm 2.5 - 1.8b`, `Dolphin-Llama 3.1 - 8b`, and `Mistral-Small - 22b`.
- Removed `Longwriter Llama 3.1 - 8b`, `Longwriter GLM4 - 9b`, `Yi - 9b`, and `Solar Pro Preview - 22.1b`.
Added/Removed Vision Models
- Removed `Llava 1.5`, `Bakllava`, `Falcon-vlm - 11b`, and `Phi-3-Vision` models as either under-performing or eclipsed by pre-existing models that have additional benefits.
Roadmap
- Add `Kobold` as a backend in addition to `LM Studio` and `Local Models`, at which point I'll probably have to rename this GitHub repo.
- Add an `OpenAI` backend.
- Remove the LM Studio Server settings and revise the instructions, since LM Studio has changed significantly since they were last written.
Full Changelog: v6.8.2...v6.9.0