v6.9.2 - Welcome Kobold!
Patch 6.9.2 Notes
Between major updates I'll simply paste the major update notes below for convenience, then include specific notes for each minor update.
- Added `MiniCPM3 - 4b` chat model. Very good at single-factoid retrieval, even from many contexts, but DO NOT use it when asking to retrieve multiple factoids from the contexts because it will ramble.
- Robust validation of the settings entered.
- Use `QThread` with the metrics bar for smoother GUI operation (a hedged sketch of this pattern follows this list).
- Add page numbers when contexts that originate from a `.pdf` are returned.
- Return relevance scores for all citations, which helps users determine which `similarity` setting to use.
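For readers unfamiliar with the pattern, here is a minimal sketch of moving metric polling onto a `QThread`, assuming a PySide6 GUI; the app's actual widgets, signals, and metric sources will differ.

```python
# Minimal sketch of off-loading metric polling to a QThread (assumes PySide6;
# the real app's classes, signals, and metric sources will differ).
import sys
import time

from PySide6.QtCore import QObject, QThread, Signal
from PySide6.QtWidgets import QApplication, QLabel


class MetricsWorker(QObject):
    """Polls metrics in a background thread and emits them to the GUI."""
    metrics_ready = Signal(str)

    def run(self):
        while True:
            value = 42.0  # placeholder: replace with a real GPU/CPU metrics query
            self.metrics_ready.emit(f"GPU: {value:.1f}%")
            time.sleep(1)


if __name__ == "__main__":
    app = QApplication(sys.argv)
    label = QLabel("waiting for metrics...")
    label.show()

    thread = QThread()
    worker = MetricsWorker()
    worker.moveToThread(thread)
    thread.started.connect(worker.run)
    # Cross-thread signal delivery is queued, so the label updates on the GUI thread.
    worker.metrics_ready.connect(label.setText)
    thread.start()

    sys.exit(app.exec())
```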
Patch 6.9.1 Notes
- Added `Qwen 2.5 - 32b` chat model.
- Add sparkgraphs for metrics and the ability to right-click on the metrics bar and select a different visualization.
Welcome Kobold edition v6.9.0
Ask Jeeves!
- Exciting new "Ask Jeeves" helper who answers questions about how to use the program. Simply click "Jeeves" in the upper left.
- "Jeeves" gets his knowledge from a vector database that comes shipped with this release! NO MORE USER GUIDE TAB - just ASK JEEVES!
- IMPORTANT: After running `setup_windows.py`, you must go into the `Assets` folder, right-click on `koboldcpp_nocuda.exe`, and check the "Unblock" checkbox first! If the checkbox isn't there, try starting Jeeves and see if it works. Ask Jeeves is a new feature, so create a GitHub Issue if it doesn't work.
- IMPORTANT: You may also need to disable any firewall you have or add an exception for this program. Submit a GitHub Issue if you encounter any problems.
Scrape Python Library Documentation
- In the Tools Tab, simply select a Python library, click `Scrape`, and all of the `.html` files will be downloaded to the `Scraped_Documentation` folder.
- Create a vector database out of all of the `.html` files for a given library, then use one of the coding-specific models to answer questions! (A conceptual sketch of this step follows this list.)
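Conceptually, building a vector database from the scraped `.html` files looks roughly like the sketch below. The app does this for you through its GUI; the parsing library, embedding model, and folder handling shown here are placeholder assumptions, not the app's actual pipeline.

```python
# Rough, hypothetical sketch: embed scraped .html docs so they can be searched.
# BeautifulSoup and the embedding model below are assumptions for illustration only.
from pathlib import Path

from bs4 import BeautifulSoup
from sentence_transformers import SentenceTransformer

docs_dir = Path("Scraped_Documentation")  # folder named in the notes above
texts, sources = [], []

for html_file in docs_dir.glob("*.html"):
    soup = BeautifulSoup(
        html_file.read_text(encoding="utf-8", errors="ignore"), "html.parser"
    )
    text = soup.get_text(separator=" ", strip=True)
    if text:
        texts.append(text)
        sources.append(html_file.name)

# Placeholder embedding model; the program lets you pick from its own vector models.
model = SentenceTransformer("BAAI/bge-small-en-v1.5")
embeddings = model.encode(texts, show_progress_bar=True)

print(f"Embedded {len(texts)} pages into vectors of size {embeddings.shape[1]}")
```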
Huggingface Access Token
- You can now enter an "access token" to access models that are "gated" on Hugging Face. Currently, `llama 3.2 - 3b` and `mistral-small - 22b` are the only gated models. (A hedged example of what the token is used for follows this list.)
- Ask Jeeves how to get a Hugging Face access token.
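For context, this is roughly what the token enables under the hood, assuming the download goes through `huggingface_hub`; the repo id is illustrative, and the app applies the token you enter in its GUI automatically.

```python
# Hedged sketch: authenticating so a gated model can be downloaded.
# The repo_id is illustrative; the app stores and applies your token for you.
from huggingface_hub import login, snapshot_download

login(token="hf_xxx")  # personal token from https://huggingface.co/settings/tokens
path = snapshot_download(repo_id="meta-llama/Llama-3.2-3B-Instruct")  # gated repo
print(path)
```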
Other Improvements
- The vector models are now downloaded using the `snapshot_download` function from `huggingface_hub`, which can exclude unnecessary files such as `onnx`, `.bin` (when an equivalent `.safetensors` version is available), and others. This significantly reduces the amount of data that this program downloads, which increases speed and usability. (A hedged sketch follows this list.)
- This speedup should apply to vector, chat, and whisper models; implementing `snapshot_download` for TTS models is planned.
- New `Compare GPUs` button in the Tools Tab, which displays metrics for various GPUs so you can better determine your settings. Charts and graphs for chat/vision models will be added in the near future.
- New metrics bar with speedometer-style widgets.
- Removed the User Guide Tab altogether to free up space. You can now simply Ask Jeeves instead.
- Lots and lots of refactoring to improve various things...
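As a rough illustration of the `snapshot_download` change described above, the call below skips files the program doesn't need; the repo id and exact ignore patterns are examples, not the program's actual filter list.

```python
# Hedged sketch: download only the needed files for a vector model.
# The repo_id and ignore patterns are illustrative, not the app's exact configuration.
from huggingface_hub import snapshot_download

model_dir = snapshot_download(
    repo_id="BAAI/bge-small-en-v1.5",  # example embedding model
    ignore_patterns=["*.onnx", "onnx/*", "*.bin", "*.h5", "*.msgpack"],
)
print(model_dir)
```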
Added/Removed Chat Models
- Added `Qwen 2.5 - 1.5b`, `Llama 3.2 - 3b`, `Internlm 2.5 - 1.8b`, `Dolphin-Llama 3.1 - 8b`, and `Mistral-Small - 22b`.
- Removed `Longwriter Llama 3.1 - 8b`, `Longwriter GLM4 - 9b`, `Yi - 9b`, and `Solar Pro Preview - 22.1b`.
Added/Removed Vision Models
- Removed `Llava 1.5`, `Bakllava`, `Falcon-vlm - 11b`, and `Phi-3-Vision` models as either under-performing or eclipsed by pre-existing models that have additional benefits.
Roadmap
- Add `Kobold` as a backend in addition to `LM Studio` and `Local Models`, at which point I'll probably have to rename this GitHub repo.
- Add an `OpenAI` backend.
- Remove the LM Studio Server settings and revise the instructions, since LM Studio has changed significantly since they were last written.
Full Changelog: v6.8.2...v6.9.0