Ollama Grid Search: Instantly Evaluate Multiple LLMs and Prompts.

This project automates the process of selecting the best models, prompts, or inference parameters for a given use-case, allowing you to iterate over their combinations and to visually inspect the results.

It assumes Ollama is installed and serving endpoints, either on localhost or on a remote server.
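As a quick way to confirm that a server is reachable before running an experiment, the sketch below lists the models it exposes. It is not part of this project: it assumes Ollama's default address (`http://localhost:11434`) and its `/api/tags` endpoint, and the function names are made up for illustration.

```python
import json
from urllib.request import urlopen

# Assumption: Ollama's default address; change it for a remote server.
DEFAULT_SERVER = "http://localhost:11434"


def parse_model_names(payload: dict) -> list[str]:
    """Extract the model names from the JSON body returned by /api/tags."""
    return sorted(m["name"] for m in payload.get("models", []))


def list_models(server: str = DEFAULT_SERVER, timeout: float = 5.0) -> list[str]:
    """Ask a running Ollama server which models it has installed."""
    url = f"{server.rstrip('/')}/api/tags"
    with urlopen(url, timeout=timeout) as resp:
        return parse_model_names(json.load(resp))
```

Calling `list_models()` against a running server should return the names of the installed models; a connection error suggests the server address or port needs adjusting.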

Here's what an experiment for a simple prompt, tested on 3 different models, looks like:

(For a more in-depth look at an evaluation process assisted by this tool, please check https://dezoito.github.io/2023/12/27/rust-ollama-grid-search.html).

Installation

Check the project's releases page, which is also linked from the sidebar.

Features

Automatically fetches models from local or remote Ollama servers;
Iterates over multiple different models, prompts and parameters to generate inferences;
A/B test different prompts on several models simultaneously;
Allows multiple iterations for each combination of parameters;
Allows limited concurrency or synchronous inference calls (to prevent spamming servers);
Optionally outputs inference parameters and response metadata (inference time, tokens and tokens/s);
Refetching of individual inference calls;
Model selection can be filtered by name;
Lists experiments, which can be downloaded in JSON format;
Experiments can be inspected in readable views;
Re-run past experiments, cloning or modifying the parameters used in the past;
Configurable inference timeout;
Custom default parameters and system prompts can be defined in settings;
Fully functional prompt database with examples;
Prompts can be selected and "autocompleted" by typing "/" in the inputs.
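Two of the features above, limited concurrency and the tokens/s metadata, can be illustrated with a short sketch. This is not the application's own code (the app is built with Rust and Tauri). It assumes the `eval_count` and `eval_duration` fields (the latter in nanoseconds) that Ollama includes in its responses, and the helper names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def tokens_per_second(meta: dict) -> float:
    """Generation speed computed from Ollama response metadata.

    eval_count is the number of generated tokens; eval_duration is the
    time spent generating them, in nanoseconds.
    """
    duration_ns = meta.get("eval_duration", 0)
    if duration_ns <= 0:
        return 0.0
    return meta.get("eval_count", 0) * 1e9 / duration_ns


def run_limited(
    jobs: Iterable[T],
    worker: Callable[[T], R],
    max_concurrency: int = 1,
) -> list[R]:
    """Run worker over jobs with at most max_concurrency calls in flight.

    max_concurrency=1 makes the calls one at a time. Results are returned
    in the same order as the jobs.
    """
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return list(pool.map(worker, jobs))
```

Capping the number of workers is what keeps a large experiment from flooding the server with simultaneous requests.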

Grid Search (or something similar...)

Technically, the term "grid search" refers to iterating over a series of different model hyperparameters to optimize model performance, but that usually means parameters like batch_size, learning_rate, or number_of_epochs, which are more commonly used in training.

But the concept here is similar:

Let's define a selection of models, a prompt and some parameter combinations:

The prompt will be submitted once for each parameter value, for each one of the selected models, generating a set of responses.
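The expansion described above can be sketched as a Cartesian product. The snippet is only an illustration of the idea, not the app's implementation: the function name is hypothetical, and it assumes that every value of every parameter is combined with every other.

```python
from itertools import product
from typing import Iterator


def expand_grid(
    models: list[str],
    prompt: str,
    param_grid: dict[str, list],
) -> Iterator[dict]:
    """Yield one inference job per model and parameter combination."""
    names = list(param_grid)
    for model in models:
        # product() walks every combination of the listed parameter values.
        for values in product(*(param_grid[name] for name in names)):
            yield {
                "model": model,
                "prompt": prompt,
                "options": dict(zip(names, values)),
            }


# Placeholder model names; temperature and top_p are Ollama options.
jobs = list(
    expand_grid(
        models=["model-a", "model-b"],
        prompt="Write a haiku about autumn.",
        param_grid={"temperature": [0.2, 0.7, 1.0], "top_p": [0.5, 0.9]},
    )
)
```

With 2 models, 3 temperatures and 2 `top_p` values, `jobs` holds 2 × 3 × 2 = 12 entries, so the number of inference calls grows quickly as values are added.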

A/B Testing

Similarly, you can perform A/B tests by selecting different models and comparing results for the same prompt/parameter combination, or test different prompts under similar configurations:

Comparing the results of different prompts for the same model
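A rough sketch of the same idea in code: every prompt variant is paired with every model under one fixed set of options, and the responses are then grouped per model for side-by-side reading. The helper names and the `variant` label are hypothetical. The request body assumes the `model`, `prompt`, `options` and `stream` fields of Ollama's `/api/generate` endpoint.

```python
from collections import defaultdict
from typing import Iterable


def ab_requests(prompts: dict[str, str], models: list[str], options: dict) -> list[dict]:
    """Build one request per (model, prompt variant) with identical options."""
    return [
        {
            "variant": label,
            "body": {
                "model": model,
                "prompt": text,
                "options": dict(options),  # copy so requests don't share state
                "stream": False,
            },
        }
        for model in models
        for label, text in prompts.items()
    ]


def side_by_side(results: Iterable[tuple[str, str, str]]) -> dict[str, dict[str, str]]:
    """Group (model, variant, response) triples as {model: {variant: response}}."""
    table: dict[str, dict[str, str]] = defaultdict(dict)
    for model, variant, response in results:
        table[model][variant] = response
    return dict(table)
```

Keeping the options identical across variants means any difference in the output can be attributed to the prompt or the model, which is the point of an A/B comparison.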

Prompt Archive

You can save and manage your prompts (we want to make prompts compatible with Open WebUI).

You can autocomplete prompts by typing "/" (inspired by Open WebUI, as well):

Experiment Logs

You can list, inspect, or download your experiments:

Future Features

Grading results and filtering by grade;
Importing, exporting and sharing prompt lists and experiment files.

Contributing

For obvious bugs and spelling mistakes, please go ahead and submit a PR.
If you want to propose a new feature, change existing functionality, or suggest anything more complex, please open an issue for discussion before starting work on a PR.

Development

Make sure you have Rust installed.

Clone the repository (or a fork):

```
git clone https://github.com/dezoito/ollama-grid-search.git
cd ollama-grid-search
```

Install the frontend dependencies:

```
cd <project root>
# I'm using bun to manage dependencies,
# but feel free to use yarn or npm
bun install
```

Make sure rust-analyzer is configured to run Clippy when checking code.

If you are running VS Code, add this to your settings.json file:
```
{
   ...
   "rust-analyzer.check.command": "clippy",
}
```
(or, better yet, just use the settings file provided with the code)

Run the app in development mode:
```
cd <project root>/
bun tauri dev
```
Go grab a cup of coffee because this may take a while.

Citations

The following works and theses have cited this repository:

Inouye, D., Lindo, L., Lee, R., & Allen, E. (2024). Applied Auto-tuning on LoRA Hyperparameters. Computer Science and Engineering Senior Theses, Santa Clara University. https://scholarcommons.scu.edu/cgi/viewcontent.cgi?article=1271&context=cseng_senior

Thank you!

Huge thanks to @FabianLars, @peperroni21 and @TomReidNZ.
