Added docs.
JoelNiklaus committed Dec 12, 2024
1 parent c177a8e commit d712cdb
Showing 3 changed files with 135 additions and 0 deletions.
2 changes: 2 additions & 0 deletions docs/source/_toctree.yml
@@ -15,6 +15,8 @@
title: Add a custom task
- local: adding-a-new-metric
title: Add a custom metric
- local: evaluating-a-custom-model
title: Evaluate a custom model
- local: use-vllm-as-backend
title: Use VLLM as backend
- local: evaluate-the-model-on-a-server-or-container
129 changes: 129 additions & 0 deletions docs/source/evaluating-a-custom-model.mdx
@@ -0,0 +1,129 @@
# Evaluating a Custom Model

Lighteval lets you evaluate custom model implementations by writing a model class that inherits from `LightevalModel`. This is useful for models that the standard backends (transformers, vllm, etc.) do not support directly.

## Creating a Custom Model

1. Create a Python file containing your custom model implementation. The model must inherit from `LightevalModel` and implement all required methods.

Here's a basic example:

```python
from lighteval.models.abstract_model import LightevalModel


class MyCustomModel(LightevalModel):
    def __init__(self, config, env_config):
        super().__init__(config, env_config)
        # Initialize your model here...

    def greedy_until(self, requests, max_tokens=None, stop_sequences=None):
        # Implement generation logic
        pass

    def loglikelihood(self, requests, log=True):
        # Implement loglikelihood computation
        pass

    def loglikelihood_rolling(self, requests):
        # Implement rolling loglikelihood computation
        pass

    def loglikelihood_single_token(self, requests):
        # Implement single token loglikelihood computation
        pass
```

2. The custom model file should contain exactly one class that inherits from `LightevalModel`. This class will be automatically detected and instantiated when loading the model.
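The "exactly one class" rule can be illustrated with a minimal sketch of how such a file might be scanned. This is not lighteval's actual loader: `BaseModel` is a stand-in for `LightevalModel` so the example runs without lighteval installed, and `find_single_subclass` is a hypothetical helper name.

```python
import importlib.util
import inspect
import tempfile
import textwrap
from pathlib import Path


class BaseModel:
    """Stand-in for `LightevalModel` (assumption, keeps the sketch dependency-free)."""


def find_single_subclass(path, base):
    """Load the Python file at `path` and return its only subclass of `base`."""
    spec = importlib.util.spec_from_file_location("custom_model_module", str(path))
    module = importlib.util.module_from_spec(spec)
    # Expose the base class to the loaded file; a real model file would import it.
    module.__dict__[base.__name__] = base
    spec.loader.exec_module(module)

    candidates = [
        obj
        for _, obj in inspect.getmembers(module, inspect.isclass)
        if issubclass(obj, base) and obj is not base
    ]
    if len(candidates) != 1:
        raise ValueError(
            f"Expected exactly one {base.__name__} subclass, found {len(candidates)}"
        )
    return candidates[0]


# Write a tiny model file to a temporary directory and load the class from it.
source = textwrap.dedent(
    """
    class MyCustomModel(BaseModel):
        pass
    """
)
with tempfile.TemporaryDirectory() as tmp:
    model_file = Path(tmp) / "my_model.py"
    model_file.write_text(source)
    model_cls = find_single_subclass(model_file, BaseModel)

print(model_cls.__name__)
```

A file with zero or several subclasses raises `ValueError` in this sketch, which is one plausible way to surface an ambiguous model file early.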

> [!TIP]
> You can find a complete example of a custom model implementation in `examples/custom_models/google_translate_model.py`.

## Running the Evaluation

You can evaluate your custom model using either the command line interface or the Python API.

### Using the Command Line

```bash
lighteval custom \
    "google-translate" \
    "examples/custom_models/google_translate_model.py" \
    "leaderboard|wmt20:fr-de|0|0" \
    --output-dir results \
    --max-samples 10
```

The command takes three required arguments:
- The model name (used for tracking in results/logs)
- The path to your model implementation file
- The tasks to evaluate on (same format as other backends)
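To make the task string easier to read, here is a small sketch that splits one entry into its four pipe-separated fields. The field names (`suite`, `task`, `num_fewshot`, `truncate_fewshot`) and the `parse_task_spec` helper are assumptions chosen for illustration; lighteval has its own parser, which may treat the fields differently.

```python
from typing import NamedTuple


class TaskSpec(NamedTuple):
    suite: str              # e.g. "leaderboard"
    task: str               # e.g. "wmt20:fr-de"
    num_fewshot: int        # number of few-shot examples
    truncate_fewshot: bool  # assumed meaning: reduce few-shots if the prompt is too long


def parse_task_spec(spec: str) -> TaskSpec:
    """Split a 'suite|task|num_fewshot|flag' string into typed fields."""
    parts = spec.split("|")
    if len(parts) != 4:
        raise ValueError(f"Expected 4 '|'-separated fields, got {len(parts)}: {spec!r}")
    suite, task, num_fewshot, flag = parts
    return TaskSpec(suite, task, int(num_fewshot), flag == "1")


spec = parse_task_spec("leaderboard|wmt20:fr-de|0|0")
print(spec)
```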

### Using the Python API

```python
from lighteval.logging.evaluation_tracker import EvaluationTracker
from lighteval.models.custom.custom_model import CustomModelConfig
from lighteval.pipeline import EnvConfig, ParallelismManager, Pipeline, PipelineParameters

# Set up evaluation tracking
evaluation_tracker = EvaluationTracker(
    output_dir="results",
    save_details=True,
)

# Configure the pipeline
pipeline_params = PipelineParameters(
    launcher_type=ParallelismManager.CUSTOM,
    env_config=EnvConfig(cache_dir="tmp/"),
)

# Configure your custom model
model_config = CustomModelConfig(
    model="my-custom-model",
    model_definition_file_path="path/to/my_model.py",
)

# Create and run the pipeline
pipeline = Pipeline(
    tasks="leaderboard|truthfulqa:mc|0|0",
    pipeline_parameters=pipeline_params,
    evaluation_tracker=evaluation_tracker,
    model_config=model_config,
)

pipeline.evaluate()
pipeline.save_and_push_results()
```

## Required Methods

Your custom model must implement these core methods:

- `greedy_until`: For generating text until a stop sequence or the maximum number of tokens is reached
- `loglikelihood`: For computing log probabilities of specific continuations
- `loglikelihood_rolling`: For computing rolling log probabilities of sequences
- `loglikelihood_single_token`: For computing log probabilities of single tokens

See the `LightevalModel` base class documentation for detailed method signatures and requirements.
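As a rough illustration of what the three loglikelihood methods compute, the sketch below scores text with a toy unigram model. Everything in it is hypothetical: `TOKEN_PROBS`, the whitespace tokenizer, and the plain string arguments are stand-ins, and the real methods receive lighteval request objects rather than strings.

```python
import math

# Hypothetical toy "model": fixed unigram probabilities over a tiny vocabulary.
TOKEN_PROBS = {"the": 0.5, "cat": 0.25, "sat": 0.25}


def token_logprob(token):
    """Natural-log probability of a single token under the toy model."""
    return math.log(TOKEN_PROBS[token])


def loglikelihood(context, continuation):
    """Log probability of `continuation` given `context`.

    A unigram model ignores the context; a real model would condition on it.
    """
    return sum(token_logprob(tok) for tok in continuation.split())


def loglikelihood_rolling(text):
    """Log probability of the whole text, accumulated token by token."""
    return sum(token_logprob(tok) for tok in text.split())


def loglikelihood_single_token(context, candidates):
    """Log probability of each single-token candidate, in the given order."""
    return [token_logprob(tok) for tok in candidates]


score = loglikelihood("the", "cat sat")
rolling = loglikelihood_rolling("the cat sat")
choices = loglikelihood_single_token("the", ["cat", "the"])
print(score, rolling, choices)
```

The rolling score covers every token of the input, while the plain loglikelihood only scores the continuation; that difference is what separates perplexity-style tasks from multiple-choice ones.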

## Best Practices

1. **Error Handling**: Implement robust error handling in your model methods to gracefully handle edge cases.

2. **Batching**: Consider implementing efficient batching in your model methods to improve performance.

3. **Resource Management**: Properly manage any resources (e.g., API connections, model weights) in your model's `__init__` and `__del__` methods.

4. **Documentation**: Add clear docstrings to your model class and methods explaining any specific requirements or limitations.
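The batching and error-handling advice above can be sketched with plain Python. The helpers `batched` and `call_with_retry` and the `FlakyBackend` class are hypothetical names for illustration and are not part of lighteval; the retry policy (exponential backoff on `ConnectionError`) is one reasonable choice among many.

```python
import time


def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def call_with_retry(fn, batch, max_attempts=3, base_delay=1.0):
    """Call `fn(batch)`, retrying on ConnectionError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(batch)
        except ConnectionError:
            if attempt == max_attempts:
                raise  # Give up and surface the error after the last attempt.
            time.sleep(base_delay * 2 ** (attempt - 1))


class FlakyBackend:
    """Hypothetical backend: fails on its first call, then upper-cases inputs."""

    def __init__(self):
        self.calls = 0

    def __call__(self, batch):
        self.calls += 1
        if self.calls == 1:
            raise ConnectionError("transient failure")
        return [text.upper() for text in batch]


backend = FlakyBackend()
prompts = ["a", "b", "c", "d", "e"]
results = []
for batch in batched(prompts, batch_size=2):
    # base_delay=0.0 keeps the demo instant; use a real delay against a live API.
    results.extend(call_with_retry(backend, batch, base_delay=0.0))
print(results)
```

Retrying per batch rather than per run means a transient failure only repeats the affected requests, not the whole evaluation.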

## Example Use Cases

Custom models are particularly useful for:

- Evaluating models accessed through custom APIs
- Wrapping models with specialized preprocessing/postprocessing
- Testing novel model architectures
- Evaluating ensemble models
- Integrating with external services or tools

For a complete example of a custom model that wraps the Google Translate API, see `examples/custom_models/google_translate_model.py`.
4 changes: 4 additions & 0 deletions docs/source/package_reference/models.mdx
@@ -28,6 +28,10 @@
[[autodoc]] models.endpoints.tgi_model.TGIModelConfig
[[autodoc]] models.endpoints.tgi_model.ModelClient

### Custom Model
[[autodoc]] models.custom.custom_model.CustomModelConfig
[[autodoc]] models.custom.custom_model.CustomModel

### Open AI Models
[[autodoc]] models.endpoints.openai_model.OpenAIClient
