# Evaluating a Custom Model

Lighteval allows you to evaluate custom model implementations by creating a custom model class that inherits from `LightevalModel`. This is useful when you want to evaluate models that aren't directly supported by the standard backends (transformers, vllm, etc.).
## Creating a Custom Model

1. Create a Python file containing your custom model implementation. The model must inherit from `LightevalModel` and implement all required methods.

Here's a basic example:
```python
from lighteval.models.abstract_model import LightevalModel


class MyCustomModel(LightevalModel):
    def __init__(self, config, env_config):
        super().__init__(config, env_config)
        # Initialize your model here...

    def greedy_until(self, requests, max_tokens=None, stop_sequences=None):
        # Implement generation logic
        pass

    def loglikelihood(self, requests, log=True):
        # Implement loglikelihood computation
        pass

    def loglikelihood_rolling(self, requests):
        # Implement rolling loglikelihood computation
        pass

    def loglikelihood_single_token(self, requests):
        # Implement single-token loglikelihood computation
        pass
```

2. The custom model file should contain exactly one class that inherits from `LightevalModel`. This class will be automatically detected and instantiated when loading the model.

> [!TIP]
> You can find a complete example of a custom model implementation in `examples/custom_models/google_translate_model.py`.

## Running the Evaluation

You can evaluate your custom model using either the command line interface or the Python API.
### Using the Command Line

```bash
lighteval custom \
    "google-translate" \
    "examples/custom_models/google_translate_model.py" \
    "leaderboard|wmt20:fr-de|0|0" \
    --output-dir results \
    --max-samples 10
```

The command takes three required arguments:
- The model name (used for tracking in results/logs)
- The path to your model implementation file
- The tasks to evaluate on (same format as other backends)
### Using the Python API

```python
from lighteval.logging.evaluation_tracker import EvaluationTracker
from lighteval.models.custom.custom_model import CustomModelConfig
from lighteval.pipeline import EnvConfig, ParallelismManager, Pipeline, PipelineParameters

# Set up evaluation tracking
evaluation_tracker = EvaluationTracker(
    output_dir="results",
    save_details=True
)

# Configure the pipeline
pipeline_params = PipelineParameters(
    launcher_type=ParallelismManager.CUSTOM,
    env_config=EnvConfig(cache_dir="tmp/")
)

# Configure your custom model
model_config = CustomModelConfig(
    model="my-custom-model",
    model_definition_file_path="path/to/my_model.py"
)

# Create and run the pipeline
pipeline = Pipeline(
    tasks="leaderboard|truthfulqa:mc|0|0",
    pipeline_parameters=pipeline_params,
    evaluation_tracker=evaluation_tracker,
    model_config=model_config
)

pipeline.evaluate()
pipeline.save_and_push_results()
```

## Required Methods

Your custom model must implement these core methods:

- `greedy_until`: For generating text until a stop sequence or max tokens is reached
- `loglikelihood`: For computing log probabilities of specific continuations
- `loglikelihood_rolling`: For computing rolling log probabilities of sequences
- `loglikelihood_single_token`: For computing log probabilities of single tokens

See the `LightevalModel` base class documentation for detailed method signatures and requirements.
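
As an illustration, here is a minimal sketch of `greedy_until` for a model that delegates generation to an external backend. This is a sketch, not lighteval's reference implementation: the `GenerativeResponse` import and the `context`/`stop_sequence`/`generation_size` request fields match recent lighteval versions but may differ in yours, and `call_my_backend` is a hypothetical stand-in for your actual generation code.

```python
from lighteval.models.abstract_model import LightevalModel
from lighteval.models.model_output import GenerativeResponse  # location may vary by version


def call_my_backend(prompt: str, stop: list[str], max_tokens: int) -> str:
    """Hypothetical stand-in: send `prompt` to your model and return the completion."""
    raise NotImplementedError


class MyCustomModel(LightevalModel):
    def greedy_until(self, requests, max_tokens=None, stop_sequences=None):
        responses = []
        for request in requests:
            # Each request carries its own prompt, stop sequences, and token budget.
            text = call_my_backend(
                prompt=request.context,
                stop=request.stop_sequence or stop_sequences or [],
                max_tokens=request.generation_size or max_tokens or 256,
            )
            responses.append(GenerativeResponse(result=text))
        return responses
```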

## Best Practices

1. **Error Handling**: Implement robust error handling in your model methods so that edge cases (empty prompts, timeouts, malformed responses) fail gracefully instead of aborting the whole evaluation.

2. **Batching**: Consider implementing efficient batching in your model methods to improve performance; see the sketch after this list.

3. **Resource Management**: Properly manage any resources (e.g., API connections, model weights) in your model's `__init__` and `__del__` methods.

4. **Documentation**: Add clear docstrings to your model class and methods explaining any specific requirements or limitations.
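
To make the batching and resource-management points concrete, here is a minimal sketch assuming a hypothetical HTTP backend reached through a shared `requests.Session`. The endpoint, payload shape, and batch size are illustrative assumptions, not part of lighteval's API, and the method parameter is renamed to avoid shadowing the `requests` library.

```python
import requests  # HTTP client, used here purely for illustration

from lighteval.models.abstract_model import LightevalModel


class MyApiModel(LightevalModel):
    def __init__(self, config, env_config):
        super().__init__(config, env_config)
        # Resource management: open one shared session in __init__ instead of
        # reconnecting on every call.
        self.session = requests.Session()
        self.endpoint = "https://example.com/generate"  # hypothetical endpoint
        self.batch_size = 16  # illustrative; tune for your backend

    def _generate_batch(self, prompts):
        # Hypothetical contract: POST a list of prompts, get back completions.
        response = self.session.post(self.endpoint, json={"prompts": prompts}, timeout=60)
        response.raise_for_status()
        return response.json()["completions"]

    def greedy_until(self, request_batch, max_tokens=None, stop_sequences=None):
        # Batching: process requests in fixed-size chunks rather than one by one.
        completions = []
        for i in range(0, len(request_batch), self.batch_size):
            chunk = request_batch[i : i + self.batch_size]
            completions.extend(self._generate_batch([r.context for r in chunk]))
        # In a real implementation, wrap these raw strings in your lighteval
        # version's response objects before returning them.
        return completions

    def __del__(self):
        # Release the connection pool when the model is torn down.
        self.session.close()
```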

## Example Use Cases

Custom models are particularly useful for:

- Evaluating models accessed through custom APIs
- Wrapping models with specialized preprocessing/postprocessing
- Testing novel model architectures
- Evaluating ensemble models
- Integrating with external services or tools

For a complete example of a custom model that wraps the Google Translate API, see `examples/custom_models/google_translate_model.py`.