# Evaluating a Custom Model

Lighteval allows you to evaluate custom model implementations by creating a custom model class that inherits from `LightevalModel`. This is useful when you want to evaluate models that aren't directly supported by the standard backends (transformers, vllm, etc.).
## Creating a Custom Model

1. Create a Python file containing your custom model implementation. The model must inherit from `LightevalModel` and implement all required methods.

Here's a basic example:
```python
from lighteval.models.abstract_model import LightevalModel


class MyCustomModel(LightevalModel):
    def __init__(self, config, env_config):
        super().__init__(config, env_config)
        # Initialize your model here...

    def greedy_until(self, requests, max_tokens=None, stop_sequences=None):
        # Implement generation logic
        pass

    def loglikelihood(self, requests, log=True):
        # Implement loglikelihood computation
        pass

    def loglikelihood_rolling(self, requests):
        # Implement rolling loglikelihood computation
        pass

    def loglikelihood_single_token(self, requests):
        # Implement single-token loglikelihood computation
        pass
```

2. The custom model file should contain exactly one class that inherits from `LightevalModel`. This class will be automatically detected and instantiated when loading the model.

> [!TIP]
> You can find a complete example of a custom model implementation in `examples/custom_models/google_translate_model.py`.

## Running the Evaluation

You can evaluate your custom model using either the command line interface or the Python API.
### Using the Command Line

```bash
lighteval custom \
    "google-translate" \
    "examples/custom_models/google_translate_model.py" \
    "leaderboard|wmt20:fr-de|0|0" \
    --output-dir results \
    --max-samples 10
```

The command takes three required arguments:
- The model name (used for tracking in results/logs)
- The path to your model implementation file
- The tasks to evaluate on (same format as other backends)
### Using the Python API

```python
from lighteval.logging.evaluation_tracker import EvaluationTracker
from lighteval.models.custom.custom_model import CustomModelConfig
from lighteval.pipeline import EnvConfig, ParallelismManager, Pipeline, PipelineParameters

# Set up evaluation tracking
evaluation_tracker = EvaluationTracker(
    output_dir="results",
    save_details=True
)

# Configure the pipeline
pipeline_params = PipelineParameters(
    launcher_type=ParallelismManager.CUSTOM,
    env_config=EnvConfig(cache_dir="tmp/")
)

# Configure your custom model
model_config = CustomModelConfig(
    model="my-custom-model",
    model_definition_file_path="path/to/my_model.py"
)

# Create and run the pipeline
pipeline = Pipeline(
    tasks="leaderboard|truthfulqa:mc|0|0",
    pipeline_parameters=pipeline_params,
    evaluation_tracker=evaluation_tracker,
    model_config=model_config
)

pipeline.evaluate()
pipeline.save_and_push_results()
```

## Required Methods

Your custom model must implement these core methods:

- `greedy_until`: For generating text until a stop sequence or max tokens is reached
- `loglikelihood`: For computing log probabilities of specific continuations
- `loglikelihood_rolling`: For computing rolling log probabilities of sequences
- `loglikelihood_single_token`: For computing log probabilities of single tokens

See the `LightevalModel` base class documentation for detailed method signatures and requirements.
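
As an illustration, here is a minimal sketch of `greedy_until` for a model that delegates generation to an external backend. This is a sketch, not lighteval's reference implementation: the `GenerativeResponse` import and the `context`/`stop_sequence`/`generation_size` request fields match recent lighteval versions but may differ in yours, and `call_my_backend` is a hypothetical stand-in for your actual generation code.

```python
from lighteval.models.abstract_model import LightevalModel
from lighteval.models.model_output import GenerativeResponse  # location may vary by version


def call_my_backend(prompt: str, stop: list[str], max_tokens: int) -> str:
    """Hypothetical stand-in: send `prompt` to your model and return the completion."""
    raise NotImplementedError


class MyCustomModel(LightevalModel):
    def greedy_until(self, requests, max_tokens=None, stop_sequences=None):
        responses = []
        for request in requests:
            # Each request carries its own prompt, stop sequences, and token budget.
            text = call_my_backend(
                prompt=request.context,
                stop=request.stop_sequence or stop_sequences or [],
                max_tokens=request.generation_size or max_tokens or 256,
            )
            responses.append(GenerativeResponse(result=text))
        return responses
```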

## Best Practices

1. **Error Handling**: Implement robust error handling in your model methods so that edge cases (empty prompts, timeouts, malformed responses) fail gracefully instead of aborting the whole evaluation.

2. **Batching**: Consider implementing efficient batching in your model methods to improve performance; see the sketch after this list.

3. **Resource Management**: Properly manage any resources (e.g., API connections, model weights) in your model's `__init__` and `__del__` methods.

4. **Documentation**: Add clear docstrings to your model class and methods explaining any specific requirements or limitations.
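
To make the batching and resource-management points concrete, here is a minimal sketch assuming a hypothetical HTTP backend reached through a shared `requests.Session`. The endpoint, payload shape, and batch size are illustrative assumptions, not part of lighteval's API, and the method parameter is renamed to avoid shadowing the `requests` library.

```python
import requests  # HTTP client, used here purely for illustration

from lighteval.models.abstract_model import LightevalModel


class MyApiModel(LightevalModel):
    def __init__(self, config, env_config):
        super().__init__(config, env_config)
        # Resource management: open one shared session in __init__ instead of
        # reconnecting on every call.
        self.session = requests.Session()
        self.endpoint = "https://example.com/generate"  # hypothetical endpoint
        self.batch_size = 16  # illustrative; tune for your backend

    def _generate_batch(self, prompts):
        # Hypothetical contract: POST a list of prompts, get back completions.
        response = self.session.post(self.endpoint, json={"prompts": prompts}, timeout=60)
        response.raise_for_status()
        return response.json()["completions"]

    def greedy_until(self, request_batch, max_tokens=None, stop_sequences=None):
        # Batching: process requests in fixed-size chunks rather than one by one.
        completions = []
        for i in range(0, len(request_batch), self.batch_size):
            chunk = request_batch[i : i + self.batch_size]
            completions.extend(self._generate_batch([r.context for r in chunk]))
        # In a real implementation, wrap these raw strings in your lighteval
        # version's response objects before returning them.
        return completions

    def __del__(self):
        # Release the connection pool when the model is torn down.
        self.session.close()
```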

## Example Use Cases

Custom models are particularly useful for:

- Evaluating models accessed through custom APIs
- Wrapping models with specialized preprocessing/postprocessing
- Testing novel model architectures
- Evaluating ensemble models
- Integrating with external services or tools

For a complete example of a custom model that wraps the Google Translate API, see `examples/custom_models/google_translate_model.py`.