Add input and output validation #104

Merged · 34 commits · Dec 20, 2024

Commits:
- `dae5794` Refactor variable definitions to support validation (t-bz, Dec 6, 2024)
- `24655a2` Update variable tests (t-bz, Dec 6, 2024)
- `768a440` Fix utils tests (t-bz, Dec 6, 2024)
- `3e80ceb` Fix base tests (t-bz, Dec 6, 2024)
- `370362d` Fix torch_model tests (t-bz, Dec 6, 2024)
- `708b9e5` Fix torch_module tests (t-bz, Dec 6, 2024)
- `b817d44` Add validation to models (t-bz, Dec 6, 2024)
- `47ea12a` Add validation tests (t-bz, Dec 6, 2024)
- `c9dd02c` Clean up conftest (t-bz, Dec 6, 2024)
- `6fbd75a` Remove SerializeAsAny annotation since variable serialization is expl… (t-bz, Dec 6, 2024)
- `297e2da` make default value required (pluflou, Dec 10, 2024)
- `50bd4d3` add single and double precision support in variable class (pluflou, Dec 10, 2024)
- `892de33` add support for numpy floats (pluflou, Dec 10, 2024)
- `6990180` add validation for input_dict and support for precision setting in mo… (pluflou, Dec 12, 2024)
- `0f786d0` make input dict validation strict (pluflou, Dec 12, 2024)
- `2c0eaab` catch bools in torch tensor inputs (pluflou, Dec 12, 2024)
- `eb8425d` drop np.float32 until we have a use-case (pluflou, Dec 12, 2024)
- `dc08f80` add dynamic checking for default vals, and strict flag for range chec… (pluflou, Dec 12, 2024)
- `629b77d` make type casting more consistent during input validation (pluflou, Dec 14, 2024)
- `bd322f7` make default required for inputs only and validate in base class (pluflou, Dec 14, 2024)
- `9552dd9` fix range validation tests (pluflou, Dec 16, 2024)
- `d2c2da6` remove range check within tolerance for now (pluflou, Dec 16, 2024)
- `fa508eb` add is_constant flag and default range, fix unit tests (pluflou, Dec 16, 2024)
- `b6aaddf` update example nbs (pluflou, Dec 16, 2024)
- `cbb67e1` update example notebooks (pluflou, Dec 18, 2024)
- `e409459` add nicer onnx graphs (pluflou, Dec 18, 2024)
- `e6c7b6a` simplify validation config (pluflou, Dec 18, 2024)
- `45dfde8` Merge branch 'main' of https://github.com/slaclab/lume-model into val… (pluflou, Dec 18, 2024)
- `cba3073` fix tests after adjusting config validation format (pluflou, Dec 19, 2024)
- `2de7dd2` remove setting torch default dtype (pluflou, Dec 19, 2024)
- `6c83403` add some tests (pluflou, Dec 20, 2024)
- `b10521a` adjust docstrings (pluflou, Dec 20, 2024)
- `6bb2e07` update README (pluflou, Dec 20, 2024)
- `f34926f` reset precision to double to fix tests (pluflou, Dec 20, 2024)
README.md (245 changes: 136 additions & 109 deletions)
The lume-model variables are intended to enforce requirements for input and output variables.

Minimal example of scalar input and output variables:

```python
from lume_model.variables import ScalarVariable

input_variable = ScalarVariable(
    name="example_input",
    default_value=0.1,
    value_range=[0.0, 1.0],
)
output_variable = ScalarVariable(name="example_output")
```

All input variables may be made into constants by passing the `is_constant=True` keyword argument. These constant variables are always set to their default value, and any other value assignment on them will raise an error.
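
For instance, declaring a constant input variable looks like this (a minimal sketch; the name `fixed_input` is illustrative):

```python
from lume_model.variables import ScalarVariable

# Constant input: always held at its default value; per the note above,
# assigning any other value raises an error
constant_input = ScalarVariable(
    name="fixed_input",
    default_value=0.5,
    value_range=[0.0, 1.0],
    is_constant=True,
)
```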

## Models

Requirements for model classes:

* input_variables: A list defining the input variables for the model. Variable names must be unique. Required for use with lume-epics tools.
* output_variables: A list defining the output variables for the model. Variable names must be unique. Required for use with lume-epics tools.
* _evaluate: The evaluate method is called by the serving model. Subclasses must implement this method, accepting and returning a dictionary.

Example model implementation and instantiation:

```python
from lume_model.base import LUMEBaseModel
from lume_model.variables import ScalarVariable


class ExampleModel(LUMEBaseModel):
    def _evaluate(self, input_dict):
        output_dict = {
            "output1": input_dict[self.input_variables[0].name] ** 2,
            "output2": input_dict[self.input_variables[1].name] ** 2,
        }
        return output_dict


m = ExampleModel(
    input_variables=[
        ScalarVariable(name="input1", default_value=0.1, value_range=[0.0, 1.0]),
        ScalarVariable(name="input2", default_value=0.2, value_range=[0.0, 1.0]),
    ],
    output_variables=[
        ScalarVariable(name="output1"),
        ScalarVariable(name="output2"),
    ],
)
```
For example, `m.dump("example_model.yml")` writes the following to file:

```yaml
model_class: ExampleModel
input_variables:
  input1:
    variable_class: ScalarVariable
    default_value: 0.1
    is_constant: false
    value_range: [0.0, 1.0]
  input2:
    variable_class: ScalarVariable
    default_value: 0.2
    is_constant: false
    value_range: [0.0, 1.0]
output_variables:
  output1: {variable_class: ScalarVariable}
  output2: {variable_class: ScalarVariable}
```

and can be loaded by simply passing the file to the model constructor:
```python
from lume_model.base import LUMEBaseModel


class ExampleModel(LUMEBaseModel):
    def _evaluate(self, input_dict):
        output_dict = {
            "output1": input_dict[self.input_variables[0].name] ** 2,
            "output2": input_dict[self.input_variables[1].name] ** 2,
        }
        return output_dict


m = ExampleModel("example_model.yml")
```

## PyTorch Toolkit

A TorchModel can also be loaded from a YAML file, specifying `TorchModel` in the `model_class` of the configuration file:

```yaml
model_class: TorchModel
model: model.pt
output_format: tensor
device: cpu
fixed_model: true
```

In addition to the `model_class`, we also specify the path to the PyTorch model and the transformers (saved using `torch.save()`).
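
As a sketch of how those files might be produced (the `base_model` and transformer objects are assumed to exist; the file names follow the configurations below):

```python
import torch

# Persist the underlying PyTorch model and its transformers so the
# YAML configuration can reference the resulting files
torch.save(base_model, "model.pt")
torch.save(input_transformer, "input_transformers_0.pt")
torch.save(output_transformer, "output_transformers_0.pt")
```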

The `output_format` specification indicates which form the outputs of the model's `evaluate()` function should take, which may vary depending on the application. TorchModels working with the [LUME-EPICS](https://github.com/slaclab/lume-epics) service will require an `OutputVariable` type, while [Xopt](https://github.com/xopt-org/Xopt) requires either a dictionary of float values or tensors as output.

The variables and any transformers can also be added to the YAML configuration file:

```yaml
model_class: TorchModel
input_variables:
  input1:
    variable_class: ScalarVariable
    default_value: 0.1
    value_range: [0.0, 1.0]
    is_constant: false
  input2:
    variable_class: ScalarVariable
    default_value: 0.2
    value_range: [0.0, 1.0]
    is_constant: false
output_variables:
  output:
    variable_class: ScalarVariable
    value_range: [-.inf, .inf]
    is_constant: false
input_validation_config: null
output_validation_config: null
model: model.pt
input_transformers: [input_transformers_0.pt]
output_transformers: [output_transformers_0.pt]
output_format: tensor
device: cpu
fixed_model: true
precision: double
```

The TorchModel can then be loaded:

```python
from lume_model.torch_model import TorchModel

# Load the model from a YAML file
torch_model = TorchModel("path/to/model_config.yml")
```
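
Once loaded, the model can be evaluated on a dictionary keyed by input variable name (a minimal sketch assuming the two-input configuration above; with `output_format: tensor`, outputs are returned as tensors):

```python
import torch

# Inputs are passed as a dictionary keyed by input variable name
input_dict = {"input1": torch.tensor(0.1), "input2": torch.tensor(0.2)}

# Returns a dictionary keyed by output variable name
output_dict = torch_model.evaluate(input_dict)
print(output_dict["output"])
```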

## TorchModule Usage

The `TorchModule` wrapper around the `TorchModel` provides a consistent API with PyTorch, making it easier to integrate with other PyTorch-based tools and workflows.

### Initialization

To initialize a `TorchModule`, you provide either a `TorchModel` object or a YAML file containing the TorchModule configuration:

```python
from lume_model.torch_module import TorchModule

# Wrap in TorchModule
torch_module = TorchModule(model=torch_model)

# Or load the model configuration from a YAML file
torch_module = TorchModule("path/to/module_config.yml")
```

### Model Configuration

The YAML configuration file should specify the `TorchModule` class as well as the `TorchModel` configuration:
```yaml
model_class: TorchModule
input_order: [input1, input2]
output_order: [output]
model:
  model_class: TorchModel
  input_variables:
    input1:
      variable_class: ScalarVariable
      default_value: 0.1
      value_range: [0.0, 1.0]
      is_constant: false
    input2:
      variable_class: ScalarVariable
      default_value: 0.2
      value_range: [0.0, 1.0]
      is_constant: false
  output_variables:
    output:
      variable_class: ScalarVariable
  model: model.pt
  output_format: tensor
  device: cpu
  fixed_model: true
  precision: double
```

### Using the Model

Once the `TorchModule` is initialized, you can use it just like a regular PyTorch model: pass tensor-type inputs to the model and get tensor-type outputs.

```python
from torch import tensor
from lume_model.torch_module import TorchModule

# Example input tensor
input_data = tensor([[0.1, 0.2]])

# Evaluate the model
output = torch_module(input_data)

# Output will be a tensor
print(output)
```
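
And since `TorchModule` is meant to behave like any other PyTorch module, it can be composed with further modules; a minimal sketch (the composition itself is illustrative, not an API documented above):

```python
import torch

# Treat the wrapped model as a building block in a larger pipeline
pipeline = torch.nn.Sequential(torch_module)
print(pipeline(torch.tensor([[0.1, 0.2]])))
```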