DecisionTreeRegressor #30

Merged
merged 30 commits into from
Sep 20, 2023
Changes from 27 commits
23131aa
DecisionTreeRegressor added to pymilo param
AHReccese Aug 17, 2023
31a9b87
decision tree chain initialized
AHReccese Sep 2, 2023
1dc7953
add decision tree support to `pymilo_func.py`
AHReccese Sep 2, 2023
3c62e03
decision tree added to exported path dict
AHReccese Sep 8, 2023
5065844
numpy float64 added to pymilo np dict
AHReccese Sep 8, 2023
9bba58a
decision tree generated files gitignored
AHReccese Sep 8, 2023
839f874
decision tree regression test added
AHReccese Sep 8, 2023
16ea4a0
decision tree's tests runner added
AHReccese Sep 8, 2023
98801ca
decision tree support added to `test_pymilo.py`
AHReccese Sep 8, 2023
ad95210
report status log punctuationally enhanced
AHReccese Sep 8, 2023
40ff94d
tree transporter(serializer + deserializer) added
AHReccese Sep 8, 2023
b0e31f7
tree transporter added to tree chain
AHReccese Sep 8, 2023
168897a
tree transporter's docstrings enhanced.
AHReccese Sep 8, 2023
7e7d27a
update pymilo param to handle scikit 1.3.0 tree
AHReccese Sep 8, 2023
6c2dc35
generalizing `field-names` to cover wide range of scikit tree impleme…
AHReccese Sep 8, 2023
20cdf98
fix windows 3.6 c_size_t issue
AHReccese Sep 8, 2023
b3cee2c
docstring added + logging
AHReccese Sep 8, 2023
63c9df8
trim the os.name output
AHReccese Sep 8, 2023
144ff99
fix windows 3.6 issue for cinit input types
AHReccese Sep 8, 2023
33d5492
fix windows 3.6 issue for cinit input types
AHReccese Sep 8, 2023
4381d25
fix windows 3.6 issue for cinit input types
AHReccese Sep 8, 2023
f7a70ba
fix windows 3.6 issue for cinit input types
AHReccese Sep 8, 2023
dc76058
fix windows 3.6 issue for cinit input types
AHReccese Sep 8, 2023
8b53784
Update `CHANGELOG.MD`
AHReccese Sep 14, 2023
36f1ce7
refactor decision_tree_regression test
AHReccese Sep 14, 2023
d688491
apply arash comments
AHReccese Sep 14, 2023
a7ad3a8
autopep8 run
AHReccese Sep 14, 2023
08ed14d
apply arash secondary comments
AHReccese Sep 14, 2023
15e61ed
apply sepand's secondary minor comments
AHReccese Sep 18, 2023
2ddca6b
doc : minor edit in CHANGELOG.md
sepandhaghighi Sep 20, 2023
1 change: 1 addition & 0 deletions .gitignore
@@ -101,4 +101,5 @@ gen

 /tests/exported_linear_models
 /tests/exported_neural_networks
+/tests/exported_decision_trees
 /.VSCodeCounter
10 changes: 10 additions & 0 deletions CHANGELOG.md
@@ -5,8 +5,18 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).

## [Unreleased]
### Added
- scikit-learn decision tree models
- `DecisionTreeRegressor` model
- `Tree` Transporter
- Decision Tree chain
- `decision_tree_chain.py`
- `DecisionTreeRegressor` Test
### Changed
- Tests config modified
- DecisionTree params initialized in `pymilo_param`
- Decision Tree support added to `pymilo_func.py`

## [0.2] - 2023-08-02
### Added
- scikit-learn neural network models
136 changes: 136 additions & 0 deletions pymilo/chains/decision_tree_chain.py
@@ -0,0 +1,136 @@
# -*- coding: utf-8 -*-
"""PyMilo chain for decision trees."""
from ..transporters.transporter import Command

from ..transporters.general_data_structure_transporter import GeneralDataStructureTransporter
from ..transporters.tree_transporter import TreeTransporter

from ..pymilo_param import SKLEARN_DECISION_TREE_TABLE

from ..exceptions.serialize_exception import PymiloSerializationException, SerilaizatoinErrorTypes
from ..exceptions.deserialize_exception import PymiloDeserializationException, DeSerilaizatoinErrorTypes
from traceback import format_exc


DECISION_TREE_CHAIN = {
    "GeneralDataStructureTransporter": GeneralDataStructureTransporter(),
    "TreeTransporter": TreeTransporter(),
}


def is_decision_tree(model):
    """
    Check if the input model is a sklearn's decision tree.

    :param model: is a string name of a decision tree or a sklearn object of it
    :type model: any object
    :return: check result as bool
    """
    if isinstance(model, str):
        return model in SKLEARN_DECISION_TREE_TABLE.keys()
    else:
        return type(model) in SKLEARN_DECISION_TREE_TABLE.values()


def transport_decision_tree(request, command):
    """
    Return the transported (Serialized or Deserialized) model.

    :param request: given decision tree model to be transported
    :type request: any object
    :param command: command to specify whether the request should be serialized or deserialized
    :type command: transporter.Command
    :return: the transported request as a json string or sklearn decision tree model
    """
    _validate_input(request, command)

    if command == Command.SERIALIZE:
        try:
            return serialize_decision_tree(request)
        except Exception as e:
            raise PymiloSerializationException(
                {
                    'error_type': SerilaizatoinErrorTypes.VALID_MODEL_INVALID_INTERNAL_STRUCTURE,
                    'error': {
                        'Exception': repr(e),
                        'Traceback': format_exc(),
                    },
                    'object': request,
                })

    elif command == Command.DESERIALZIE:
        try:
            return deserialize_decision_tree(request)
        except Exception as e:
            raise PymiloDeserializationException(
                {
                    'error_type': SerilaizatoinErrorTypes.VALID_MODEL_INVALID_INTERNAL_STRUCTURE,
                    'error': {
                        'Exception': repr(e),
                        'Traceback': format_exc()},
                    'object': request})


def serialize_decision_tree(decision_tree_object):
    """
    Return the serialized json string of the given decision tree model.

    :param decision_tree_object: given model to be serialized
    :type decision_tree_object: any sklearn decision tree model
    :return: the serialized json string of the given decision tree model
    """
    for transporter in DECISION_TREE_CHAIN.keys():
        DECISION_TREE_CHAIN[transporter].transport(
            decision_tree_object, Command.SERIALIZE)
    return decision_tree_object.__dict__


def deserialize_decision_tree(decision_tree):
    """
    Return the associated sklearn decision tree model of the given decision_tree.

    :param decision_tree: given json string of a decision tree model to be deserialized to the associated sklearn decision tree model
    :type decision_tree: obj
    :return: associated sklearn decision tree model
    """
    raw_model = SKLEARN_DECISION_TREE_TABLE[decision_tree.type]()
    data = decision_tree.data

    for transporter in DECISION_TREE_CHAIN.keys():
        DECISION_TREE_CHAIN[transporter].transport(
            decision_tree, Command.DESERIALZIE)
    for item in data.keys():
        setattr(raw_model, item, data[item])
    return raw_model


def _validate_input(model, command):
    """
    Check if the provided inputs are valid in relation to each other.

    :param model: a sklearn decision tree model or a json string of it, serialized through the pymilo export.
    :type model: obj
    :param command: command to specify whether the request should be serialized or deserialized
    :type command: transporter.Command
    :return: None
    """
    if command == Command.SERIALIZE:
        if is_decision_tree(model):
            return
        else:
            raise PymiloSerializationException(
                {
                    'error_type': SerilaizatoinErrorTypes.INVALID_MODEL,
                    'object': model
                }
            )
    elif command == Command.DESERIALZIE:
        if is_decision_tree(model.type):
            return
        else:
            raise PymiloDeserializationException(
                {
                    'error_type': DeSerilaizatoinErrorTypes.INVALID_MODEL,
                    'object': model
                }
            )
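
The chain in `decision_tree_chain.py` passes the model through every registered transporter in order, once for serialization and once for deserialization. The following is a minimal standalone sketch of that pattern, not PyMilo's implementation: `SetTransporter`, `ComplexTransporter`, `ToyModel`, and the `__set__`/`__complex__` tags are all hypothetical stand-ins, the `Command` enum here is a simplified local copy, and the sketch works on a copied dictionary whereas the code above transports the model object itself.

```python
import json
from enum import Enum


class Command(Enum):
    """Local stand-in for pymilo's transporter.Command (names simplified)."""
    SERIALIZE = 1
    DESERIALIZE = 2


class SetTransporter:
    """Hypothetical transporter: makes set-valued fields JSON-friendly."""

    def transport(self, data, command):
        for key, value in data.items():
            if command is Command.SERIALIZE and isinstance(value, set):
                data[key] = {"__set__": sorted(value)}
            elif (command is Command.DESERIALIZE
                  and isinstance(value, dict) and "__set__" in value):
                data[key] = set(value["__set__"])


class ComplexTransporter:
    """Hypothetical transporter: makes complex-valued fields JSON-friendly."""

    def transport(self, data, command):
        for key, value in data.items():
            if command is Command.SERIALIZE and isinstance(value, complex):
                data[key] = {"__complex__": [value.real, value.imag]}
            elif (command is Command.DESERIALIZE
                  and isinstance(value, dict) and "__complex__" in value):
                data[key] = complex(*value["__complex__"])


# Every transporter sees every field; each one only rewrites what it recognizes.
CHAIN = {
    "SetTransporter": SetTransporter(),
    "ComplexTransporter": ComplexTransporter(),
}


class ToyModel:
    """Hypothetical model whose attributes are not directly JSON-serializable."""

    def __init__(self):
        self.labels = set()
        self.gain = 0j


def serialize(model):
    data = dict(vars(model))  # copy, so the live model is left untouched
    for transporter in CHAIN.values():
        transporter.transport(data, Command.SERIALIZE)
    return data


def deserialize(data):
    data = dict(data)
    for transporter in CHAIN.values():
        transporter.transport(data, Command.DESERIALIZE)
    model = ToyModel()  # fresh instance, then restore each attribute
    for name, value in data.items():
        setattr(model, name, value)
    return model


model = ToyModel()
model.labels = {"b", "a"}
model.gain = 1 + 2j
restored = deserialize(json.loads(json.dumps(serialize(model))))
```

Because each transporter ignores values it does not recognize, adding support for a new field type in this sketch only requires registering one more entry in `CHAIN`.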
3 changes: 1 addition & 2 deletions pymilo/chains/linear_model_chain.py
@@ -151,8 +151,7 @@ def validate_input(model, command, is_inner_model):
     """
     Check if the provided inputs are valid in relation to each other.

-    :param model: given object to gets transported, whether a sklearn linear model to get serialized
-    or a json string of a linear model to get deserialized to associated sklearn linear model
+    :param model: a sklearn linear model or a json string of it, serialized through the pymilo export.
     :type model: obj
     :param command: command to specify whether the request should be serialized or deserialized
     :type command: transporter.Command
3 changes: 1 addition & 2 deletions pymilo/chains/neural_network_chain.py
@@ -114,8 +114,7 @@ def _validate_input(model, command):
     """
     Check if the provided inputs are valid in relation to each other.

-    :param model: given object to gets transported, whether a sklearn neural network model to get serialized
-    or a json string of a neural network model to get deserialized to associated sklearn neural network model
+    :param model: a sklearn neural network model or a json string of it, serialized through the pymilo export.
     :type model: obj
     :param command: command to specify whether the request should be serialized or deserialized
     :type command: transporter.Command
5 changes: 5 additions & 0 deletions pymilo/pymilo_func.py
@@ -5,6 +5,7 @@

 from .chains.linear_model_chain import transport_linear_model, is_linear_model
 from .chains.neural_network_chain import transport_neural_network, is_neural_network
+from .chains.decision_tree_chain import transport_decision_tree, is_decision_tree

 from .transporters.transporter import Command

@@ -30,6 +31,8 @@ def get_sklearn_data(model):
         return transport_linear_model(model, Command.SERIALIZE)
     elif is_neural_network(model):
         return transport_neural_network(model, Command.SERIALIZE)
+    elif is_decision_tree(model):
+        return transport_decision_tree(model, Command.SERIALIZE)
     else:
         return None

@@ -46,6 +49,8 @@ def to_sklearn_model(import_obj):
         return transport_linear_model(import_obj, Command.DESERIALZIE)
     elif is_neural_network(import_obj.type):
         return transport_neural_network(import_obj, Command.DESERIALZIE)
+    elif is_decision_tree(import_obj.type):
+        return transport_decision_tree(import_obj, Command.DESERIALZIE)
     else:
         return None

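
The dispatch in `pymilo_func.py` asks each model family in turn whether it recognizes the input, using predicates such as `is_decision_tree` that accept either a class-name string or a model instance. A small sketch of that lookup, assuming toy classes and tables (`ToyLinear`, `ToyTree`, `LINEAR_TABLE`, `TREE_TABLE`, `FAMILIES`, and `family_of` are illustrative names, not part of PyMilo):

```python
class ToyLinear:
    """Hypothetical stand-in for a linear model class."""


class ToyTree:
    """Hypothetical stand-in for a decision tree class."""


# Name -> class tables, shaped like SKLEARN_DECISION_TREE_TABLE in the diff.
LINEAR_TABLE = {"ToyLinear": ToyLinear}
TREE_TABLE = {"ToyTree": ToyTree}


def in_table(model, table):
    """Accept a class-name string (import side) or an instance (export side)."""
    if isinstance(model, str):
        return model in table
    return type(model) in table.values()


# Checked in order, like the if/elif ladder in get_sklearn_data.
FAMILIES = [
    ("LINEAR_MODEL", LINEAR_TABLE),
    ("DECISION_TREE", TREE_TABLE),
]


def family_of(model):
    for family_name, table in FAMILIES:
        if in_table(model, table):
            return family_name
    return None  # unsupported model, mirroring the final else branch


class CustomTree(ToyTree):
    """A subclass, to show that the lookup is an exact-type match."""
```

One consequence of comparing `type(model)` against the table values is that subclasses are not matched: here `family_of(CustomTree())` falls through to `None`, while `family_of(ToyTree())` and `family_of("ToyTree")` both resolve to the tree family.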
19 changes: 16 additions & 3 deletions pymilo/pymilo_param.py
@@ -35,10 +35,15 @@
 from sklearn.neural_network import MLPClassifier
 from sklearn.neural_network import BernoulliRBM

+from sklearn.tree import DecisionTreeRegressor
+
 from numpy import int64
 from numpy import int32
+from numpy import float64
+from numpy import inf
+from numpy import uint8
+
 from sklearn.preprocessing import LabelBinarizer
-import numpy as np

 PYMILO_VERSION = "0.2"
 NOT_SUPPORTED = "NOT_SUPPORTED"
@@ -119,6 +124,11 @@
     "MLPClassifier": MLPClassifier,
     "BernoulliRBM": BernoulliRBM,
 }
+
+SKLEARN_DECISION_TREE_TABLE = {
+    "DecisionTreeRegressor": DecisionTreeRegressor
+}
+
 KEYS_NEED_PREPROCESSING_BEFORE_DESERIALIZATION = {
     "_label_binarizer": LabelBinarizer,  # in Ridge Classifier
     "active_": int32,  # in Lasso Lars
@@ -132,10 +142,13 @@
 NUMPY_TYPE_DICT = {
     "numpy.int32": int32,
     "numpy.int64": int64,
-    "numpy.infinity": lambda _: np.inf
+    "numpy.float64": float64,
+    "numpy.infinity": lambda _: inf,
+    "numpy.uint8": uint8,
 }

 EXPORTED_MODELS_PATH = {
     "LINEAR_MODEL": "exported_linear_models",
-    "NEURAL_NETWORK": "exported_neural_networks"
+    "NEURAL_NETWORK": "exported_neural_networks",
+    "DECISION_TREE": "exported_decision_trees"
 }
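
`NUMPY_TYPE_DICT` maps a type-name string to a callable that rebuilds the typed value, which is why the infinity entry is a one-argument lambda like the other constructors. A minimal sketch of how such a table can be used, assuming a hypothetical tagged payload with `np-type` and `value` keys (the key names and the helpers `to_tagged` and `from_tagged` are illustrative, not taken from the diff):

```python
import json

import numpy as np

# Same shape as NUMPY_TYPE_DICT in the diff: name -> one-argument callable.
TYPE_TABLE = {
    "numpy.int32": np.int32,
    "numpy.int64": np.int64,
    "numpy.float64": np.float64,
    "numpy.infinity": lambda _: np.inf,
    "numpy.uint8": np.uint8,
}


def to_tagged(value):
    """Wrap a NumPy scalar as a plain value plus its type name (hypothetical format)."""
    return {"np-type": "numpy." + type(value).__name__, "value": value.item()}


def from_tagged(payload):
    """Look up the constructor by name and rebuild the typed scalar."""
    return TYPE_TABLE[payload["np-type"]](payload["value"])


# Round trip through real JSON text to show that the dtype survives.
text = json.dumps(to_tagged(np.uint8(200)))
restored = from_tagged(json.loads(text))

# The infinity entry ignores its argument, so any placeholder value works.
restored_inf = from_tagged({"np-type": "numpy.infinity", "value": None})
```

Without the tag, `np.uint8(200)` would come back from JSON as a plain Python `int`, so the table is what preserves the original width and signedness.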
112 changes: 112 additions & 0 deletions pymilo/transporters/tree_transporter.py
@@ -0,0 +1,112 @@
# -*- coding: utf-8 -*-
"""PyMilo Tree object transporter."""
from sklearn.tree._tree import Tree

from .transporter import AbstractTransporter
from .general_data_structure_transporter import GeneralDataStructureTransporter
from ..pymilo_param import NUMPY_TYPE_DICT

import numpy as np
import platform


class TreeTransporter(AbstractTransporter):
    """Customized PyMilo Transporter developed to handle (pyi,pyx) Tree object."""

    def serialize(self, data, key, model_type):
        """
        Serialize instances of the Tree class.

        Record the n_features, n_classes and n_outputs fields of tree object.

        :param data: the internal data dictionary of the given model
        :type data: dict
        :param key: the special key of the data param, which we're going to serialize its value(data[key])
        :type key: object
        :param model_type: the model type of the ML model
        :type model_type: str
        :return: pymilo serialized output of data[key]
        """
        if isinstance(data[key], Tree):
            gdst = GeneralDataStructureTransporter()
            tree = data[key]
            tree_inner_state = tree.__getstate__()

            data[key] = {
                'params': {
                    'internal_state': {
                        "max_depth": tree_inner_state["max_depth"],
                        "node_count": tree_inner_state["node_count"],
                        "nodes": {
                            "types": [str(np.dtype(i).name) for i in tree_inner_state["nodes"][0]],
                            "field-names": list(tree_inner_state["nodes"][0].dtype.names),
                            "values": [node.tolist() for node in tree_inner_state["nodes"]],
                        },
                        "values": gdst.ndarray_to_list(tree_inner_state["values"]),
                    },
                    'n_features': tree.n_features,
                    'n_classes': gdst.ndarray_to_list(tree.n_classes),
                    'n_outputs': tree.n_outputs,
                }
            }

        return data[key]

    def deserialize(self, data, key, model_type):
        """
        Deserialize the special tree_ field of the Decision Trees.

        The associated tree_ field of the pymilo serialized model is rebuilt from
        its previously serialized parameters.
        Deserialize the data[key] of the given model, whose type is model_type.
        In order to fully deserialize a model, we should traverse all the keys of its serialized data dictionary and
        pass each through the chain of associated transporters.

        :param data: the internal data dictionary of the associated JSON file of the ML model generated by pymilo export.
        :type data: dict
        :param key: the special key of the data param, which we're going to deserialize its value(data[key])
        :type key: object
        :param model_type: the model type of the ML model
        :type model_type: str
        :return: pymilo deserialized output of data[key]
        """
        content = data[key]

        if (key == "tree_" and (model_type == "DecisionTreeRegressor")):
            gdst = GeneralDataStructureTransporter()
            tree_params = content['params']

            tree_internal_state = tree_params["internal_state"]

            nodes_dtype_spec = []
            for i in range(len(tree_internal_state["nodes"]["types"])):
                nodes_dtype_spec.append(
                    (tree_internal_state["nodes"]["field-names"][i], NUMPY_TYPE_DICT["numpy." + tree_internal_state["nodes"]["types"][i]]))
            nodes = [tuple(node)
                     for node in tree_internal_state["nodes"]["values"]]
            nodes = np.array(nodes, dtype=nodes_dtype_spec)

            tree_internal_state = {
                "max_depth": tree_internal_state["max_depth"],
                "node_count": tree_internal_state["node_count"],
                "nodes": nodes,
                "values": gdst.list_to_ndarray(tree_internal_state["values"]),
            }

            n_classes = np.ndarray(
                shape=(np.intp(len(tree_params["n_classes"])),), dtype=np.intp)
            for i in range(len(n_classes)):
                n_classes[i] = tree_params["n_classes"][i]

            _tree = Tree(
                tree_params["n_features"],
                n_classes,
                tree_params["n_outputs"]
            )

            _tree.__setstate__(tree_internal_state)

            return _tree

        else:
            return content
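
The core of `TreeTransporter` is a round trip of the tree's `nodes` structured array: field names, per-field type names, and row values are stored as plain lists, then reassembled into a dtype spec on import. The sketch below shows that technique with NumPy alone. It is a simplified illustration under stated assumptions: the three fields are a made-up subset rather than scikit-learn's full node layout, `nodes_to_payload` and `payload_to_nodes` are hypothetical helpers, and field types are resolved with `np.dtype(name)` instead of the `NUMPY_TYPE_DICT` lookup used in the diff.

```python
import json

import numpy as np

# Illustrative node records; a real sklearn Tree has more fields per node.
nodes = np.array(
    [(1, 2, 0.5), (-1, -1, -2.0), (-1, -1, -2.0)],
    dtype=[
        ("left_child", np.int64),
        ("right_child", np.int64),
        ("threshold", np.float64),
    ],
)


def nodes_to_payload(array):
    """Flatten a structured array into JSON-friendly lists."""
    names = list(array.dtype.names)
    return {
        "field-names": names,
        "types": [array.dtype[name].name for name in names],
        "values": [list(row.tolist()) for row in array],
    }


def payload_to_nodes(payload):
    """Rebuild the structured array from field names, type names, and rows."""
    dtype_spec = [
        (name, np.dtype(type_name))
        for name, type_name in zip(payload["field-names"], payload["types"])
    ]
    # Structured arrays are built from tuples, one tuple per record.
    rows = [tuple(row) for row in payload["values"]]
    return np.array(rows, dtype=dtype_spec)


payload = nodes_to_payload(nodes)
restored = payload_to_nodes(json.loads(json.dumps(payload)))
```

Storing the field names alongside the types, instead of hard-coding them, is what lets the same code handle node layouts that differ between library versions, which matches the intent of the `field-names` commit listed above.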
7 changes: 5 additions & 2 deletions pymilo/utils/test_pymilo.py
@@ -12,6 +12,7 @@

 from ..chains.linear_model_chain import is_linear_model
 from ..chains.neural_network_chain import is_neural_network
+from ..chains.decision_tree_chain import is_decision_tree

 from ..pymilo_param import EXPORTED_MODELS_PATH

@@ -29,6 +30,8 @@
         model_type = "LINEAR_MODEL"
     elif is_neural_network(model):
         model_type = "NEURAL_NETWORK"
+    elif is_decision_tree(model):
+        model_type = "DECISION_TREE"
     else:
         model_type = None
     return EXPORTED_MODELS_PATH[model_type]
@@ -130,6 +133,6 @@
     :return: None
     """
     if result:
-        print('Pymilo Test for Model:' + model_name + ' succeed.')
+        print('Pymilo Test for Model: ' + model_name + ' succeed.')
     else:
-        print('Pymilo Test for Model:' + model_name + ' failed.')
+        print('Pymilo Test for Model: ' + model_name + ' failed.')