Commit

jeremiedb committed Jan 25, 2024
1 parent 8c452b6 commit 712a9e8
Showing 4 changed files with 18 additions and 17 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -24,3 +24,4 @@ docs/site/
Manifest.toml

data/
+.vscode/
2 changes: 1 addition & 1 deletion README.md
@@ -2,7 +2,7 @@
# EvoTrees <a href="https://evovest.github.io/EvoTrees.jl/dev/"><img src="figures/hex-evotrees-2.png" align="right" height="160"/></a>


-| Documentation | CI Status |
+| Documentation | CI Status | CI Status |
|:------------------------:|:----------------:|
| [![][docs-stable-img]][docs-stable-url] [![][docs-latest-img]][docs-latest-url] | [![][ci-img]][ci-url] |

2 changes: 1 addition & 1 deletion docs/src/index.md
@@ -1,4 +1,4 @@
-# [EvoTrees.jl](https://github.com/Evovest/EvoTrees.jl)
+# EvoTrees.jl

A Julia implementation of boosted trees with CPU and GPU support. Efficient histogram based algorithms with support for multiple loss functions, including various regressions, multi-classification and Gaussian max likelihood.

30 changes: 15 additions & 15 deletions src/MLJ.jl
@@ -162,7 +162,7 @@ A model type for constructing a EvoTreeRegressor, based on [EvoTrees.jl](https:/
- `:tweedie`
- `:quantile`
- `:l1`
-- `nrounds=10`: Number of rounds. It corresponds to the number of trees that will be sequentially stacked. Must be >= 1.
+- `nrounds=100`: Number of rounds. It corresponds to the number of trees that will be sequentially stacked. Must be >= 1.
- `eta=0.1`: Learning rate. Each tree raw predictions are scaled by `eta` prior to be added to the stack of predictions. Must be > 0.
A lower `eta` results in slower learning, requiring a higher `nrounds` but typically improves model performance.
- `L2::T=0.0`: L2 regularization factor on aggregate gain. Must be >= 0. Higher L2 can result in a more robust model.
@@ -173,13 +173,13 @@ A model type for constructing a EvoTreeRegressor, based on [EvoTrees.jl](https:/
- `:l1`: weighting parameters to positive vs negative residuals.
- Positive residual weights = `alpha`
- Negative residual weights = `(1 - alpha)`
-- `max_depth=5`: Maximum depth of a tree. Must be >= 1. A tree of depth 1 is made of a single prediction leaf.
+- `max_depth=6`: Maximum depth of a tree. Must be >= 1. A tree of depth 1 is made of a single prediction leaf.
A complete tree of depth N contains `2^(N - 1)` terminal leaves and `2^(N - 1) - 1` split nodes.
Compute cost is proportional to `2^max_depth`. Typical optimal values are in the 3 to 9 range.
- `min_weight=1.0`: Minimum weight needed in a node to perform a split. Matches the number of observations by default or the sum of weights as provided by the `weights` vector. Must be > 0.
- `rowsample=1.0`: Proportion of rows that are sampled at each iteration to build the tree. Should be in `]0, 1]`.
- `colsample=1.0`: Proportion of columns / features that are sampled at each iteration to build the tree. Should be in `]0, 1]`.
-- `nbins=32`: Number of bins into which each feature is quantized. Buckets are defined based on quantiles, hence resulting in equal weight bins. Should be between 2 and 255.
+- `nbins=64`: Number of bins into which each feature is quantized. Buckets are defined based on quantiles, hence resulting in equal weight bins. Should be between 2 and 255.
- `monotone_constraints=Dict{Int, Int}()`: Specify monotonic constraints using a dict where the key is the feature index and the value the applicable constraint (-1=decreasing, 0=none, 1=increasing).
Only `:linear`, `:logistic`, `:gamma` and `tweedie` losses are supported at the moment.
- `tree_type="binary"` Tree structure to be used. One of:
@@ -287,19 +287,19 @@ EvoTreeClassifier is used to perform multi-class classification, using cross-ent
# Hyper-parameters
-- `nrounds=10`: Number of rounds. It corresponds to the number of trees that will be sequentially stacked. Must be >= 1.
+- `nrounds=100`: Number of rounds. It corresponds to the number of trees that will be sequentially stacked. Must be >= 1.
- `eta=0.1`: Learning rate. Each tree raw predictions are scaled by `eta` prior to be added to the stack of predictions. Must be > 0.
A lower `eta` results in slower learning, requiring a higher `nrounds` but typically improves model performance.
- `L2::T=0.0`: L2 regularization factor on aggregate gain. Must be >= 0. Higher L2 can result in a more robust model.
- `lambda::T=0.0`: L2 regularization factor on individual gain. Must be >= 0. Higher lambda can result in a more robust model.
- `gamma::T=0.0`: Minimum gain improvement needed to perform a node split. Higher gamma can result in a more robust model. Must be >= 0.
-- `max_depth=5`: Maximum depth of a tree. Must be >= 1. A tree of depth 1 is made of a single prediction leaf.
+- `max_depth=6`: Maximum depth of a tree. Must be >= 1. A tree of depth 1 is made of a single prediction leaf.
A complete tree of depth N contains `2^(N - 1)` terminal leaves and `2^(N - 1) - 1` split nodes.
Compute cost is proportional to `2^max_depth`. Typical optimal values are in the 3 to 9 range.
- `min_weight=1.0`: Minimum weight needed in a node to perform a split. Matches the number of observations by default or the sum of weights as provided by the `weights` vector. Must be > 0.
- `rowsample=1.0`: Proportion of rows that are sampled at each iteration to build the tree. Should be in `]0, 1]`.
- `colsample=1.0`: Proportion of columns / features that are sampled at each iteration to build the tree. Should be in `]0, 1]`.
-- `nbins=32`: Number of bins into which each feature is quantized. Buckets are defined based on quantiles, hence resulting in equal weight bins. Should be between 2 and 255.
+- `nbins=64`: Number of bins into which each feature is quantized. Buckets are defined based on quantiles, hence resulting in equal weight bins. Should be between 2 and 255.
- `tree_type="binary"` Tree structure to be used. One of:
- `binary`: Each node of a tree is grown independently. Tree are built depthwise until max depth is reach or if min weight or gain (see `gamma`) stops further node splits.
- `oblivious`: A common splitting condition is imposed to all nodes of a given depth.
@@ -412,19 +412,19 @@ EvoTreeCount is used to perform Poisson probabilistic regression on count target
# Hyper-parameters
-- `nrounds=10`: Number of rounds. It corresponds to the number of trees that will be sequentially stacked. Must be >= 1.
+- `nrounds=100`: Number of rounds. It corresponds to the number of trees that will be sequentially stacked. Must be >= 1.
- `eta=0.1`: Learning rate. Each tree raw predictions are scaled by `eta` prior to be added to the stack of predictions. Must be > 0.
A lower `eta` results in slower learning, requiring a higher `nrounds` but typically improves model performance.
- `L2::T=0.0`: L2 regularization factor on aggregate gain. Must be >= 0. Higher L2 can result in a more robust model.
- `lambda::T=0.0`: L2 regularization factor on individual gain. Must be >= 0. Higher lambda can result in a more robust model.
- `gamma::T=0.0`: Minimum gain imprvement needed to perform a node split. Higher gamma can result in a more robust model.
-- `max_depth=5`: Maximum depth of a tree. Must be >= 1. A tree of depth 1 is made of a single prediction leaf.
+- `max_depth=6`: Maximum depth of a tree. Must be >= 1. A tree of depth 1 is made of a single prediction leaf.
A complete tree of depth N contains `2^(N - 1)` terminal leaves and `2^(N - 1) - 1` split nodes.
Compute cost is proportional to 2^max_depth. Typical optimal values are in the 3 to 9 range.
- `min_weight=1.0`: Minimum weight needed in a node to perform a split. Matches the number of observations by default or the sum of weights as provided by the `weights` vector. Must be > 0.
- `rowsample=1.0`: Proportion of rows that are sampled at each iteration to build the tree. Should be `]0, 1]`.
- `colsample=1.0`: Proportion of columns / features that are sampled at each iteration to build the tree. Should be `]0, 1]`.
-- `nbins=32`: Number of bins into which each feature is quantized. Buckets are defined based on quantiles, hence resulting in equal weight bins. Should be between 2 and 255.
+- `nbins=64`: Number of bins into which each feature is quantized. Buckets are defined based on quantiles, hence resulting in equal weight bins. Should be between 2 and 255.
- `monotone_constraints=Dict{Int, Int}()`: Specify monotonic constraints using a dict where the key is the feature index and the value the applicable constraint (-1=decreasing, 0=none, 1=increasing).
- `tree_type="binary"` Tree structure to be used. One of:
- `binary`: Each node of a tree is grown independently. Tree are built depthwise until max depth is reach or if min weight or gain (see `gamma`) stops further node splits.
@@ -542,19 +542,19 @@ EvoTreeGaussian is used to perform Gaussian probabilistic regression, fitting μ
# Hyper-parameters
-- `nrounds=10`: Number of rounds. It corresponds to the number of trees that will be sequentially stacked. Must be >= 1.
+- `nrounds=100`: Number of rounds. It corresponds to the number of trees that will be sequentially stacked. Must be >= 1.
- `eta=0.1`: Learning rate. Each tree raw predictions are scaled by `eta` prior to be added to the stack of predictions. Must be > 0.
A lower `eta` results in slower learning, requiring a higher `nrounds` but typically improves model performance.
- `L2::T=0.0`: L2 regularization factor on aggregate gain. Must be >= 0. Higher L2 can result in a more robust model.
- `lambda::T=0.0`: L2 regularization factor on individual gain. Must be >= 0. Higher lambda can result in a more robust model.
- `gamma::T=0.0`: Minimum gain imprvement needed to perform a node split. Higher gamma can result in a more robust model. Must be >= 0.
-- `max_depth=5`: Maximum depth of a tree. Must be >= 1. A tree of depth 1 is made of a single prediction leaf.
+- `max_depth=6`: Maximum depth of a tree. Must be >= 1. A tree of depth 1 is made of a single prediction leaf.
A complete tree of depth N contains `2^(N - 1)` terminal leaves and `2^(N - 1) - 1` split nodes.
Compute cost is proportional to 2^max_depth. Typical optimal values are in the 3 to 9 range.
- `min_weight=8.0`: Minimum weight needed in a node to perform a split. Matches the number of observations by default or the sum of weights as provided by the `weights` vector. Must be > 0.
- `rowsample=1.0`: Proportion of rows that are sampled at each iteration to build the tree. Should be in `]0, 1]`.
- `colsample=1.0`: Proportion of columns / features that are sampled at each iteration to build the tree. Should be in `]0, 1]`.
-- `nbins=32`: Number of bins into which each feature is quantized. Buckets are defined based on quantiles, hence resulting in equal weight bins. Should be between 2 and 255.
+- `nbins=64`: Number of bins into which each feature is quantized. Buckets are defined based on quantiles, hence resulting in equal weight bins. Should be between 2 and 255.
- `monotone_constraints=Dict{Int, Int}()`: Specify monotonic constraints using a dict where the key is the feature index and the value the applicable constraint (-1=decreasing, 0=none, 1=increasing).
!Experimental feature: note that for Gaussian regression, constraints may not be enforce systematically.
- `tree_type="binary"` Tree structure to be used. One of:
@@ -680,19 +680,19 @@ EvoTreeMLE performs maximum likelihood estimation. Assumed distribution is speci
`loss=:gaussian`: Loss to be be minimized during training. One of:
- `:gaussian` / `:gaussian_mle`
- `:logistic` / `:logistic_mle`
-- `nrounds=10`: Number of rounds. It corresponds to the number of trees that will be sequentially stacked. Must be >= 1.
+- `nrounds=100`: Number of rounds. It corresponds to the number of trees that will be sequentially stacked. Must be >= 1.
- `eta=0.1`: Learning rate. Each tree raw predictions are scaled by `eta` prior to be added to the stack of predictions. Must be > 0.
A lower `eta` results in slower learning, requiring a higher `nrounds` but typically improves model performance.
- `L2::T=0.0`: L2 regularization factor on aggregate gain. Must be >= 0. Higher L2 can result in a more robust model.
- `lambda::T=0.0`: L2 regularization factor on individual gain. Must be >= 0. Higher lambda can result in a more robust model.
- `gamma::T=0.0`: Minimum gain imprvement needed to perform a node split. Higher gamma can result in a more robust model. Must be >= 0.
-- `max_depth=5`: Maximum depth of a tree. Must be >= 1. A tree of depth 1 is made of a single prediction leaf.
+- `max_depth=6`: Maximum depth of a tree. Must be >= 1. A tree of depth 1 is made of a single prediction leaf.
A complete tree of depth N contains `2^(N - 1)` terminal leaves and `2^(N - 1) - 1` split nodes.
Compute cost is proportional to 2^max_depth. Typical optimal values are in the 3 to 9 range.
- `min_weight=8.0`: Minimum weight needed in a node to perform a split. Matches the number of observations by default or the sum of weights as provided by the `weights` vector. Must be > 0.
- `rowsample=1.0`: Proportion of rows that are sampled at each iteration to build the tree. Should be in `]0, 1]`.
- `colsample=1.0`: Proportion of columns / features that are sampled at each iteration to build the tree. Should be in `]0, 1]`.
-- `nbins=32`: Number of bins into which each feature is quantized. Buckets are defined based on quantiles, hence resulting in equal weight bins. Should be between 2 and 255.
+- `nbins=64`: Number of bins into which each feature is quantized. Buckets are defined based on quantiles, hence resulting in equal weight bins. Should be between 2 and 255.
- `monotone_constraints=Dict{Int, Int}()`: Specify monotonic constraints using a dict where the key is the feature index and the value the applicable constraint (-1=decreasing, 0=none, 1=increasing).
!Experimental feature: note that for MLE regression, constraints may not be enforced systematically.
- `tree_type="binary"` Tree structure to be used. One of:
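
Note on the `max_depth` change: the docstrings state that a complete tree of depth N has `2^(N - 1)` terminal leaves and `2^(N - 1) - 1` split nodes. The sketch below simply evaluates those two formulas for the previous default (5) and the new one (6). `n_leaves` and `n_splits` are hypothetical helper names used for illustration only; they are not part of EvoTrees.jl.

```julia
# Counts for a complete binary tree under the docstring's convention,
# where a tree of depth 1 is a single prediction leaf.
# NOTE: hypothetical helpers, not EvoTrees.jl functions.
n_leaves(depth::Integer) = 2^(depth - 1)
n_splits(depth::Integer) = 2^(depth - 1) - 1

for depth in (5, 6)  # previous and new default `max_depth`
    println("max_depth=", depth, ": ",
            n_leaves(depth), " leaves, ",
            n_splits(depth), " split nodes")
end
# prints:
# max_depth=5: 16 leaves, 15 split nodes
# max_depth=6: 32 leaves, 31 split nodes
```

Going from depth 5 to depth 6 therefore doubles the maximum number of leaves per tree, in line with the docstring's remark that compute cost grows with `2^max_depth`.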

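Note on the `nbins` change: the docstrings say buckets are defined from quantiles, which yields bins of roughly equal weight. The sketch below illustrates that idea using only the `Statistics` standard library. `quantile_edges` and `bin_index` are hypothetical helpers written for this note, and the code makes no claim about how EvoTrees.jl implements its binning internally.

```julia
using Statistics

# Hypothetical helper: the nbins - 1 interior cut points that split `x`
# into nbins buckets of roughly equal count.
function quantile_edges(x::AbstractVector{<:Real}, nbins::Integer)
    2 <= nbins <= 255 || throw(ArgumentError("nbins must be between 2 and 255"))
    return quantile(x, (1:nbins-1) ./ nbins)
end

# Hypothetical helper: map a value to a bin index in 1:(length(edges) + 1).
bin_index(v::Real, edges::AbstractVector{<:Real}) = searchsortedfirst(edges, v)

x = collect(1.0:1000.0)
for nbins in (32, 64)  # previous and new default `nbins`
    edges = quantile_edges(x, nbins)
    bins = bin_index.(x, Ref(edges))
    counts = [count(==(b), bins) for b in 1:nbins]
    println("nbins=", nbins, ": ", length(edges), " cut points, ",
            "bin sizes between ", minimum(counts), " and ", maximum(counts))
end
```

With more bins, each bucket covers a narrower slice of the feature's distribution, so candidate split points are finer at the cost of larger histograms.
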