Commit: DOI (#266)

* DOI

* ci update
jeremiedb authored Jan 25, 2024
1 parent 8c452b6 commit 3b1641d
Showing 7 changed files with 27 additions and 38 deletions.
11 changes: 1 addition & 10 deletions .github/workflows/CI.yml
@@ -36,16 +36,7 @@ jobs:
with:
version: ${{ matrix.version }}
arch: ${{ matrix.arch }}
- uses: actions/cache@v2
env:
cache-name: cache-artifacts
with:
path: ~/.julia/artifacts
key: ${{ runner.os }}-test-${{ env.cache-name }}-${{ hashFiles('**/Project.toml') }}
restore-keys: |
${{ runner.os }}-test-${{ env.cache-name }}-
${{ runner.os }}-test-
${{ runner.os }}-
- uses: julia-actions/cache@v1
- uses: julia-actions/julia-buildpkg@v1
- uses: julia-actions/julia-runtest@v1
- uses: julia-actions/julia-processcoverage@v1
13 changes: 5 additions & 8 deletions .github/workflows/Docs.yml
@@ -11,13 +11,10 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: julia-actions/setup-julia@latest
with:
version: '1.6'
- name: Install dependencies
run: julia --project=docs/ -e 'using Pkg; Pkg.develop(PackageSpec(path=pwd())); Pkg.instantiate()'
- name: Build and deploy
- uses: julia-actions/setup-julia@v1
- uses: julia-actions/cache@v1
- uses: julia-actions/julia-buildpkg@v1
- uses: julia-actions/julia-docdeploy@v1
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} # If authenticating with GitHub Actions token
DOCUMENTER_KEY: ${{ secrets.DOCUMENTER_KEY }} # If authenticating with SSH deploy key
run: julia --project=docs/ docs/make.jl
DOCUMENTER_KEY: ${{ secrets.DOCUMENTER_KEY }} # If authenticating with SSH deploy key
1 change: 1 addition & 0 deletions .gitignore
@@ -24,3 +24,4 @@ docs/site/
Manifest.toml

data/
.vscode/
2 changes: 1 addition & 1 deletion Project.toml
@@ -1,7 +1,7 @@
name = "EvoTrees"
uuid = "f6006082-12f8-11e9-0c9c-0d5d367ab1e5"
authors = ["jeremiedb <[email protected]>"]
version = "0.16.5"
version = "0.16.6"

[deps]
BSON = "fbb218c0-5317-5bc6-957e-2ee96dd4b1f0"
6 changes: 3 additions & 3 deletions README.md
@@ -2,9 +2,9 @@
# EvoTrees <a href="https://evovest.github.io/EvoTrees.jl/dev/"><img src="figures/hex-evotrees-2.png" align="right" height="160"/></a>


| Documentation | CI Status |
|:------------------------:|:----------------:|
| [![][docs-stable-img]][docs-stable-url] [![][docs-latest-img]][docs-latest-url] | [![][ci-img]][ci-url] |
| Documentation | CI Status | DOI |
|:------------------------:|:----------------:|:----------------:|
| [![][docs-stable-img]][docs-stable-url] [![][docs-latest-img]][docs-latest-url] | [![][ci-img]][ci-url] | |

[docs-latest-img]: https://img.shields.io/badge/docs-latest-blue.svg
[docs-latest-url]: https://evovest.github.io/EvoTrees.jl/dev
2 changes: 1 addition & 1 deletion docs/src/index.md
@@ -1,4 +1,4 @@
# [EvoTrees.jl](https://github.com/Evovest/EvoTrees.jl)
# EvoTrees.jl

A Julia implementation of boosted trees with CPU and GPU support. Efficient histogram based algorithms with support for multiple loss functions, including various regressions, multi-classification and Gaussian max likelihood.

30 changes: 15 additions & 15 deletions src/MLJ.jl
@@ -162,7 +162,7 @@ A model type for constructing a EvoTreeRegressor, based on [EvoTrees.jl](https:/
- `:tweedie`
- `:quantile`
- `:l1`
- `nrounds=10`: Number of rounds. It corresponds to the number of trees that will be sequentially stacked. Must be >= 1.
- `nrounds=100`: Number of rounds. It corresponds to the number of trees that will be sequentially stacked. Must be >= 1.
- `eta=0.1`: Learning rate. Each tree's raw predictions are scaled by `eta` prior to being added to the stack of predictions. Must be > 0.
A lower `eta` results in slower learning, requiring a higher `nrounds` but typically improves model performance.
- `L2::T=0.0`: L2 regularization factor on aggregate gain. Must be >= 0. Higher L2 can result in a more robust model.
@@ -173,13 +173,13 @@ A model type for constructing a EvoTreeRegressor, based on [EvoTrees.jl](https:/
- `:l1`: weighting parameters to positive vs negative residuals.
- Positive residual weights = `alpha`
- Negative residual weights = `(1 - alpha)`
- `max_depth=5`: Maximum depth of a tree. Must be >= 1. A tree of depth 1 is made of a single prediction leaf.
- `max_depth=6`: Maximum depth of a tree. Must be >= 1. A tree of depth 1 is made of a single prediction leaf.
A complete tree of depth N contains `2^(N - 1)` terminal leaves and `2^(N - 1) - 1` split nodes.
Compute cost is proportional to `2^max_depth`. Typical optimal values are in the 3 to 9 range.
- `min_weight=1.0`: Minimum weight needed in a node to perform a split. Matches the number of observations by default or the sum of weights as provided by the `weights` vector. Must be > 0.
- `rowsample=1.0`: Proportion of rows that are sampled at each iteration to build the tree. Should be in `]0, 1]`.
- `colsample=1.0`: Proportion of columns / features that are sampled at each iteration to build the tree. Should be in `]0, 1]`.
- `nbins=32`: Number of bins into which each feature is quantized. Buckets are defined based on quantiles, hence resulting in equal weight bins. Should be between 2 and 255.
- `nbins=64`: Number of bins into which each feature is quantized. Buckets are defined based on quantiles, hence resulting in equal weight bins. Should be between 2 and 255.
- `monotone_constraints=Dict{Int, Int}()`: Specify monotonic constraints using a dict where the key is the feature index and the value the applicable constraint (-1=decreasing, 0=none, 1=increasing).
Only `:linear`, `:logistic`, `:gamma` and `:tweedie` losses are supported at the moment.
- `tree_type="binary"` Tree structure to be used. One of:
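To make the parameter list above concrete, here is a minimal usage sketch. The keyword names and the updated documented defaults (`nrounds=100`, `max_depth=6`, `nbins=64`) come from the docstring itself; the synthetic data, variable names, and the MLJ `@load`/`machine`/`fit!`/`predict` workflow are illustrative assumptions rather than part of this diff.

```julia
using MLJ   # assumed available: supplies @load, machine, fit!, predict

# Standard MLJ loading pattern (assumes EvoTrees.jl is installed).
EvoTreeRegressor = @load EvoTreeRegressor pkg=EvoTrees

# Synthetic data, for illustration only.
X = (x1 = rand(1_000), x2 = rand(1_000))
y = 2 .* X.x1 .- X.x2 .+ 0.1 .* randn(1_000)

# Updated documented defaults: nrounds=100, max_depth=6, nbins=64.
model = EvoTreeRegressor(nrounds = 100, eta = 0.1, max_depth = 6,
                         nbins = 64, rowsample = 1.0, colsample = 1.0)

mach = machine(model, X, y)
fit!(mach)
ŷ = predict(mach, X)   # point predictions
```

As the docstring notes, a lower `eta` combined with a higher `nrounds` typically improves model performance at the cost of longer training.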
@@ -287,19 +287,19 @@ EvoTreeClassifier is used to perform multi-class classification, using cross-ent
# Hyper-parameters
- `nrounds=10`: Number of rounds. It corresponds to the number of trees that will be sequentially stacked. Must be >= 1.
- `nrounds=100`: Number of rounds. It corresponds to the number of trees that will be sequentially stacked. Must be >= 1.
- `eta=0.1`: Learning rate. Each tree's raw predictions are scaled by `eta` prior to being added to the stack of predictions. Must be > 0.
A lower `eta` results in slower learning, requiring a higher `nrounds` but typically improves model performance.
- `L2::T=0.0`: L2 regularization factor on aggregate gain. Must be >= 0. Higher L2 can result in a more robust model.
- `lambda::T=0.0`: L2 regularization factor on individual gain. Must be >= 0. Higher lambda can result in a more robust model.
- `gamma::T=0.0`: Minimum gain improvement needed to perform a node split. Higher gamma can result in a more robust model. Must be >= 0.
- `max_depth=5`: Maximum depth of a tree. Must be >= 1. A tree of depth 1 is made of a single prediction leaf.
- `max_depth=6`: Maximum depth of a tree. Must be >= 1. A tree of depth 1 is made of a single prediction leaf.
A complete tree of depth N contains `2^(N - 1)` terminal leaves and `2^(N - 1) - 1` split nodes.
Compute cost is proportional to `2^max_depth`. Typical optimal values are in the 3 to 9 range.
- `min_weight=1.0`: Minimum weight needed in a node to perform a split. Matches the number of observations by default or the sum of weights as provided by the `weights` vector. Must be > 0.
- `rowsample=1.0`: Proportion of rows that are sampled at each iteration to build the tree. Should be in `]0, 1]`.
- `colsample=1.0`: Proportion of columns / features that are sampled at each iteration to build the tree. Should be in `]0, 1]`.
- `nbins=32`: Number of bins into which each feature is quantized. Buckets are defined based on quantiles, hence resulting in equal weight bins. Should be between 2 and 255.
- `nbins=64`: Number of bins into which each feature is quantized. Buckets are defined based on quantiles, hence resulting in equal weight bins. Should be between 2 and 255.
- `tree_type="binary"` Tree structure to be used. One of:
- `binary`: Each node of a tree is grown independently. Trees are built depthwise until max depth is reached or until min weight or gain (see `gamma`) stops further node splits.
- `oblivious`: A common splitting condition is imposed on all nodes of a given depth.
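A similar hedged sketch for the classifier, under the same assumptions (synthetic data and the MLJ machine workflow are not part of this diff); the target must be categorical for multi-class classification:

```julia
using MLJ

EvoTreeClassifier = @load EvoTreeClassifier pkg=EvoTrees   # assumes EvoTrees.jl is installed

# Synthetic three-class data; the target is coerced to Multiclass.
X = (x1 = rand(600), x2 = rand(600))
y = coerce(rand(["a", "b", "c"], 600), Multiclass)

model = EvoTreeClassifier(nrounds = 100, eta = 0.1, max_depth = 6, nbins = 64)
mach = machine(model, X, y)
fit!(mach)

probs  = predict(mach, X)       # per-class probability distributions
labels = predict_mode(mach, X)  # hard class labels
```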
@@ -412,19 +412,19 @@ EvoTreeCount is used to perform Poisson probabilistic regression on count target
# Hyper-parameters
- `nrounds=10`: Number of rounds. It corresponds to the number of trees that will be sequentially stacked. Must be >= 1.
- `nrounds=100`: Number of rounds. It corresponds to the number of trees that will be sequentially stacked. Must be >= 1.
- `eta=0.1`: Learning rate. Each tree's raw predictions are scaled by `eta` prior to being added to the stack of predictions. Must be > 0.
A lower `eta` results in slower learning, requiring a higher `nrounds` but typically improves model performance.
- `L2::T=0.0`: L2 regularization factor on aggregate gain. Must be >= 0. Higher L2 can result in a more robust model.
- `lambda::T=0.0`: L2 regularization factor on individual gain. Must be >= 0. Higher lambda can result in a more robust model.
- `gamma::T=0.0`: Minimum gain improvement needed to perform a node split. Higher gamma can result in a more robust model.
- `max_depth=5`: Maximum depth of a tree. Must be >= 1. A tree of depth 1 is made of a single prediction leaf.
- `max_depth=6`: Maximum depth of a tree. Must be >= 1. A tree of depth 1 is made of a single prediction leaf.
A complete tree of depth N contains `2^(N - 1)` terminal leaves and `2^(N - 1) - 1` split nodes.
Compute cost is proportional to 2^max_depth. Typical optimal values are in the 3 to 9 range.
- `min_weight=1.0`: Minimum weight needed in a node to perform a split. Matches the number of observations by default or the sum of weights as provided by the `weights` vector. Must be > 0.
- `rowsample=1.0`: Proportion of rows that are sampled at each iteration to build the tree. Should be `]0, 1]`.
- `colsample=1.0`: Proportion of columns / features that are sampled at each iteration to build the tree. Should be `]0, 1]`.
- `nbins=32`: Number of bins into which each feature is quantized. Buckets are defined based on quantiles, hence resulting in equal weight bins. Should be between 2 and 255.
- `nbins=64`: Number of bins into which each feature is quantized. Buckets are defined based on quantiles, hence resulting in equal weight bins. Should be between 2 and 255.
- `monotone_constraints=Dict{Int, Int}()`: Specify monotonic constraints using a dict where the key is the feature index and the value the applicable constraint (-1=decreasing, 0=none, 1=increasing).
- `tree_type="binary"` Tree structure to be used. One of:
- `binary`: Each node of a tree is grown independently. Trees are built depthwise until max depth is reached or until min weight or gain (see `gamma`) stops further node splits.
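For the Poisson count model, a hedged sketch under the same illustrative assumptions (synthetic data, MLJ machine workflow); since the model is probabilistic, predictions are distributions over counts:

```julia
using MLJ
using Statistics   # `mean` of the predicted count distributions

EvoTreeCount = @load EvoTreeCount pkg=EvoTrees   # assumes EvoTrees.jl is installed

# Synthetic non-negative integer target, as expected for Poisson regression.
X = (x1 = rand(500), x2 = rand(500))
y = rand(0:10, 500)

model = EvoTreeCount(nrounds = 100, eta = 0.1, max_depth = 6, nbins = 64)
mach = machine(model, X, y)
fit!(mach)

pred = predict(mach, X)   # distributions over counts, per the docstring above
λ̂ = mean.(pred)           # expected count for each observation
```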
@@ -542,19 +542,19 @@ EvoTreeGaussian is used to perform Gaussian probabilistic regression, fitting μ
# Hyper-parameters
- `nrounds=10`: Number of rounds. It corresponds to the number of trees that will be sequentially stacked. Must be >= 1.
- `nrounds=100`: Number of rounds. It corresponds to the number of trees that will be sequentially stacked. Must be >= 1.
- `eta=0.1`: Learning rate. Each tree's raw predictions are scaled by `eta` prior to being added to the stack of predictions. Must be > 0.
A lower `eta` results in slower learning, requiring a higher `nrounds` but typically improves model performance.
- `L2::T=0.0`: L2 regularization factor on aggregate gain. Must be >= 0. Higher L2 can result in a more robust model.
- `lambda::T=0.0`: L2 regularization factor on individual gain. Must be >= 0. Higher lambda can result in a more robust model.
- `gamma::T=0.0`: Minimum gain improvement needed to perform a node split. Higher gamma can result in a more robust model. Must be >= 0.
- `max_depth=5`: Maximum depth of a tree. Must be >= 1. A tree of depth 1 is made of a single prediction leaf.
- `max_depth=6`: Maximum depth of a tree. Must be >= 1. A tree of depth 1 is made of a single prediction leaf.
A complete tree of depth N contains `2^(N - 1)` terminal leaves and `2^(N - 1) - 1` split nodes.
Compute cost is proportional to 2^max_depth. Typical optimal values are in the 3 to 9 range.
- `min_weight=8.0`: Minimum weight needed in a node to perform a split. Matches the number of observations by default or the sum of weights as provided by the `weights` vector. Must be > 0.
- `rowsample=1.0`: Proportion of rows that are sampled at each iteration to build the tree. Should be in `]0, 1]`.
- `colsample=1.0`: Proportion of columns / features that are sampled at each iteration to build the tree. Should be in `]0, 1]`.
- `nbins=32`: Number of bins into which each feature is quantized. Buckets are defined based on quantiles, hence resulting in equal weight bins. Should be between 2 and 255.
- `nbins=64`: Number of bins into which each feature is quantized. Buckets are defined based on quantiles, hence resulting in equal weight bins. Should be between 2 and 255.
- `monotone_constraints=Dict{Int, Int}()`: Specify monotonic constraints using a dict where the key is the feature index and the value the applicable constraint (-1=decreasing, 0=none, 1=increasing).
!Experimental feature: note that for Gaussian regression, constraints may not be enforced systematically.
- `tree_type="binary"` Tree structure to be used. One of:
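For the Gaussian model, a hedged sketch under the same illustrative assumptions; each prediction carries both a location and a scale parameter, matching the docstring's description:

```julia
using MLJ
using Statistics   # `mean` / `std` of the fitted distributions

EvoTreeGaussian = @load EvoTreeGaussian pkg=EvoTrees   # assumes EvoTrees.jl is installed

X = (x1 = rand(500), x2 = rand(500))
y = 3 .* X.x1 .+ 0.5 .* randn(500)

# min_weight defaults to 8.0 for this model, per the docstring above.
model = EvoTreeGaussian(nrounds = 100, eta = 0.1, max_depth = 6,
                        nbins = 64, min_weight = 8.0)
mach = machine(model, X, y)
fit!(mach)

pred = predict(mach, X)   # one fitted distribution (μ, σ) per observation
μ̂ = mean.(pred)
σ̂ = std.(pred)
```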
@@ -680,19 +680,19 @@ EvoTreeMLE performs maximum likelihood estimation. Assumed distribution is speci
`loss=:gaussian`: Loss to be minimized during training. One of:
- `:gaussian` / `:gaussian_mle`
- `:logistic` / `:logistic_mle`
- `nrounds=10`: Number of rounds. It corresponds to the number of trees that will be sequentially stacked. Must be >= 1.
- `nrounds=100`: Number of rounds. It corresponds to the number of trees that will be sequentially stacked. Must be >= 1.
- `eta=0.1`: Learning rate. Each tree's raw predictions are scaled by `eta` prior to being added to the stack of predictions. Must be > 0.
A lower `eta` results in slower learning, requiring a higher `nrounds` but typically improves model performance.
- `L2::T=0.0`: L2 regularization factor on aggregate gain. Must be >= 0. Higher L2 can result in a more robust model.
- `lambda::T=0.0`: L2 regularization factor on individual gain. Must be >= 0. Higher lambda can result in a more robust model.
- `gamma::T=0.0`: Minimum gain improvement needed to perform a node split. Higher gamma can result in a more robust model. Must be >= 0.
- `max_depth=5`: Maximum depth of a tree. Must be >= 1. A tree of depth 1 is made of a single prediction leaf.
- `max_depth=6`: Maximum depth of a tree. Must be >= 1. A tree of depth 1 is made of a single prediction leaf.
A complete tree of depth N contains `2^(N - 1)` terminal leaves and `2^(N - 1) - 1` split nodes.
Compute cost is proportional to 2^max_depth. Typical optimal values are in the 3 to 9 range.
- `min_weight=8.0`: Minimum weight needed in a node to perform a split. Matches the number of observations by default or the sum of weights as provided by the `weights` vector. Must be > 0.
- `rowsample=1.0`: Proportion of rows that are sampled at each iteration to build the tree. Should be in `]0, 1]`.
- `colsample=1.0`: Proportion of columns / features that are sampled at each iteration to build the tree. Should be in `]0, 1]`.
- `nbins=32`: Number of bins into which each feature is quantized. Buckets are defined based on quantiles, hence resulting in equal weight bins. Should be between 2 and 255.
- `nbins=64`: Number of bins into which each feature is quantized. Buckets are defined based on quantiles, hence resulting in equal weight bins. Should be between 2 and 255.
- `monotone_constraints=Dict{Int, Int}()`: Specify monotonic constraints using a dict where the key is the feature index and the value the applicable constraint (-1=decreasing, 0=none, 1=increasing).
!Experimental feature: note that for MLE regression, constraints may not be enforced systematically.
- `tree_type="binary"` Tree structure to be used. One of:
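Finally, a hedged sketch for EvoTreeMLE under the same illustrative assumptions, selecting the documented default `loss=:gaussian`:

```julia
using MLJ

EvoTreeMLE = @load EvoTreeMLE pkg=EvoTrees   # assumes EvoTrees.jl is installed

X = (x1 = rand(500), x2 = rand(500))
y = X.x1 .+ 0.2 .* randn(500)

# `loss=:gaussian` is the documented default; `:logistic` / `:logistic_mle`
# is the other option listed above.
model = EvoTreeMLE(loss = :gaussian, nrounds = 100, eta = 0.1,
                   max_depth = 6, nbins = 64, min_weight = 8.0)
mach = machine(model, X, y)
fit!(mach)

pred = predict(mach, X)   # fitted distributions with both parameters estimated per observation
```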

2 comments on commit 3b1641d

@jeremiedb (Member, Author) commented:

@JuliaRegistrator register()

@JuliaRegistrator commented:

Registration pull request created: JuliaRegistries/General/99554

Tip: Release Notes

Did you know you can add release notes too? Just add markdown-formatted text underneath the comment after the text "Release notes:" and it will be added to the registry PR; if TagBot is installed, it will also be added to the release that TagBot creates. For example:

@JuliaRegistrator register

Release notes:

## Breaking changes

- blah

To add them here just re-invoke and the PR will be updated.

Tagging

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the GitHub interface, or via:

git tag -a v0.16.6 -m "<description of version>" 3b1641d7071db7aac89542d65a5feffa5fa1a765
git push origin v0.16.6
